LU-14124: super slow i/o on client maybe related to low grant


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.8, Lustre 2.15.0
    • Affects Version/s: Lustre 2.12.5
    • Component/s: None
    • Environment: centos7, x86_64, OPA, lustre 2.12.5, zfs 0.8.5
    • Severity: 3
    • 9223372036854775807

    Description

      Hi,

      We're seeing slow I/O (~kB/s) to some OSTs from some clients. This seems to be correlated with low grant on the client.

      Here is a reproducer that sets up striped directories (one per OST) and then does a small dd to each:

      for f in {0..19}; do mkdir $f ; lfs setstripe -i $f $f; done
      for f in {0..19}; do echo $f ; dd if=/tmp/50m of=$f/blah9i & done
      

      where /tmp/50m is an almost incompressible file of 64149552 bytes.
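
      For anyone trying to reproduce this, one way to generate a similar test file (any nearly incompressible content of that size should do):

      # create an almost incompressible test file of the same size as /tmp/50m
      head -c 64149552 /dev/urandom > /tmp/50m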

      A few seconds later, some of the dd's have finished quickly, while others will literally take 6 hours:

      [me@farnarkle2 test]$ ls -l */blah9i | sort -n -k9
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 0/blah9i
      -rw-r--r-- 1 me oz997   423424 Nov  6 18:58 1/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 2/blah9i
      -rw-r--r-- 1 me oz997   409600 Nov  6 18:58 3/blah9i
      -rw-r--r-- 1 me oz997   409088 Nov  6 18:58 4/blah9i
      -rw-r--r-- 1 me oz997   253440 Nov  6 18:58 5/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 6/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 7/blah9i
      -rw-r--r-- 1 me oz997   158720 Nov  6 18:58 8/blah9i
      -rw-r--r-- 1 me oz997   570368 Nov  6 18:58 9/blah9i
      -rw-r--r-- 1 me oz997   574464 Nov  6 18:58 10/blah9i
      -rw-r--r-- 1 me oz997   569344 Nov  6 18:58 11/blah9i
      -rw-r--r-- 1 me oz997   379904 Nov  6 18:58 12/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 13/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 14/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 15/blah9i
      -rw-r--r-- 1 me oz997    84992 Nov  6 18:58 16/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 17/blah9i
      -rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 18/blah9i
      -rw-r--r-- 1 me oz997   389120 Nov  6 18:58 19/blah9i
      

      The slow I/O seems to mostly, but not completely, correlate with low grant on the client:

      [me@farnarkle2 test]$ grep -r . /proc/fs/lustre/osc/dagg-OST*/*grant*
      /proc/fs/lustre/osc/dagg-OST0000-osc-ffff9f785aa2a800/cur_grant_bytes:3114734
      /proc/fs/lustre/osc/dagg-OST0001-osc-ffff9f785aa2a800/cur_grant_bytes:1832833
      /proc/fs/lustre/osc/dagg-OST0002-osc-ffff9f785aa2a800/cur_grant_bytes:1881800703
      /proc/fs/lustre/osc/dagg-OST0003-osc-ffff9f785aa2a800/cur_grant_bytes:2050095
      /proc/fs/lustre/osc/dagg-OST0004-osc-ffff9f785aa2a800/cur_grant_bytes:1902694
      /proc/fs/lustre/osc/dagg-OST0005-osc-ffff9f785aa2a800/cur_grant_bytes:1946593
      /proc/fs/lustre/osc/dagg-OST0006-osc-ffff9f785aa2a800/cur_grant_bytes:1879834623
      /proc/fs/lustre/osc/dagg-OST0007-osc-ffff9f785aa2a800/cur_grant_bytes:1876295679
      /proc/fs/lustre/osc/dagg-OST0008-osc-ffff9f785aa2a800/cur_grant_bytes:1867376
      /proc/fs/lustre/osc/dagg-OST0009-osc-ffff9f785aa2a800/cur_grant_bytes:2035056
      /proc/fs/lustre/osc/dagg-OST000a-osc-ffff9f785aa2a800/cur_grant_bytes:1674002
      /proc/fs/lustre/osc/dagg-OST000b-osc-ffff9f785aa2a800/cur_grant_bytes:1790763
      /proc/fs/lustre/osc/dagg-OST000c-osc-ffff9f785aa2a800/cur_grant_bytes:1979790
      /proc/fs/lustre/osc/dagg-OST000d-osc-ffff9f785aa2a800/cur_grant_bytes:30273
      /proc/fs/lustre/osc/dagg-OST000e-osc-ffff9f785aa2a800/cur_grant_bytes:60219801
      /proc/fs/lustre/osc/dagg-OST000f-osc-ffff9f785aa2a800/cur_grant_bytes:251557488
      /proc/fs/lustre/osc/dagg-OST0010-osc-ffff9f785aa2a800/cur_grant_bytes:1891954
      /proc/fs/lustre/osc/dagg-OST0011-osc-ffff9f785aa2a800/cur_grant_bytes:33611568
      /proc/fs/lustre/osc/dagg-OST0012-osc-ffff9f785aa2a800/cur_grant_bytes:9298722
      /proc/fs/lustre/osc/dagg-OST0013-osc-ffff9f785aa2a800/cur_grant_bytes:1591979
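
      The same counters can also be read with lctl rather than grepping /proc, assuming the stock 2.12 parameter name:

      # equivalent query via lctl; the wildcard matches the per-mount osc instances
      lctl get_param osc.dagg-OST*.cur_grant_bytes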
      

      The effect is most noticeable with the default bs=512 dd shown above: the slow files write at a few kB/s, while the normal OSTs handle this small I/O at ballpark 150 MB/s.
      With dd bs=1M it is the same slow OSTs, but they run at a minimum of ~1 MB/s; normal OSTs write at ballpark 700 MB/s with 1M I/Os.
      So it feels like ops/second or rpcs_in_flight is being limited rather than absolute MB/s, if you know what I mean.
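
      In case it helps narrow that down, a quick way to compare a slow target against a healthy one on the client (assuming the stock 2.12 osc parameter names):

      # per-target RPC cap and in-flight/size histograms; OST0001 is one of the slow ones above
      lctl get_param osc.dagg-OST0001*.max_rpcs_in_flight
      lctl get_param osc.dagg-OST0001*.rpc_stats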

      Other client nodes that show larger grant don't have the same issues.
      The client above is a login node, so it has a lot of random stuff going on.

      Rebooting the client, or rebooting the servers hosting the affected OSTs, seems to clear the problem, but I don't have any other workarounds. I can't find any grant tunables on the client or server, or in the manual, but perhaps I'm missing something.
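
      For what it's worth, the OSS side also exposes grant accounting that could be compared against the client numbers, assuming the standard obdfilter counters are available on these servers:

      # total granted, pending and dirty bytes as tracked by each OST (run on the OSS)
      lctl get_param obdfilter.dagg-OST*.tot_granted obdfilter.dagg-OST*.tot_pending obdfilter.dagg-OST*.tot_dirty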

      There are ~250 client nodes. The OSTs are very large at ~300 TB. OSS RAM is 192 GB, but we've told the ZFS ARC to use 75% of it. Reducing zfs_arc_max on the fly to try to give more OSS RAM to Lustre doesn't seem to help immediately.
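
      (For reference, the on-the-fly ARC reduction is just the usual ZFS 0.8 module parameter, along these lines, with the value only an example:)

      # shrink the ARC target at runtime; 64 GiB is an example value only
      echo $((64 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max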

      We updated Lustre on the servers from 2.12.4 + ZFS 0.8.3 to 2.12.5 + ZFS 0.8.5 this week to see whether this issue was resolved in 2.12.5, but it still seems to be there. The clients have been on 2.12.5 for ages.

      I forget whether Lustre grant is related to quota. In any case, this is a group-quota'd filesystem and the oz997 group above is nowhere near its limits:

      Disk quotas for grp oz997 (gid 10273):
           Filesystem    used   quota   limit   grace   files   quota   limit   grace
                /fred  4.859T      0k     10T       -  609477       0 2000000       -
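
      (That's just the usual group quota query, along the lines of:)

      # group quota for oz997 on the /fred mount, human-readable
      lfs quota -h -g oz997 /fred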
      

      Any ideas on how to debug this further, or a less crappy workaround than rebooting things?
      Thanks.

      cheers,
      robin


            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: SC Admin (scadmin)