  Lustre / LU-13439

DNE3: MDT QOS tuning to avoid full MDTs completely


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0

    Description

      Testing for LU-13417 showed that "lfs setdirstripe -D -c 1 -i -1 /mnt/testfs" now caused subdirectories to be created on different MDTs when qos_threshold_rr was reduced. However, errors were still hit when one MDT ran out of blocks even though it still had free inodes (and multiple kernel errors were also reported). For mkdir this is a real problem, because each directory needs at least one block, so the QOS code should completely avoid selecting MDTs with little free space (e.g. below 5% of the average MDT free space).
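
      As a rough illustration of what such a cutoff means for the filesystem below (this is only a sketch, not part of any Lustre tool), the 5%-of-average limit can be computed from the "lfs df" output with a few lines of awk:

      #!/bin/sh
      # Sketch: report any MDT whose available space is below 5% of the
      # average available space across all MDTs (the proposed QOS cutoff).
      lfs df /mnt/testfs | awk '/\[MDT:/ { avail[$1] = $4; sum += $4; n++ }
      END {
              avg = sum / n; cutoff = avg * 0.05
              printf "average MDT free %d KB, 5%% cutoff %d KB\n", avg, cutoff
              for (u in avail)
                      if (avail[u] < cutoff)
                              printf "%s: %d KB free, would be skipped\n", u, avail[u]
      }'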

      # lfs df
      UUID                   1K-blocks        Used   Available Use% Mounted on
      testfs-MDT0000_UUID       125368        9508      104624   9% /mnt/testfs[MDT:0]
      testfs-MDT0001_UUID       125368       93560       20572  82% /mnt/testfs[MDT:1]
      
      # lfs df -i
      UUID                      Inodes       IUsed       IFree IUse% Mounted on
      testfs-MDT0000_UUID       100000       20295       79705  21% /mnt/testfs[MDT:0]
      testfs-MDT0001_UUID       100000       40580       59420  41% /mnt/testfs[MDT:1]
      
      # ./createmany -d /mnt/testfs/dir 1000
      total: 1000 mkdir in 0.57 seconds: 1768.49 ops/second
      [root@centos7 tests]# lfs getdirstripe -m /mnt/testfs/dir* | sort | uniq -c
          871 0
          129 1
      
      # ./createmany -d /mnt/testfs/dsub/d 1000
      total: 1000 mkdir in 1.64 seconds: 608.97 ops/second
      # lfs getdirstripe -m /mnt/testfs/dsub/d[0-9]* | sort | uniq -c
          860 0
          140 1
      

      This showed a reasonable distribution of new directories, with over 85% of them going to MDT0000 (the MDT with the most free space).

      However, when creating more directories the space balance doesn't change very much:

      # ./createmany -d /mnt/testfs/dsub/d 1000 9000
       - mkdir 5742 (time 1586376135.48 total 10.00 last 574.16)
      mkdir(/mnt/testfs/dsub/d9621) error: No space left on device
      total: 8621 mkdir in 15.30 seconds: 563.46 ops/second
      # lfs df
      UUID                   1K-blocks        Used   Available Use% Mounted on
      testfs-MDT0000_UUID       125368       48572       65560  43% /mnt/testfs[MDT:0]
      testfs-MDT0001_UUID       125368      125368           0 100% /mnt/testfs[MDT:1]
      
      # lfs df -i
      UUID                      Inodes       IUsed       IFree IUse% Mounted on
      testfs-MDT0000_UUID       100000       29764       70236  30% /mnt/testfs[MDT:0]
      testfs-MDT0001_UUID       100000       50330       49670  51% /mnt/testfs[MDT:1]
      

      This shows that mkdir fails with -ENOSPC in the "-i -1" directory even though MDT0000 still has plenty of free blocks and inodes. Checking the distribution of the directories that were created shows that it did not change very much:

      # lfs getdirstripe -m /mnt/testfs/dsub/d1[0-9][0-9][0-9] | sort | uniq -c
          891 0
          109 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d2[0-9][0-9][0-9] | sort | uniq -c
          882 0
          118 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d3[0-9][0-9][0-9] | sort | uniq -c
          887 0
          113 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d4[0-9][0-9][0-9] | sort | uniq -c
          881 0
          119 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d5[0-9][0-9][0-9] | sort | uniq -c
          884 0
          116 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d6[0-9][0-9][0-9] | sort | uniq -c
          862 0
          138 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d7[0-9][0-9][0-9] | sort | uniq -c
          884 0
          116 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d8[0-9][0-9][0-9] | sort | uniq -c
          886 0
          114 1
      # lfs getdirstripe -m /mnt/testfs/dsub/d9[0-9][0-9][0-9] | sort | uniq -c
          554 0
           67 1
      

      I suspect this may be related to the "qos_maxage=60" setting on the client, which prevents it from getting a new space update while "createmany -d" is running, together with the relatively small amount of space on the MDTs. However, even after waiting a long time, new directories still cannot be created on the MDT that has free space:

      # ./createmany -d /mnt/testfs/dsub/d 10000 1000
      mkdir(/mnt/testfs/dsub/d10001) error: No space left on device
      total: 1 mkdir in 0.01 seconds: 104.60 ops/second
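
      To rule out stale statfs data, the cached age can be checked and lowered on the client for testing. This assumes the parameter is exposed as lmv.*.qos_maxage, as it is in releases carrying the LU-13417 client-side QOS mkdir code:

      # lctl get_param lmv.*.qos_maxage
      # lctl set_param lmv.*.qos_maxage=5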
      

      I think two improvements are needed:

      • the QOS code should stop allocating on an MDT before it becomes too full. An MDT should be skipped once its free space or free inodes drop below a minimum of ~10% of the average free space across all MDTs. This will avoid hitting -ENOSPC during creation, whether for the new directory itself or for the llogs. Since directories consume blocks, either free blocks or free inodes being much lower than average should be reason enough not to use that MDT.
      • the default "qos_threshold_rr=17%" is too high a threshold at which to start balancing directory creation across MDTs. It could mean that a large MDT0000 is used for many millions of files and top-level directories before any balancing even starts, and at that point it is much harder to restore the balance between MDTs because so many top-level directories and their subdirectories have already been created on MDT0000. A smaller default of "qos_threshold_rr=5%" or "=10%" would be better, to avoid the MDTs becoming too imbalanced before QOS balancing starts (an example of lowering it by hand follows this list).
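
      As an interim workaround for the second point, the threshold can also be lowered by hand on existing systems. This assumes the tunable is exposed on the client as lmv.*.qos_threshold_rr, as in recent releases; "lctl set_param -P" is run on the MGS node to make the setting persistent across the filesystem:

      # lctl set_param lmv.*.qos_threshold_rr=5
      # lctl set_param -P lmv.*.qos_threshold_rr=5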

            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: Andreas Dilger (adilger)