Lustre / LU-16522

"lfs setstripe -i N" with deactivated OST(s) always picks next active OST


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.0, Lustre 2.16.0
    • Labels: None
    • Severity: 3
    • 9223372036854775807

    Description

      When OSTs are disabled with osp.*.max_create_count=0 and a file is created with a specific layout (i.e. one that explicitly selects the starting OST index) that names one of the disabled OSTs, the MDS object allocator always selects the first available OST after the disabled ones.

      For example, on an 8-OST filesystem where OST0000-OST0003 are all disabled, trying to create files explicitly on any of those OSTs will always result in the objects being allocated from OST0004.

      # lctl set_param osp.testfs-OST000[0-3]*.max_create_count=0
      osp.testfs-OST0000-osc-MDT0000.max_create_count=0
      osp.testfs-OST0001-osc-MDT0000.max_create_count=0
      osp.testfs-OST0002-osc-MDT0000.max_create_count=0
      osp.testfs-OST0003-osc-MDT0000.max_create_count=0
      # for O in {0..3}; do lfs setstripe -i $O /mnt/testfs/ost$O; done
      # lfs getstripe -i /mnt/testfs/ost* | sort | uniq -c
            4 
            4 4
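
The observed behavior can be mimicked with a small toy model. This is only a sketch of the symptom described above, not the actual MDS LOD code: the function name pick_next_active, its arguments, and the wrap-around at the last OST are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of the symptom; NOT the real MDS/LOD allocator.
# usage: pick_next_active REQUESTED OST_COUNT "DISABLED INDEX LIST"
# Prints the first index at or after REQUESTED that is not disabled.
# Wrapping around at OST_COUNT is an assumption of this sketch.
# Returns 1 if every OST is disabled.
pick_next_active() (
    requested=$1 count=$2 disabled=" $3 "
    i=0
    while [ "$i" -lt "$count" ]; do
        idx=$(( (requested + i) % count ))
        case "$disabled" in
            *" $idx "*) ;;                   # disabled, keep scanning
            *) echo "$idx"; return 0 ;;      # first usable OST wins
        esac
        i=$((i + 1))
    done
    return 1
)

# OST0000-OST0003 disabled on an 8-OST filesystem, as in the example.
for o in 0 1 2 3; do
    echo "requested $o -> allocated $(pick_next_active "$o" 8 "0 1 2 3")"
done
```

Every request for index 0 through 3 resolves to index 4 in this model, which matches the "4 4" line in the getstripe summary.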
      

      This issue does not affect "normal" file creations that do not specify the starting OST index of files:

      # touch /mnt/testfs/file{0..99}
      # lfs getstripe -i /mnt/testfs/file{0..99} | sort | uniq -c 
          100
           25 4
           25 5
           25 6
           25 7
      

      It does affect "lfs migrate" run without an explicit layout, because the command copies the layout of the source file, including its starting OST index (LU-16500).

      # lfs getstripe -i /mnt/testfs/old* | sort | uniq -c
          100 
           13 0
           13 1
           13 2
           12 3
           12 4
           12 5
           12 6
           13 7
      # lfs migrate /mnt/testfs/old*
      # lfs getstripe -i /mnt/testfs/old* | sort | uniq -c
          100 
           55 4
           15 5
           15 6
           15 7
      

      If a large number of OSTs are disabled while "lfs migrate" is run without any layout arguments (which, prior to patch https://review.whamcloud.com/49865 "LU-16500 utils: set default ost index for lfs migrate", copies the specific layout from the source file, as in the example above), the files migrated off the disabled OSTs all land on the same (first) OST after the disabled ones. Running "lfs migrate" with a new layout does not hit this problem:

      # lfs getstripe -i /mnt/testfs/old* | sort | uniq -c
          100 
           13 0
           13 1
           12 2
           12 3
           12 4
           12 5
           13 6
           13 7
      # lfs migrate -c 1 /mnt/testfs/old*
      # lfs getstripe -i /mnt/testfs/old* | sort | uniq -c
          100 
           25 4
           25 5
           25 6
           25 7
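
Before migrating, it may help to confirm which OSTs are currently disabled. The helper below is a sketch that filters output in the "osp.<fsname>-OSTxxxx-osc-MDTxxxx.max_create_count=N" format shown earlier in this ticket. The function name disabled_osts is hypothetical, and the value 20000 in the sample input is only a placeholder for a non-zero setting.

```shell
#!/bin/sh
# Hypothetical helper: read max_create_count lines on stdin and print the
# names of the OSTs whose value is exactly 0 (i.e. object creation disabled).
disabled_osts() {
    sed -n 's/^osp\.[^.]*-\(OST[0-9a-fA-F]*\)-osc-[^.]*\.max_create_count=0$/\1/p' |
        sort -u
}

# Sample input in the format printed by the lctl commands in this ticket.
disabled_osts <<'EOF'
osp.testfs-OST0000-osc-MDT0000.max_create_count=0
osp.testfs-OST0001-osc-MDT0000.max_create_count=0
osp.testfs-OST0004-osc-MDT0000.max_create_count=20000
EOF
```

On an MDS it would presumably be fed from "lctl get_param osp.*.max_create_count | disabled_osts".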
      

      Fixing "lfs migrate" to reset the OST index in the source layout avoids the problem in this case, but it would also be worthwhile to fix the problem in the MDS LOD OST selection code, so that other tools which provide specific layouts via saved/copied xattrs (e.g. "tar", and maybe "rsync" or "cp" in the future) do not encounter the same problem. If the client explicitly requests an OST index that is inactive or disabled, the MDS should pick a random or weighted OST index (possibly within the same OST pool) rather than just picking the next available OST index.
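
The proposed fallback can be sketched in the same toy style. This is an illustration of the idea only, not a patch: the function name pick_random_active and its arguments are hypothetical, the choice is uniformly random (no weighting by free space), OST pools are ignored, and it relies on "shuf" from GNU coreutils.

```shell
#!/bin/sh
# Toy model of the proposed fallback; NOT the real MDS/LOD allocator.
# usage: pick_random_active REQUESTED OST_COUNT "DISABLED INDEX LIST"
# Honors REQUESTED if it is usable, otherwise prints a uniformly random
# usable index. Returns 1 if every OST is disabled.
pick_random_active() (
    requested=$1 count=$2 disabled=" $3 "
    case "$disabled" in
        *" $requested "*) ;;                    # requested OST is disabled
        *) echo "$requested"; return 0 ;;       # usable: honor the request
    esac
    active=""
    i=0
    while [ "$i" -lt "$count" ]; do
        case "$disabled" in
            *" $i "*) ;;                        # skip disabled OSTs
            *) active="$active $i" ;;
        esac
        i=$((i + 1))
    done
    [ -n "$active" ] || return 1
    # Intentionally unquoted so each index becomes its own line for shuf.
    printf '%s\n' $active | shuf -n 1
)

# Same scenario as the original example: OST0000-OST0003 disabled.
for o in 0 1 2 3; do
    echo "requested $o -> allocated $(pick_random_active "$o" 8 "0 1 2 3")"
done
```

With this policy, requests for disabled indices are spread across indices 4 through 7 instead of all landing on index 4.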

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Andreas Dilger (adilger)