  1. Lustre
  2. LU-13425

"run 'lfs migrate -m 1 -c 1 -H 3 dir1' to finish migration" is broken

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      While testing LU-13424 I hit an error during directory migration. I then tried migrating the directory back to its original MDT, just to check what would happen:

      tests# lfs migrate -m 1 /mnt/testfs/dir1
      lfs migrate: /mnt/testfs/dir1/hosts migrate failed: Operation not supported (95)
      tests# lfs migrate -m 0 /mnt/testfs/dir1
      LustreError: 30963:0:(mdd_dir.c:4209:mdd_migrate()) testfs-MDD0000: 'dir1' migration was interrupted, run 'lfs migrate -m 1 -c 1 -H 3 dir1' to finish migration.
      tests# lfs migrate -m1 -c 1 -H 3 /mnt/testfs/dir1
      lfs migrate migrate: bad stripe hash type '3'
      tests# lfs getdirstripe /mnt/testfs/dir1
      lmv_stripe_count: 2 lmv_stripe_offset: 1 lmv_hash_type: crush,migrating
      mdtidx           FID[seq:oid:ver]
           1           [0x240001b71:0xf:0x0]          
           0           [0x200001b72:0xe480:0x0]
      

      So the MDS printed the "run 'lfs migrate ...'" message to the console, but the command it suggests does not work, because "lfs migrate" does not accept the numeric hash value "-H 3".

      The simplest fix is to allow specifying the numeric hash type like "lfs migrate ... -H 3" in order to resume directory migration, as stated in the error message.

      I don't think that "lfs" or the client should even try to validate this hash type before passing it to the MDS: the client may be old, and the directory may be using a newer hash type that the client doesn't know about. The MDS should reject invalid hash types from the client anyway (e.g. from a malicious user, or from a new client talking to an old server).
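
      As a rough illustration of that split, the sketch below parses a "-H" argument that may be either a hash name or a plain number, and leaves the real validity check to the server side. This is not the actual "lfs" or MDS code: parse_hash_type(), server_hash_type_valid(), the name table, and SERVER_HASH_TYPE_MAX are all hypothetical names made up for this example, and only the pairing crush = 3 is taken from the output above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name table. Only crush == 3 is confirmed by the report;
 * the other two entries are assumptions for illustration. */
struct hash_name {
	const char *name;
	unsigned int type;
};

static const struct hash_name hash_names[] = {
	{ "all_char",  1 },	/* assumption */
	{ "fnv_1a_64", 2 },	/* assumption */
	{ "crush",     3 },	/* matches "lmv_hash_type: crush" and "-H 3" */
};

/*
 * Client side: accept a known name or any plain decimal number.
 * A number the client does not recognise is passed through unchanged,
 * so an old client can still name a hash type added later.
 * Returns 0 and stores the value in *type, or -EINVAL on malformed input.
 */
int parse_hash_type(const char *arg, unsigned int *type)
{
	unsigned long val;
	char *end;
	size_t i;

	if (arg == NULL || *arg == '\0')
		return -EINVAL;

	for (i = 0; i < sizeof(hash_names) / sizeof(hash_names[0]); i++) {
		if (strcmp(arg, hash_names[i].name) == 0) {
			*type = hash_names[i].type;
			return 0;
		}
	}

	/* Not a known name: require digits only (no sign, no whitespace). */
	if (arg[0] < '0' || arg[0] > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val > UINT_MAX)
		return -EINVAL;

	*type = (unsigned int)val;
	return 0;
}

/* Hypothetical upper bound of hash types this "server" understands. */
#define SERVER_HASH_TYPE_MAX 3

/* Server side: the authoritative check, independent of the client. */
int server_hash_type_valid(unsigned int type)
{
	return type >= 1 && type <= SERVER_HASH_TYPE_MAX;
}
```

      With this split, "-H 3" and "-H crush" yield the same value on the client, while a number such as "-H 7" is still forwarded and is rejected only by the server-side check.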

      The MDS really shouldn't need the hash type or other arguments to be passed at all, since it already knows this information (it generated the message in the first place). It would be better, if possible, to just print "run 'lfs migrate <full_path>' to finish migration" (maybe using fid2path to generate the pathname?). Best would be to restart the migration automatically when this is hit (at least once, but not repeatedly if it is broken for some reason like LU-13424).

            People

              Assignee: Emoly Liu (emoly.liu)
              Reporter: Andreas Dilger (adilger)