Lustre / LU-13492

lfs migrate -m returns Operation not permitted


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.12.4
    • None
    • CentOS 7.6 Kernel 3.10.0-957.27.2.el7_lustre.pl2.x86_64
    • 3

    Description

      Hello!

      When using lfs migrate -m to migrate directories across MDTs, we sometimes hit LU-13298 (lfs migrate does not work yet with DoM files), for which we have a workaround (i.e., we first restripe the files without DoM). However, I think we are now hitting a different problem.

      We're trying to migrate files from MDT0003 to MDT0001. While running a migration of a full user directory as follows:

      lfs migrate -m 1 /fir/users/apatel6
      

      we hit "Operation not permitted" errors on multiple directories, and retrying the migration leads to the same error:

      [root@fir-rbh01 storage]# lfs migrate -m 1 /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
      /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N migrate failed: Operation not permitted (-1)
      
      [root@fir-rbh01 storage]# lfs getdirstripe /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
      lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
      mdtidx           FID[seq:oid:ver]
           3           [0x2800394ad:0x3c7c:0x0]
           3           [0x280038894:0x124ee:0x0]
      

      While writing this ticket, I also noticed that something seems wrong here: both stripes report mdtidx 3. Usually, when a directory is migrating from MDT 3 to MDT 1, we see both mdtidx 1 and mdtidx 3.
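
      To spot other directories in the same state, the check described above could be scripted roughly as follows. This is only a sketch: summarize_dirstripe is a hypothetical helper name, and it assumes the output of lfs getdirstripe has the same shape as the sample shown in this ticket (an lmv_hash_type field, then one "mdtidx FID" row per stripe).

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): summarize `lfs getdirstripe`
# output read from stdin. Assumes the output format shown in this ticket.
# Prints: migrating=<yes|no> mdts=<comma-separated distinct MDT indices>
summarize_dirstripe() {
    awk '
        # The hash type carries a ",migrating" flag while a migration is pending.
        /lmv_hash_type:/ { if ($0 ~ /migrating/) mig = "yes" }
        # Stripe rows look like: "<mdtidx>  [0xseq:0xoid:0xver]"
        $1 ~ /^[0-9]+$/ && $2 ~ /^\[0x/ {
            if (!($1 in seen)) {
                seen[$1] = 1
                if (list != "") list = list ","
                list = list $1
            }
        }
        END {
            if (mig == "") mig = "no"
            printf "migrating=%s mdts=%s\n", mig, list
        }
    '
}
```

      Example use: lfs getdirstripe /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N | summarize_dirstripe. For the output above this prints "migrating=yes mdts=3", i.e. the directory is flagged as migrating but only a single MDT index appears.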

      Quick check of the FIDs above:

      [root@fir-rbh01 storage]# lfs fid2path /fir 0x2800394ad:0x3c7c:0x0
      /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
      [root@fir-rbh01 storage]# lfs fid2path /fir 0x280038894:0x124ee:0x0
      /fir/users/apatel6/data/10-scalingNEB/01-relaxwater/02-N
      

      MDT0001 (not MDT0003!) shows this log message when attempting the failed command:

      Apr 29 08:35:06 fir-md1-s2 kernel: LustreError: 22437:0:(mdd_dir.c:4496:mdd_migrate()) fir-MDD0001: '02-N' migration was interrupted, run 'lfs migrate -m 3 -c 1 -H 2 02-N' to finish migration.
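
      When many directories are affected, the command suggested by each such message can be collected from the MDS logs with something like the sketch below. extract_resume_cmd is a hypothetical helper name, and the pattern assumes the exact message wording shown above. Note that the suggested command names the directory relative to its parent ('02-N'), and that this ticket does not establish whether running it is safe or sufficient.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read kernel log lines on stdin
# and print the 'lfs migrate ...' command suggested by mdd_migrate()
# "migration was interrupted" messages. Lines that do not match are dropped.
extract_resume_cmd() {
    sed -n "s/.*migration was interrupted, run '\(lfs migrate [^']*\)' to finish migration.*/\1/p"
}
```

      Example use on the MDS: dmesg | extract_resume_cmd. For the message above this prints "lfs migrate -m 3 -c 1 -H 2 02-N".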
      

      I don't see anything else in the logs, but perhaps enabling some debug flags would reveal more?
      In any case, let me know how we could help troubleshoot this issue. We're using Lustre 2.12.4 here even on the client that performs the lfs migrate. Thanks!

            People

              Assignee: Hongchao Zhang (hongchao.zhang)
              Reporter: Stephane Thiell (sthiell)