Lustre / LU-8569

Sharded DNE directory full of files that don't exist

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      On our DNE testbed, one of our sharded directories seems to contain files that are all in a broken state. Currently both servers and clients are running 2.8.0_0.0.llnlpreview.40 (see the lustre-release-fe-llnl repo).

      We can get a directory listing, but nothing listed is actually accessible. Here is an excerpt from running ls -l:

      # pwd
      /p/lquake/casses1/opal-jet/simul_2
      # ls -l
      ls: cannot access simul_link.2243: No such file or directory
      ls: cannot access simul_link.3161: No such file or directory
      ls: cannot access simul_link.3129: No such file or directory
      ls: cannot access simul_link.3893: No such file or directory
      ls: cannot access simul_link.691: No such file or directory
      ls: cannot access simul_link.3233: No such file or directory
      ls: cannot access simul_link.235: No such file or directory
      ls: cannot access simul_link.1653: No such file or directory
      ls: cannot access simul_link.3167: No such file or directory
      ls: cannot access simul_link.681: No such file or directory
      ls: cannot access simul_link.835: No such file or directory
      ls: cannot access simul_link.3857: No such file or directory
      ls: cannot access simul_link.1591: No such file or directory
      ls: cannot access simul_link.1175: No such file or directory
      [cut]
      -????????? ? ? ? ?            ? simul_link.937
      -????????? ? ? ? ?            ? simul_link.94
      -????????? ? ? ? ?            ? simul_link.940
      -????????? ? ? ? ?            ? simul_link.941
      -????????? ? ? ? ?            ? simul_link.942
      -????????? ? ? ? ?            ? simul_link.943
      -????????? ? ? ? ?            ? simul_link.944
      -????????? ? ? ? ?            ? simul_link.947
      [cut]
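The symptom is that readdir returns names for which a later lookup fails with ENOENT. To count or collect such names without scraping `ls -l` error messages, a small bash helper along these lines could be used. `find_unstatable` is a hypothetical helper, not a Lustre tool, and it assumes one name per line on stdin (names containing newlines are not handled).

```bash
#!/bin/bash
# find_unstatable DIR
# Reads directory entry names (one per line) on stdin and prints every name
# whose lookup under DIR fails, i.e. listed by readdir but not resolvable.
find_unstatable() {
	local dir=$1 name path
	while IFS= read -r name; do
		case $name in
		. | .. | '') continue ;;
		esac
		path="$dir/$name"
		# -e follows symlinks and -L catches dangling symlinks, so a name is
		# reported only when looking up the name itself fails.
		if [ ! -e "$path" ] && [ ! -L "$path" ]; then
			printf '%s\n' "$name"
		fi
	done
}
```

For example, `ls -f /p/lquake/casses1/opal-jet/simul_2 | find_unstatable /p/lquake/casses1/opal-jet/simul_2 | wc -l` would count the broken entries; `ls -f` lists the names unsorted.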
      

      Here is the striping information:

      # lfs getdirstripe .
      .
      lmv_stripe_count: 16 lmv_stripe_offset: 12
      mdtidx           FID[seq:oid:ver]
          12           [0x50000996c:0x14fed:0x0]
          13           [0x54000919d:0x14fed:0x0]
          14           [0x58000a086:0x14fed:0x0]
          15           [0x5c000996b:0x14fed:0x0]
           0           [0x200006b03:0x14fed:0x0]
           1           [0x3000089cc:0x14fed:0x0]
           2           [0x38000996d:0x14fed:0x0]
           3           [0x4c000b0df:0x14fed:0x0]
           4           [0x2c000a142:0xec09:0x0]
           5           [0x3c000b8b2:0xec09:0x0]
           6           [0x34000a143:0xec09:0x0]
           7           [0x40000a143:0xec09:0x0]
           8           [0x44000a142:0xec09:0x0]
           9           [0x24000a143:0xec09:0x0]
          10           [0x2800091a4:0xec09:0x0]
          11           [0x4800091a3:0xec09:0x0]
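When comparing several striped directories, the `lfs getdirstripe` output can be checked mechanically. The hypothetical awk-based helper below is written against the output format shown above (a `lmv_stripe_count:` header followed by `mdtidx FID` rows); it prints one `mdtidx FID` pair per stripe and returns a non-zero status if the number of rows differs from `lmv_stripe_count`. The exact output format of other Lustre versions is an assumption.

```bash
#!/bin/bash
# check_dirstripe
# Reads `lfs getdirstripe DIR` output on stdin, prints "mdtidx FID" for each
# stripe row, and fails if the row count does not match lmv_stripe_count.
check_dirstripe() {
	awk '
	/lmv_stripe_count:/ {
		for (i = 1; i < NF; i++)
			if ($i == "lmv_stripe_count:") want = $(i + 1)
	}
	# Stripe rows look like: "   12   [0x50000996c:0x14fed:0x0]"
	$1 ~ /^[0-9]+$/ && $2 ~ /^\[0x/ { print $1, $2; got++ }
	END {
		if (got + 0 != want + 0) {
			printf("expected %d stripes, found %d\n", want, got) > "/dev/stderr"
			exit 1
		}
	}'
}
```

Usage would be `lfs getdirstripe . | check_dirstripe`.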
      

      I ran lfsck on all services (at least those started by the "--all" option), but that did not address this situation.

      The problem files cannot be unlinked:

      # rm simul_link.999
      rm: cannot remove 'simul_link.999': No such file or directory
      

      Attachments

        1. getstripelogs.tar.gz
          0.2 kB
        2. jet-link-logs-part1.tar.gz
          0.2 kB
        3. jet-link-logs-part2.tar.gz
          0.2 kB
        4. jet-link-logs-part3.tar.gz
          0.2 kB
        5. jet-link-logs-part4.tar.gz
          0.2 kB
        6. lfsck_namespace_state-9-28-2016.log
          24 kB

          Activity

            pjones Peter Jones added a comment -

            All patches landed to master for 2.10. Ports to 2.8 and 2.9 FE branches will be tracked separately.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23741/
            Subject: LU-8569 lfsck: handle linkEA overflow
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 048a8740ae26e3406a7eab3bca383a90490cef93

            pjones Peter Jones added a comment -

            Giuseppe,

            The ticket will be marked resolved when the patches land to master, but it will remain on the LLNL priority list until the equivalent patches have been ported and landed to the 2.8 FE branch.

            Peter

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Before this closes, can these patches also be ported to the 2.8 FE branch?

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23500/
            Subject: LU-8569 linkea: linkEA size limitation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e760042016bb5b12f9b21568304c02711930720f

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23741
            Subject: LU-8569 lfsck: handle linkEA overflow
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 94f5d2fec9edb6e1e5359ceebea9882cb5bb2719

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23500
            Subject: LU-8569 linkea: linkEA size limitation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0d8fe108f7b7f267fa790320954fc55e996af964

            adilger Andreas Dilger added a comment -

            Yes, I think it is reasonable to limit the linkEA size in this case. The Linux kernel xattr API is similarly limited by the size of individual xattrs, and ldiskfs has a 4KB limit for xattrs, so the Lustre code already expects that not all links will be stored for a given file.

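To make the size limits concrete, here is a rough bash estimator of how many hard-link names fit in a linkEA. It assumes the on-disk layout is a 24-byte `link_ea_header` followed, for each link, by a 2-byte record length, a 16-byte parent FID and the name without a trailing NUL; these sizes are my reading of `lustre_idl.h` and should be checked against the source. `linkea_bytes` and `links_that_fit` are hypothetical helpers.

```bash
#!/bin/bash
# Assumed linkEA layout (verify against lustre_idl.h):
#   24-byte header, then per link: 2-byte reclen + 16-byte parent FID + name.
LINKEA_HDR=24
LINKEA_ENTRY_HDR=18

# linkea_bytes PREFIX COUNT
# Bytes needed to record COUNT links named PREFIX-0 .. PREFIX-(COUNT-1).
linkea_bytes() {
	local prefix=$1 count=$2 total=$LINKEA_HDR i name
	for ((i = 0; i < count; i++)); do
		name="$prefix-$i"
		total=$((total + LINKEA_ENTRY_HDR + ${#name}))
	done
	echo "$total"
}

# links_that_fit PREFIX LIMIT
# Number of links PREFIX-0, PREFIX-1, ... that fit before the linkEA
# would exceed LIMIT bytes.
links_that_fit() {
	local prefix=$1 limit=$2 total=$LINKEA_HDR i=0 name next
	while :; do
		name="$prefix-$i"
		next=$((total + LINKEA_ENTRY_HDR + ${#name}))
		((next > limit)) && break
		total=$next
		i=$((i + 1))
	done
	echo "$i"
}
```

For instance, `links_that_fit "$(printf 'f%.0s' {1..41})" 4096` estimates how many long, test_300r-style names fit under the 4KB ldiskfs limit, and passing 32768 instead gives the count for the llog chunk size Di Wang mentions. Either way, the result is far below the 50000 links that test_300r creates.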
            di.wang Di Wang added a comment -

            I just did some tests on ZFS, and it looks like the problem is that the linkEA on ZFS grows beyond the llog chunk size (32768 bytes), which our current update llog system cannot handle: a single update operation (the update op plus its parameters) cannot be larger than the llog chunk size (32KB).

            So is it OK to limit the linkEA size here?

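Capping the linkEA means that, once the cap is reached, some names are no longer recorded, which is presumably why LFSCK then has to cope with an overflowed linkEA (the subject of patch 23741). The bash sketch below illustrates one possible capping policy: evict the oldest entries so that a new link always fits under a byte limit, and set a flag once anything has been dropped. The eviction policy, the `OVERFLOWED` flag, the helper name `add_link` and the entry sizes are all assumptions for illustration; patch 23500 may behave differently.

```bash
#!/bin/bash
# Hypothetical sketch of a size-capped linkEA. Sizes assume a 24-byte header
# and 18 bytes of per-entry overhead (2-byte reclen + 16-byte parent FID).
LINKEA_HDR=24
LINKEA_ENTRY_HDR=18
LINKS=()                 # recorded link names, oldest first
LINKEA_SIZE=$LINKEA_HDR  # current encoded size in bytes
OVERFLOWED=0             # set to 1 once any entry has been evicted

# add_link NAME LIMIT
# Records NAME, evicting the oldest names until the linkEA stays <= LIMIT.
# Returns 1 if NAME alone can never fit under LIMIT.
add_link() {
	local name=$1 limit=$2 old
	local need=$((LINKEA_ENTRY_HDR + ${#name}))
	if ((LINKEA_HDR + need > limit)); then
		return 1
	fi
	while ((LINKEA_SIZE + need > limit)); do
		old=${LINKS[0]}
		LINKS=("${LINKS[@]:1}")
		LINKEA_SIZE=$((LINKEA_SIZE - LINKEA_ENTRY_HDR - ${#old}))
		OVERFLOWED=1
	done
	LINKS+=("$name")
	LINKEA_SIZE=$((LINKEA_SIZE + need))
}
```

With any policy of this kind, a name can exist in a directory without appearing in its target's linkEA, so a consistency checker cannot treat a missing linkEA entry as proof that the name is bogus once the linkEA has overflowed.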
            di.wang Di Wang added a comment (edited) -

            I just looked at the debug log; the update log appears to be too long, which does not seem right.

            .............
            0x23:47025: 200000020:00000040:9.0:1476399235.972447:0:154190:0:(update_trans.c:93:top_multiple_thandle_dump())  cookie 0x23:47025: 1
            

            There are too many log cookies (> 1k) for this transaction, and each cookie can hold 32k update records, so I do not understand why a link operation would generate such a large record, even if the linkEA is big in your test. (Do we limit the linkEA size on ZFS?) The problem might be in sub_updates_write() and related to this patch (http://review.whamcloud.com/21334); I will check.

            I suspect the following test might reproduce the problem; unfortunately, I do not have a ZFS environment here to try it.

            diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
            index c61e3bc..0a3a82c 100755
            --- a/lustre/tests/sanity.sh
            +++ b/lustre/tests/sanity.sh
            @@ -15196,6 +15196,29 @@ test_300q() {
             }
             run_test 300q "create remote directory under orphan directory"
             
            +test_300r() {
            +       [ $PARALLEL == "yes" ] && skip "skip parallel run" && return
            +       [ $(lustre_version_code $SINGLEMDS) -lt $(version_code 2.7.55) ] &&
            +               skip "Need MDS version at least 2.7.55" && return
            +       [ $MDSCOUNT -lt 2 ] && skip "needs >= 2 MDTs" && return
            +       local stripe_count
            +       local file
            +
            +       mkdir $DIR/$tdir
            +
            +       $LFS setdirstripe -i1 -c3 $DIR/$tdir/remote_dir ||
            +               error "set striped dir error"
            +
            +       touch $DIR/$tdir/$tfile
            +       for ((i = 0; i < 50000; i++)); do
            +               ln $DIR/$tdir/$tfile $DIR/$tdir/remote_dir/fffffffffffffffffffffffffffffffffffffffff-$i ||
            +                       error "ln remote file fails"
            +       done
            +
            +       return 0
            +}
            +run_test 300r "test remote ln under striped directory"
            +
             prepare_remote_file() {
                    mkdir $DIR/$tdir/src_dir ||
                            error "create remote source failed"
            
            
            pjones Peter Jones added a comment -

            Got it. For future reference, it is possible to adjust git commit messages when landing, so the correct JIRA reference could have been used without delaying things.
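Lustre commit subjects start with the JIRA ticket, as in `LU-8569 lfsck: handle linkEA overflow`. From the author's side, a minimal bash sketch for correcting that reference in the most recent commit might look like this; `fix_jira_ref` is a hypothetical helper, and `LU-1234` in the test stands for whatever wrong ticket was used.

```bash
#!/bin/bash
# fix_jira_ref MESSAGE TICKET
# Replaces a leading "LU-<number>" on the first line of MESSAGE with TICKET;
# later lines (body, Change-Id trailer) are left untouched.
fix_jira_ref() {
	local msg=$1 ticket=$2
	printf '%s\n' "$msg" | sed -E "1s/^LU-[0-9]+/$ticket/"
}
```

Usage: `git commit --amend -m "$(fix_jira_ref "$(git log -1 --format=%B)" LU-8569)"`. Because the Change-Id trailer is preserved, pushing the amended commit to Gerrit should update the existing change as a new patch set rather than create a new change.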

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: morrone Christopher Morrone (Inactive)
              Votes: 0
              Watchers: 8
