
Sharded DNE directory full of files that don't exist

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      On our DNE testbed, one of our sharded directories seems to contain files that are all in a broken state. Currently both servers and clients are running 2.8.0_0.0.llnlpreview.40 (see the lustre-release-fe-llnl repo).

      We can get a directory listing, but nothing listed is actually accessible. Here is an excerpt from running ls -l:

      # pwd
      /p/lquake/casses1/opal-jet/simul_2
      # ls -l
      ls: cannot access simul_link.2243: No such file or directory
      ls: cannot access simul_link.3161: No such file or directory
      ls: cannot access simul_link.3129: No such file or directory
      ls: cannot access simul_link.3893: No such file or directory
      ls: cannot access simul_link.691: No such file or directory
      ls: cannot access simul_link.3233: No such file or directory
      ls: cannot access simul_link.235: No such file or directory
      ls: cannot access simul_link.1653: No such file or directory
      ls: cannot access simul_link.3167: No such file or directory
      ls: cannot access simul_link.681: No such file or directory
      ls: cannot access simul_link.835: No such file or directory
      ls: cannot access simul_link.3857: No such file or directory
      ls: cannot access simul_link.1591: No such file or directory
      ls: cannot access simul_link.1175: No such file or directory
      [cut]
      -????????? ? ? ? ?            ? simul_link.937
      -????????? ? ? ? ?            ? simul_link.94
      -????????? ? ? ? ?            ? simul_link.940
      -????????? ? ? ? ?            ? simul_link.941
      -????????? ? ? ? ?            ? simul_link.942
      -????????? ? ? ? ?            ? simul_link.943
      -????????? ? ? ? ?            ? simul_link.944
      -????????? ? ? ? ?            ? simul_link.947
      [cut]
      

      Here is the striping information:

      # lfs getdirstripe .
      .
      lmv_stripe_count: 16 lmv_stripe_offset: 12
      mdtidx           FID[seq:oid:ver]
          12           [0x50000996c:0x14fed:0x0]
          13           [0x54000919d:0x14fed:0x0]
          14           [0x58000a086:0x14fed:0x0]
          15           [0x5c000996b:0x14fed:0x0]
           0           [0x200006b03:0x14fed:0x0]
           1           [0x3000089cc:0x14fed:0x0]
           2           [0x38000996d:0x14fed:0x0]
           3           [0x4c000b0df:0x14fed:0x0]
           4           [0x2c000a142:0xec09:0x0]
           5           [0x3c000b8b2:0xec09:0x0]
           6           [0x34000a143:0xec09:0x0]
           7           [0x40000a143:0xec09:0x0]
           8           [0x44000a142:0xec09:0x0]
           9           [0x24000a143:0xec09:0x0]
          10           [0x2800091a4:0xec09:0x0]
          11           [0x4800091a3:0xec09:0x0]
      

      I ran lfsck on all services (at least those started by the "--all" option), but that did not address this situation.

      The problem files cannot be unlinked:

      # rm simul_link.999
      rm: cannot remove 'simul_link.999': No such file or directory
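
      The symptom above (names that the directory listing returns but that cannot be stat'ed) can be enumerated with a small helper. This is only a sketch: `list_broken_entries` is a hypothetical name, it assumes a plain `ls -A` does not itself need to stat each entry, and it does not handle file names containing newlines.

      ```shell
      # list_broken_entries DIR
      # Print every name listed in DIR for which neither stat nor lstat succeeds.
      # (Hypothetical helper for illustration; not part of Lustre.)
      list_broken_entries() {
          ls -A -- "$1" 2>/dev/null | while IFS= read -r name; do
              # -e follows symlinks, -L tests the link itself, so a dangling
              # symlink is not reported; only entries that fail both are printed.
              [ -e "$1/$name" ] || [ -L "$1/$name" ] || printf '%s\n' "$name"
          done
      }

      # Example (path taken from the listing above):
      # list_broken_entries /p/lquake/casses1/opal-jet/simul_2
      ```

      On a healthy directory the helper prints nothing.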
      

      Attachments

        1. getstripelogs.tar.gz
          0.2 kB
        2. jet-link-logs-part1.tar.gz
          0.2 kB
        3. jet-link-logs-part2.tar.gz
          0.2 kB
        4. jet-link-logs-part3.tar.gz
          0.2 kB
        5. jet-link-logs-part4.tar.gz
          0.2 kB
        6. lfsck_namespace_state-9-28-2016.log
          24 kB

        Issue Links

          Activity

            [LU-8569] Sharded DNE directory full of files that don't exist

            adilger Andreas Dilger added a comment -

            Yes, I think it is reasonable to limit linkEA size in this case. The Linux kernel xattr API is also similarly limited by the size of individual xattrs, and ldiskfs has a 4KB limit for xattrs, so the Lustre code is already expecting that not all links will be stored for a given file.

            di.wang Di Wang (Inactive) added a comment -

            I just ran some tests on ZFS, and it looks like the problem is that the linkEA on ZFS can grow beyond the llog chunk size (32768 bytes), which our current update llog system cannot handle. That is, the size of one update operation (the update op plus its parameters) cannot exceed the llog chunk size (32KB).

            So is it OK to limit the linkEA size here?
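
            To get a rough feel for how many hard links it takes before a linkEA passes a given size limit, the arithmetic can be sketched as below. The layout constants are assumptions that are not stated in this ticket (a 24-byte linkEA header, and per link a 2-byte record length plus a 16-byte parent FID plus the name bytes), and `max_links` is a hypothetical helper name.

            ```shell
            # Assumed linkEA layout (not taken from this ticket):
            LINKEA_HEADER=24          # bytes of header per linkEA
            LINKEA_ENTRY_OVERHEAD=18  # 2-byte record length + 16-byte parent FID

            # max_links LIMIT NAME_LEN
            # Number of links with NAME_LEN-byte names that fit in LIMIT bytes.
            max_links() {
                echo $(( ($1 - $LINKEA_HEADER) / ($LINKEA_ENTRY_OVERHEAD + $2) ))
            }

            # Link name pattern used by the proposed test_300r, with the longest index.
            name='fffffffffffffffffffffffffffffffffffffffff-49999'
            max_links 32768 "${#name}"   # llog chunk size mentioned above
            max_links 4096 "${#name}"    # ldiskfs xattr limit mentioned above
            ```

            Under these assumptions, either limit is reached long before the 50000 links that the test creates.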
            di.wang Di Wang (Inactive) added a comment - - edited

            Just looked at the debug log; it looks like the update log is too long, which does not seem right.

            .............
            0x23:47025: 200000020:00000040:9.0:1476399235.972447:0:154190:0:(update_trans.c:93:top_multiple_thandle_dump())  cookie 0x23:47025: 1
            

            There are too many log cookies (> 1k) for this transaction; each cookie can hold 32k update records. So I do not understand why a link can generate such a big record, even though the linkEA size might be big in your test. (Do we limit linkEA size for ZFS?) The problem might be in sub_updates_write and related to this patch, http://review.whamcloud.com/21334. I will check.

            I suspect this test might reproduce the problem. Sigh, I do not have a ZFS environment here.

            diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
            index c61e3bc..0a3a82c 100755
            --- a/lustre/tests/sanity.sh
            +++ b/lustre/tests/sanity.sh
            @@ -15196,6 +15196,29 @@ test_300q() {
             }
             run_test 300q "create remote directory under orphan directory"
             
            +test_300r() {
            +       [ $PARALLEL == "yes" ] && skip "skip parallel run" && return
            +       [ $(lustre_version_code $SINGLEMDS) -lt $(version_code 2.7.55) ] &&
            +               skip "Need MDS version at least 2.7.55" && return
            +       [ $MDSCOUNT -lt 2 ] && skip "needs >= 2 MDTs" && return
            +       local stripe_count
            +       local file
            +
            +       mkdir $DIR/$tdir
            +
            +       $LFS setdirstripe -i1 -c3 $DIR/$tdir/remote_dir ||
            +               error "set striped dir error"
            +
            +       touch $DIR/$tdir/$tfile
            +       for ((i = 0; i < 50000; i++)); do
            +               ln $DIR/$tdir/$tfile $DIR/$tdir/remote_dir/fffffffffffffffffffffffffffffffffffffffff-$i ||
            +                       error "ln remote file fails"
            +       done
            +
            +       return 0
            +}
            +run_test 300r "test remote ln under striped directory"
            +
             prepare_remote_file() {
                    mkdir $DIR/$tdir/src_dir ||
                            error "create remote source failed"
            
            
            pjones Peter Jones added a comment -

            Got it. For future reference it is possible to make adjustments to git commit messages when landing, so it would have been possible to use the correct JIRA reference without delaying things.


            yong.fan nasf (Inactive) added a comment -

            Peter,

            As you can see in the comment history, to keep the original LU-8569 issue clear, the new LFSCK test failure was split out of the LU-8569 description into a new ticket, LU-8647. The patch http://review.whamcloud.com/22723/ resolves the LU-8647 issue, but because it was pushed to Gerrit before LU-8647 was created, it still uses the old ticket number.

            So we can close the ticket LU-8647 as resolved. There is still some work to be done for LU-8569. I am investigating the huge logs.
            pjones Peter Jones added a comment -

            So LU-8647 was fixed by http://git.whamcloud.com/fs/lustre-release.git/commit/445da16c2ac0475b1c1077c822800b68cdbb7ce3 even though it used the LU-8569 JIRA reference in the commit message?


            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Peter,

            There is still work being tracked under this ticket. The logs I posted last week are to help find a resolution to this issue.

            The patch that landed was for LU-8647.
            pjones Peter Jones added a comment -

            Actually, perhaps I was premature to mark this as resolved. Fan Yong, what did the patch tracked under this ticket that just landed to master address? Is there still work to be tracked under this ticket?
            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22723/
            Subject: LU-8569 lfsck: cleanup lfsck requests list before exit
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 445da16c2ac0475b1c1077c822800b68cdbb7ce3

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Logs are now attached to this incident. The file names are jet-link-logs-part[1-4].tar.gz. The part 1 gzip has errors.log in it which has a sampling of what shows up in the console so you can use that to track down a specific file in the logs. Let me know if you need anything else.

            People

              yong.fan nasf (Inactive)
              morrone Christopher Morrone (Inactive)