Sharded DNE directory full of files that don't exist

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • None
    • 3

    Description

      On our DNE testbed, one of our sharded directories seems to contain files that are all in a broken state. Currently both servers and clients are running 2.8.0_0.0.llnlpreview.40 (see the lustre-release-fe-llnl repo).

      We can get a directory listing, but nothing listed is actually accessible. Here is an excerpt from running ls -l:

      # pwd
      /p/lquake/casses1/opal-jet/simul_2
      # ls -l
      ls: cannot access simul_link.2243: No such file or directory
      ls: cannot access simul_link.3161: No such file or directory
      ls: cannot access simul_link.3129: No such file or directory
      ls: cannot access simul_link.3893: No such file or directory
      ls: cannot access simul_link.691: No such file or directory
      ls: cannot access simul_link.3233: No such file or directory
      ls: cannot access simul_link.235: No such file or directory
      ls: cannot access simul_link.1653: No such file or directory
      ls: cannot access simul_link.3167: No such file or directory
      ls: cannot access simul_link.681: No such file or directory
      ls: cannot access simul_link.835: No such file or directory
      ls: cannot access simul_link.3857: No such file or directory
      ls: cannot access simul_link.1591: No such file or directory
      ls: cannot access simul_link.1175: No such file or directory
      [cut]
      -????????? ? ? ? ?            ? simul_link.937
      -????????? ? ? ? ?            ? simul_link.94
      -????????? ? ? ? ?            ? simul_link.940
      -????????? ? ? ? ?            ? simul_link.941
      -????????? ? ? ? ?            ? simul_link.942
      -????????? ? ? ? ?            ? simul_link.943
      -????????? ? ? ? ?            ? simul_link.944
      -????????? ? ? ? ?            ? simul_link.947
      [cut]
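
The symptom above (names returned by the directory listing, but every stat failing with "No such file or directory") can be enumerated programmatically. The following is a minimal sketch, not part of the original report: the function name `find_unstatable_entries` and its injectable `lstat` parameter are hypothetical, and it only assumes that the broken entries fail with ENOENT as shown in the `ls -l` output.

```python
import errno
import os


def find_unstatable_entries(directory, lstat=os.lstat):
    """Return names that the directory lists but lstat() reports as ENOENT.

    `lstat` is injectable only so the function can be exercised without a
    damaged file system; by default it is os.lstat.
    """
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            lstat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                # The name is present in the directory, but the object
                # it refers to cannot be found.
                broken.append(name)
            else:
                raise
    return broken
```

Running `find_unstatable_entries(".")` from inside the affected directory would be expected to return the same names that `ls -l` complains about. On a healthy directory it returns an empty list.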
      

      Here is the striping information:

      # lfs getdirstripe .
      .
      lmv_stripe_count: 16 lmv_stripe_offset: 12
      mdtidx           FID[seq:oid:ver]
          12           [0x50000996c:0x14fed:0x0]
          13           [0x54000919d:0x14fed:0x0]
          14           [0x58000a086:0x14fed:0x0]
          15           [0x5c000996b:0x14fed:0x0]
           0           [0x200006b03:0x14fed:0x0]
           1           [0x3000089cc:0x14fed:0x0]
           2           [0x38000996d:0x14fed:0x0]
           3           [0x4c000b0df:0x14fed:0x0]
           4           [0x2c000a142:0xec09:0x0]
           5           [0x3c000b8b2:0xec09:0x0]
           6           [0x34000a143:0xec09:0x0]
           7           [0x40000a143:0xec09:0x0]
           8           [0x44000a142:0xec09:0x0]
           9           [0x24000a143:0xec09:0x0]
          10           [0x2800091a4:0xec09:0x0]
          11           [0x4800091a3:0xec09:0x0]
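
To make the stripe table easier to inspect, the text output can be parsed into tuples and grouped by object ID. This is a sketch that assumes only the text layout shown above; the names `parse_stripes` and `group_by_oid` are hypothetical helpers, not part of any Lustre tool. In this listing the stripes fall into two groups by oid (0x14fed and 0xec09); the ticket does not say whether that is significant.

```python
import re
from collections import defaultdict

# Matches one stripe row: an MDT index followed by "[seq:oid:ver]" in hex.
STRIPE_ROW = re.compile(
    r"^\s*(\d+)\s+\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]\s*$"
)


def parse_stripes(text):
    """Return a list of (mdt_index, seq, oid, ver) tuples, in listed order."""
    stripes = []
    for line in text.splitlines():
        match = STRIPE_ROW.match(line)
        if match:
            mdt, seq, oid, ver = match.groups()
            stripes.append((int(mdt), int(seq, 16), int(oid, 16), int(ver, 16)))
    return stripes


def group_by_oid(stripes):
    """Map each object ID to the sorted MDT indices whose stripe uses it."""
    groups = defaultdict(list)
    for mdt, _seq, oid, _ver in stripes:
        groups[oid].append(mdt)
    return {oid: sorted(mdts) for oid, mdts in groups.items()}
```

Feeding the captured `lfs getdirstripe .` output to `group_by_oid(parse_stripes(text))` gives one entry per distinct oid, which makes any split between stripe groups visible at a glance.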
      

      I ran lfsck on all services (at least those started by the "--all" option), but that did not address this situation.

      The problem files cannot be unlinked:

      # rm simul_link.999
      rm: cannot remove 'simul_link.999': No such file or directory
      

      Attachments

        1. getstripelogs.tar.gz
          0.2 kB
        2. jet-link-logs-part1.tar.gz
          0.2 kB
        3. jet-link-logs-part2.tar.gz
          0.2 kB
        4. jet-link-logs-part3.tar.gz
          0.2 kB
        5. jet-link-logs-part4.tar.gz
          0.2 kB
        6. lfsck_namespace_state-9-28-2016.log
          24 kB

          Activity

            [LU-8569] Sharded DNE directory full of files that don't exist

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Apologies Peter, I went ahead and created LU-9037 to track the porting, so that those who are interested can follow its progress.

            pjones Peter Jones added a comment -

            We'll post the links on the ticket and mark it with llnlfixready when it's ready for you to pick up.

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Peter,

            Are there tasks created so I can keep track of the 2.8 FE port?

            Joe

            pjones Peter Jones added a comment -

            All patches landed to master for 2.10. Ports to 2.8 and 2.9 FE branches will be tracked separately.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23741/
            Subject: LU-8569 lfsck: handle linkEA overflow
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 048a8740ae26e3406a7eab3bca383a90490cef93

            pjones Peter Jones added a comment -

            Giuseppe

            The ticket will be marked resolved when the patches land to master, but it will remain on the LLNL priority list until the equivalent patches have been ported and landed to the 2.8 FE branch.

            Peter

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Before this closes, can these patches also be ported to the 2.8 FE branch?

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23500/
            Subject: LU-8569 linkea: linkEA size limitation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e760042016bb5b12f9b21568304c02711930720f

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23741
            Subject: LU-8569 lfsck: handle linkEA overflow
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 94f5d2fec9edb6e1e5359ceebea9882cb5bb2719

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23500
            Subject: LU-8569 linkea: linkEA size limitation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0d8fe108f7b7f267fa790320954fc55e996af964

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: morrone Christopher Morrone (Inactive)