Lustre / LU-8569

Sharded DNE directory full of files that don't exist

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0

    Description

      On our DNE testbed, one of our sharded directories seems to contain files that are all in a broken state. Currently both servers and clients are running 2.8.0_0.0.llnlpreview.40 (see the lustre-release-fe-llnl repo).

      We can get a directory listing, but nothing listed is actually accessible. Here is an excerpt from running ls -l:

      # pwd
      /p/lquake/casses1/opal-jet/simul_2
      # ls -l
      ls: cannot access simul_link.2243: No such file or directory
      ls: cannot access simul_link.3161: No such file or directory
      ls: cannot access simul_link.3129: No such file or directory
      ls: cannot access simul_link.3893: No such file or directory
      ls: cannot access simul_link.691: No such file or directory
      ls: cannot access simul_link.3233: No such file or directory
      ls: cannot access simul_link.235: No such file or directory
      ls: cannot access simul_link.1653: No such file or directory
      ls: cannot access simul_link.3167: No such file or directory
      ls: cannot access simul_link.681: No such file or directory
      ls: cannot access simul_link.835: No such file or directory
      ls: cannot access simul_link.3857: No such file or directory
      ls: cannot access simul_link.1591: No such file or directory
      ls: cannot access simul_link.1175: No such file or directory
      [cut]
      -????????? ? ? ? ?            ? simul_link.937
      -????????? ? ? ? ?            ? simul_link.94
      -????????? ? ? ? ?            ? simul_link.940
      -????????? ? ? ? ?            ? simul_link.941
      -????????? ? ? ? ?            ? simul_link.942
      -????????? ? ? ? ?            ? simul_link.943
      -????????? ? ? ? ?            ? simul_link.944
      -????????? ? ? ? ?            ? simul_link.947
      [cut]
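
The symptom above (names returned by the directory listing that then fail with "No such file or directory" when stat'ed) can be counted with a short script. This is only a minimal sketch for illustration, not part of the original report: the function name `find_dangling_entries` and its optional `names` parameter are hypothetical, and it uses only the Python standard library rather than any Lustre-specific API.

```python
import errno
import os


def find_dangling_entries(dirpath, names=None):
    """Return the sorted names listed in dirpath whose lstat() fails with ENOENT.

    `names` is a hypothetical hook for supplying a pre-collected listing;
    by default the names come from os.listdir(dirpath).
    """
    if names is None:
        names = os.listdir(dirpath)
    dangling = []
    for name in names:
        try:
            # lstat() does not follow symlinks, so a symlink with a missing
            # target is still reported as present.
            os.lstat(os.path.join(dirpath, name))
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                dangling.append(name)
            else:
                # Any other error (EACCES, EIO, ...) is a different problem.
                raise
    return sorted(dangling)
```

Calling `find_dangling_entries("/p/lquake/casses1/opal-jet/simul_2")` on an affected client would be expected to return the same names that `ls -l` complains about, which makes it easier to compare the set of broken entries before and after a repair attempt.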
      

      Here is the striping information:

      # lfs getdirstripe .
      .
      lmv_stripe_count: 16 lmv_stripe_offset: 12
      mdtidx           FID[seq:oid:ver]
          12           [0x50000996c:0x14fed:0x0]
          13           [0x54000919d:0x14fed:0x0]
          14           [0x58000a086:0x14fed:0x0]
          15           [0x5c000996b:0x14fed:0x0]
           0           [0x200006b03:0x14fed:0x0]
           1           [0x3000089cc:0x14fed:0x0]
           2           [0x38000996d:0x14fed:0x0]
           3           [0x4c000b0df:0x14fed:0x0]
           4           [0x2c000a142:0xec09:0x0]
           5           [0x3c000b8b2:0xec09:0x0]
           6           [0x34000a143:0xec09:0x0]
           7           [0x40000a143:0xec09:0x0]
           8           [0x44000a142:0xec09:0x0]
           9           [0x24000a143:0xec09:0x0]
          10           [0x2800091a4:0xec09:0x0]
          11           [0x4800091a3:0xec09:0x0]
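
In the listing above, the stripes on MDTs 12-15 and 0-3 carry OID 0x14fed, while the stripes on MDTs 4-11 carry OID 0xec09. The following sketch parses the `mdtidx FID[seq:oid:ver]` lines and groups the MDT indices by OID so that such a split is easy to spot. It assumes the output layout shown here; `parse_dirstripe` and `group_by_oid` are hypothetical helper names, and the script draws no conclusion about whether the split is related to the bug.

```python
import re
from collections import defaultdict

# Matches lines such as "    12           [0x50000996c:0x14fed:0x0]".
STRIPE_RE = re.compile(
    r"^\s*(\d+)\s+\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]\s*$"
)


def parse_dirstripe(text):
    """Return a list of (mdtidx, seq, oid, ver) tuples from getdirstripe output."""
    stripes = []
    for line in text.splitlines():
        match = STRIPE_RE.match(line)
        if match:
            mdtidx = int(match.group(1))
            seq, oid, ver = (int(match.group(i), 16) for i in (2, 3, 4))
            stripes.append((mdtidx, seq, oid, ver))
    return stripes


def group_by_oid(stripes):
    """Map each OID to the list of MDT indices whose stripe FID uses it."""
    groups = defaultdict(list)
    for mdtidx, _seq, oid, _ver in stripes:
        groups[oid].append(mdtidx)
    return dict(groups)
```

Feeding the captured output of `lfs getdirstripe .` to `group_by_oid(parse_dirstripe(output))` yields one entry per distinct OID, keyed by its integer value.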
      

      I ran lfsck on all services (at least those started by the "--all" option), but it did not repair the directory.

      The problem files cannot be unlinked:

      # rm simul_link.999
      rm: cannot remove 'simul_link.999': No such file or directory
      

      Attachments

        1. lfsck_namespace_state-9-28-2016.log
          24 kB
        2. jet-link-logs-part4.tar.gz
          0.2 kB
        3. jet-link-logs-part3.tar.gz
          0.2 kB
        4. jet-link-logs-part2.tar.gz
          0.2 kB
        5. jet-link-logs-part1.tar.gz
          0.2 kB
        6. getstripelogs.tar.gz
          0.2 kB

            People

              yong.fan nasf (Inactive)
              morrone Christopher Morrone (Inactive)