
[LU-8218] lfsck not able to recover files lost from MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3

    Description

      My understanding is that lfsck in lustre-2.7 should be able to handle lost file information on the MDT, as long as the objects are still on the OSTs. However, a simple test to simulate this is not recovering the files. Shouldn't it at least be able to put them into lost+found? Or am I misunderstanding the capabilities of lfsck? Or is the following test case invalid in some way?

      On the client, just create some test files...

      # cd /mnt/lustre/client/lfscktest
      # echo foo > foo
      # mkdir bar
      # echo baz > bar/baz
      
      # lfs getstripe foo bar/baz
      foo
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  9
          obdidx         objid         objid         group
               9            460962          0x708a2                 0
      
      bar/baz
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  12
          obdidx         objid         objid         group
              12            460866          0x70842                 0
      
      # sync
      

      On the MDS, simulate the MDT losing the information, such as could happen through restoring from a slightly outdated MDT backup...

      # umount /mnt/lustre/nbptest-mdt
      # mount -t ldiskfs /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      # cd /mnt/lustre/nbptest-mdt/ROOT
      
      # ls -ld lfscktest lfscktest/*
      drwxr-xr-x+ 3 root root 4096 May 30 08:15 lfscktest
      drwxr-xr-x+ 2 root root 4096 May 30 08:15 lfscktest/bar
      -rw-r--r--  1 root root    0 May 30 08:14 lfscktest/foo
      
      # rm -rf lfscktest/*
      
      # cd
      # umount /mnt/lustre/nbptest-mdt
      # mount -t lustre /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      

      Now check the filesystem...

      # lctl clear
      # lctl debug_daemon start /var/log/lfsck.debug
      # lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
      Started LFSCK on the device nbptest-MDT0000: scrub layout namespace
      
      # lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status
      status: init
      status: completed
      
      # lctl debug_daemon stop
      # lctl debug_file /var/log/lfsck.debug | egrep -v " (NRS|RPC) " > /var/log/lfsck.log
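
      It can also help to watch the layout LFSCK's own status, not only the OI scrub (a suggested extra check, not part of the original report; the parameter exists in Lustre 2.7):

      # lctl get_param -n mdd.nbptest-MDT0000.lfsck_layout | grep status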
      

      And look back on the client...

      # cd /mnt/lustre/client/         
      
      # ls -la lfscktest/
      total 8
      drwxr-xr-x+ 2 root root 4096 May 30 08:22 .
      drwxr-xr-x+ 9 root root 4096 May 30 08:14 ..
      
      # ls -la .lustre/lost+found/MDT0000
      total 8
      drwx------+ 3 root root 4096 May 27 10:44 .
      dr-x------+ 3 root root 4096 May 27 09:01 ..
      

      Notice that there is no sign of the files being restored anywhere, nor do I find any mention of the object IDs in the lfsck.log file.

      Note that running lfsck_start with the "-t layout" option did not change the behaviour either.
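
      For reference, that layout-only run takes the following form (a reconstruction of the command described above, with the same MDT name):

      # lctl lfsck_start -M nbptest-MDT0000 -t layout -o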

      Attachments

        Activity

          [LU-8218] lfsck not able to recover files lost from MDT

          Fan Yong, thank you for the patch! I haven't had a chance to test with a new build yet, but did do a quick check of running lfsck after "rm -f oi.16.*" under ldiskfs. The lfsck then resulted in files like the following in ".lustre/lost+found/MDT0000/":

          .lustre/lost+found/MDT0000/[0x200003ab0:0x1:0x0]-R-0
          

          That is what we should expect, even with the patch, right? There is no way to determine the object's path once it is lost from the ROOT tree on the MDT?

          ndauchy Nathan Dauchy (Inactive) added a comment
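
          One way to inspect such recovered entries from a client is, for example (a suggested follow-up using the FID-named path shown above; the brackets must be quoted to protect them from the shell):

          # cd /mnt/lustre/client/.lustre/lost+found/MDT0000
          # lfs getstripe "[0x200003ab0:0x1:0x0]-R-0"
          # cat "[0x200003ab0:0x1:0x0]-R-0"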

          Nathan,

          The above patch may not be a perfect solution, but it should be enough to resolve your case.

          yong.fan nasf (Inactive) added a comment

          Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/20659
          Subject: LU-8218 osd: handle stale OI mapping for non-restore case
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 31cf77414ad4f88c28d6eb2be54b32a7ec399ab7

          gerrit Gerrit Updater added a comment

          The workaround for your special case: if you want to remove MDT-objects directly under "ldiskfs" mode, then please remove the OI files as well.

          yong.fan nasf (Inactive) added a comment
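
          Concretely, for the reproducer in the description, that workaround would look roughly like this (a sketch using the device and mount-point names from above, not a tested procedure):

          # umount /mnt/lustre/nbptest-mdt
          # mount -t ldiskfs /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
          # rm -rf /mnt/lustre/nbptest-mdt/ROOT/lfscktest/*
          # rm -f /mnt/lustre/nbptest-mdt/oi.16.*
          # removing the OI files forces a full OI rebuild from the inodes' LMA EAs
          # on the next mount
          # umount /mnt/lustre/nbptest-mdt
          # mount -t lustre /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
          # lctl lfsck_start -A -M nbptest-MDT0000 -o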

          Just to clarify the status of this ticket... we are on hold waiting for a new phase of scanning to be added to lfsck?

          In the meantime, is there a workaround we can use as part of the MDT recovery procedure when getting such stale mappings is expected? Can we mount as ldiskfs and manually check or clean things up?

          ndauchy Nathan Dauchy (Inactive) added a comment

          From the LFSCK view, the case of removing an MDT-object directly without destroying its OI mapping is indistinguishable from the case of an MDT file-level backup/restore. When the OSD tries to locate the local object/inode via the ino# obtained from the stale OI mapping, it cannot tell whether the real MDT-object exists or not. A possible solution is for the OI scrub to perform a double scan: the first-phase scan is inode-table based, to find all known objects on the device; the second-phase scan is OI-file based, to find all stale OI mappings. Currently it only does the first-phase scan.

          yong.fan nasf (Inactive) added a comment
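
          In outline, the proposed double scan would be (an illustration of the idea above, not actual LFSCK code):

          # Phase 1 (exists today): walk the inode table; for each inode, read the
          # FID from its LMA EA and verify/rebuild the FID -> ino# OI mapping.
          # Phase 2 (proposed): walk the oi.16.* files; for each FID -> ino# mapping,
          # check that the inode still exists and that its LMA FID matches the key;
          # otherwise the mapping is stale and should be removed.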

          OK... I can test that, but what if this was a "real" case of MDT corruption where only the files were lost? Is a new feature or phase in lfsck needed to manage the stale OI mappings?

          ndauchy Nathan Dauchy (Inactive) added a comment

          You only removed the files on the MDT directly under ldiskfs mode, but kept the OI files (oi.16.xxx), which still contain stale OI mappings for those removed MDT-objects; as a result, the subsequent LFSCK cannot locate the objects properly. So please remove the OI files under ldiskfs mode and run LFSCK after that.

          Thanks!

          yong.fan nasf (Inactive) added a comment

          debug logs from the servers while lfsck was run.

          service320 is client and where ost8 runs
          service322 is MDS
          service323 is where ost11 runs

          ndauchy Nathan Dauchy (Inactive) added a comment

          Yes, that is what I want to know. The OST-object's size has been updated, which means the dirty data has been flushed back to the OST, although the PFID EA ("trusted.fid") is not printed properly.

          Please run layout LFSCK just on this system with LFSCK debug enabled, and collect the kernel debug logs on both the MDT and nbptest-ost8 and nbptest-ost11. Thanks!

          yong.fan nasf (Inactive) added a comment
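
          One way to collect that is to reuse the debug_daemon steps from the description, adding the "lfsck" debug flag (a sketch):

          # on the MDS and on the servers for nbptest-ost8 and nbptest-ost11:
          # lctl set_param debug=+lfsck
          # lctl clear
          # lctl debug_daemon start /var/log/lfsck.debug

          # on the MDS only:
          # lctl lfsck_start -M nbptest-MDT0000 -t layout -o

          # after the LFSCK completes, on each of those servers:
          # lctl debug_daemon stop
          # lctl debug_file /var/log/lfsck.debug > /var/log/lfsck.log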

          Is this the information you are looking for?

          Client:

          # cd /mnt/lustre/client/lfscktest
          # echo foo > foo
          # mkdir bar
          # echo baz > bar/baz
          
          # lctl get_param ldlm.namespaces.*osc*.lru_size | grep -v =0
            ldlm.namespaces.nbptest-OST0000-osc-ffff8805daad9800.lru_size=1
            ldlm.namespaces.nbptest-OST0007-osc-ffff8805daad9800.lru_size=2
            ldlm.namespaces.nbptest-OST0008-osc-ffff8805daad9800.lru_size=1
            ldlm.namespaces.nbptest-OST0009-osc-ffff8805daad9800.lru_size=1
            ldlm.namespaces.nbptest-OST000b-osc-ffff8805daad9800.lru_size=1
            ldlm.namespaces.nbptest-OST000c-osc-ffff8805daad9800.lru_size=1
          # lctl set_param -n ldlm.namespaces.*osc*.lru_size=clear
          # lctl get_param ldlm.namespaces.*osc*.lru_size | grep -v =0
            (nothing returned)
          
          # getfattr -d -m ".*" -e hex foo bar/baz 
          # file: foo
          lustre.lov=0xd00bd10b010000000100000000000000b03a0000020000000000100001000000e20807000000000000000000000000000000000008000000
          trusted.link=0xdff1ea11010000002d00000000000000000000000000000000150000000200002b100000000900000000666f6f
          trusted.lma=0x0000000000000000b03a0000020000000100000000000000
          trusted.lov=0xd00bd10b010000000100000000000000b03a0000020000000000100001000000e20807000000000000000000000000000000000008000000
          
          # file: bar/baz
          lustre.lov=0xd00bd10b010000000300000000000000b03a0000020000000000100001000000e2080700000000000000000000000000000000000b000000
          trusted.link=0xdff1ea11010000002d00000000000000000000000000000000150000000200003ab0000000020000000062617a
          trusted.lma=0x0000000000000000b03a0000020000000300000000000000
          trusted.lov=0xd00bd10b010000000300000000000000b03a0000020000000000100001000000e2080700000000000000000000000000000000000b000000
          
          service320 /mnt/lustre/client/lfscktest # 
          
          # lfs getstripe foo bar/baz
          foo
          lmm_stripe_count:   1
          lmm_stripe_size:    1048576
          lmm_pattern:        1
          lmm_layout_gen:     0
          lmm_stripe_offset:  8
          	obdidx		 objid		 objid		 group
          	     8	        461026	      0x708e2	             0
          
          bar/baz
          lmm_stripe_count:   1
          lmm_stripe_size:    1048576
          lmm_pattern:        1
          lmm_layout_gen:     0
          lmm_stripe_offset:  11
          	obdidx		 objid		 objid		 group
          	    11	        461026	      0x708e2	             0
          
          # echo $(( 461026 % 32 ))         
          2
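          # NB: ldiskfs OSTs store objects under O/<seq>/d<objid mod 32>/<objid>,
          # hence "cd O; cd 0; cd d2" in the debugfs sessions below.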
          
          # debugfs /dev/mapper/nbptest-ost8
          debugfs 1.42.13.wc4 (28-Nov-2015)
          debugfs:  cd O
          debugfs:  cd 0
          debugfs:  cd d2
          debugfs:  stat 461026
          Inode: 487   Type: regular    Mode:  0666   Flags: 0x80000
          Generation: 2904170364    Version: 0x0000000c:00000005
          User:     0   Group:     0   Size: 4
          File ACL: 0    Directory ACL: 0
          Links: 1   Blockcount: 8
          Fragment:  Address: 0    Number: 0    Size: 0
           ctime: 0x574efbee:00000000 -- Wed Jun  1 08:14:54 2016
           atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
           mtime: 0x574efbee:00000000 -- Wed Jun  1 08:14:54 2016
          crtime: 0x574d9b4f:3f4d531c -- Tue May 31 07:10:23 2016
          Size of extra inode fields: 28
          Extended attributes stored in inode body: 
          invalid EA entry in inode
          EXTENTS:
          (0):152064
          debugfs:  dump 461026 /tmp/obj.461026.foo
          debugfs:  quit
          
          # cat /tmp/obj.461026.foo
          foo
          
          # debugfs /dev/mapper/nbptest-ost11
          debugfs 1.42.13.wc4 (28-Nov-2015)
          debugfs:  cd O
          debugfs:  cd 0
          debugfs:  cd d2
          debugfs:  stat 461026
          Inode: 489   Type: regular    Mode:  0666   Flags: 0x80000
          Generation: 3312724559    Version: 0x0000000c:00000007
          User:     0   Group:     0   Size: 4
          File ACL: 0    Directory ACL: 0
          Links: 1   Blockcount: 8
          Fragment:  Address: 0    Number: 0    Size: 0
           ctime: 0x574efbf6:00000000 -- Wed Jun  1 08:15:02 2016
           atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
           mtime: 0x574efbf6:00000000 -- Wed Jun  1 08:15:02 2016
          crtime: 0x574d9b4f:58007134 -- Tue May 31 07:10:23 2016
          Size of extra inode fields: 28
          Extended attributes stored in inode body: 
          invalid EA entry in inode
          EXTENTS:
          (0):128768
          debugfs:  dump 461026 /tmp/obj.461026.baz
          debugfs:  quit
          
          # cat /tmp/obj.461026.baz
          baz
          
          ndauchy Nathan Dauchy (Inactive) added a comment

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: ndauchy Nathan Dauchy (Inactive)
            Votes: 0
            Watchers: 7
