lfsck not able to recover files lost from MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.9.0
    • Affects Version: Lustre 2.7.0

    Description

      My understanding is that lfsck in lustre-2.7 should be able to handle lost file information on the MDT, as long as the objects are still on the OSTs. However, a simple test to simulate this is not recovering the files. Shouldn't it at least be able to put them into lost+found? Or am I misunderstanding the capabilities of lfsck? Or is the following test case invalid in some way?

      On the client, just create some test files...

      # cd /mnt/lustre/client/lfscktest
      # echo foo > foo
      # mkdir bar
      # echo baz > bar/baz
      
      # lfs getstripe foo bar/baz
      foo
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  9
          obdidx         objid         objid         group
               9            460962          0x708a2                 0
      
      bar/baz
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  12
          obdidx         objid         objid         group
              12            460866          0x70842                 0
      
      # sync
      

      On the MDS, simulate the MDT losing the information, such as could happen through restoring from a slightly outdated MDT backup...

      # umount /mnt/lustre/nbptest-mdt
      # mount -t ldiskfs /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      # cd /mnt/lustre/nbptest-mdt/ROOT
      
      # ls -ld lfscktest lfscktest/*
      drwxr-xr-x+ 3 root root 4096 May 30 08:15 lfscktest
      drwxr-xr-x+ 2 root root 4096 May 30 08:15 lfscktest/bar
      -rw-r--r--  1 root root    0 May 30 08:14 lfscktest/foo
      
      # rm -rf lfscktest/*
      
      # cd
      # umount /mnt/lustre/nbptest-mdt
      # mount -t lustre /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      

      Now check the filesystem...

      # lctl clear
      # lctl debug_daemon start /var/log/lfsck.debug
      # lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
      Started LFSCK on the device nbptest-MDT0000: scrub layout namespace
      
      # lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status
      status: init
      status: completed
      
      # lctl debug_daemon stop
      # lctl debug_file /var/log/lfsck.debug | egrep -v " (NRS|RPC) " > /var/log/lfsck.log
      

      And look back on the client...

      # cd /mnt/lustre/client/         
      
      # ls -la lfscktest/
      total 8
      drwxr-xr-x+ 2 root root 4096 May 30 08:22 .
      drwxr-xr-x+ 9 root root 4096 May 30 08:14 ..
      
      # ls -la .lustre/lost+found/MDT0000
      total 8
      drwx------+ 3 root root 4096 May 27 10:44 .
      dr-x------+ 3 root root 4096 May 27 09:01 ..
      

      Notice that there is no sign of the files being restored anywhere, nor do I find any mention of the object IDs in the lfsck.log file.

      Note that running lfsck_start with the "-t layout" option did not change the behaviour either.
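
The oi_scrub parameter queried above reports only the OI scrub component. As a rough sketch, the layout and namespace components can be checked the same way through the mdd.*.lfsck_layout and mdd.*.lfsck_namespace parameters. The helper name extract_status is made up for illustration, and the loop assumes it is run on the MDS, where lctl is installed.

```shell
#!/bin/sh
# Print the value of every "status:" line read from stdin, one per line.
extract_status() {
    awk '$1 == "status:" { print $2 }'
}

# Query the LFSCK components only when lctl is available (i.e. on a server).
if command -v lctl >/dev/null 2>&1; then
    for comp in lfsck_layout lfsck_namespace; do
        printf '%s: ' "$comp"
        # lctl expands the "*" pattern itself, so the argument stays quoted.
        lctl get_param -n "mdd.*.$comp" 2>/dev/null | extract_status | tr '\n' ' '
        echo
    done
fi
```

A component that has finished should report "completed" here, just as the oi_scrub output does above.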

        Activity

          [LU-8218] lfsck not able to recover files lost from MDT

          mhanafi Mahmoud Hanafi added a comment -

          Can be closed. Add nasa label.

          pjones Peter Jones added a comment -

          Landed for 2.9

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20659/
          Subject: LU-8218 osd: handle stale OI mapping for non-restore case
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: cecde8bdb4913fd4405d425b0bf3aead03181e9d

          yong.fan nasf (Inactive) added a comment -

          Patch 20659 should be landed to master first, and then ported to the other branches.

          jaylan Jay Lan (Inactive) added a comment -

          Hi Fan Yong,

          Do you intend to land http://review.whamcloud.com/20659 to master and future releases, or to provide a workaround for us?

          yong.fan nasf (Inactive) added a comment -

          "That is what we should expect, even with the patch, right? There is no way to determine the object's path once it is lost from the ROOT tree on the MDT?"

          Yes, that is what we can do now. The path information is stored as the linkEA ("trusted.link") in the MDT-object, and there is no other backup of it in the system. So if the MDT-object itself is lost, LFSCK cannot know its original location and has to put it under .lustre/lost+found/.

          ndauchy Nathan Dauchy (Inactive) added a comment -

          Fan Yong, thank you for the patch! I haven't had a chance to test with a new build yet, but did do a quick check of running lfsck after "rm -f oi.16.*" under ldiskfs. The lfsck then resulted in files like the following in ".lustre/lost+found/MDT0000/":

          .lustre/lost+found/MDT0000/[0x200003ab0:0x1:0x0]-R-0

          That is what we should expect, even with the patch, right? There is no way to determine the object's path once it is lost from the ROOT tree on the MDT?
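
Since the original path cannot be recovered, the FID embedded in each lost+found entry name is the main handle left for identifying a recovered file. The following is a minimal sketch of pulling that FID out of names like the one above. The function name fid_from_name and the LOSTFOUND default (built from the client mount point used in this report) are assumptions for illustration, and the sketch does not interpret the "-R-0" suffix.

```shell
#!/bin/sh
# Extract the bracketed FID from a lost+found entry name such as
# "[0x200003ab0:0x1:0x0]-R-0". Prints nothing if the name has no FID prefix.
fid_from_name() {
    printf '%s\n' "$1" |
        sed -n 's/^\(\[0x[0-9a-fA-F]*:0x[0-9a-fA-F]*:0x[0-9a-fA-F]*]\).*$/\1/p'
}

# Assumed location: the client mount point from this report.
LOSTFOUND=${LOSTFOUND:-/mnt/lustre/client/.lustre/lost+found/MDT0000}

# List "FID <tab> entry name" for every entry, if the directory is reachable.
if [ -d "$LOSTFOUND" ]; then
    for entry in "$LOSTFOUND"/*; do
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        fid=$(fid_from_name "$name")
        if [ -n "$fid" ]; then
            printf '%s\t%s\n' "$fid" "$name"
        fi
    done
fi
```

Keeping a list like this before moving files out of lost+found preserves the only identity the recovered objects still carry.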

          yong.fan nasf (Inactive) added a comment -

          Nathan,

          The above patch may not be a perfect solution, but it should be enough to resolve your case.

          gerrit Gerrit Updater added a comment -

          Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/20659
          Subject: LU-8218 osd: handle stale OI mapping for non-restore case
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 31cf77414ad4f88c28d6eb2be54b32a7ec399ab7

          yong.fan nasf (Inactive) added a comment -

          The workaround for your special case: if you remove MDT-objects directly while the MDT is mounted as "ldiskfs", please remove the OI files as well.
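
Put together with the reproduction steps from the description, the workaround might look roughly like the sketch below. The device, mount point, and lfsck_start options are the ones used in this report. The "oi.16.*" pattern comes from the later comment in this thread, and its location at the top level of the ldiskfs mount is an assumption. The run and simulate_mdt_loss helpers and the DRY_RUN switch are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: device and mount point as used in this report.
MDT_DEV=${MDT_DEV:-/dev/mapper/nbptest--vg-mdttest}
MDT_MNT=${MDT_MNT:-/mnt/lustre/nbptest-mdt}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

simulate_mdt_loss() {
    run umount "$MDT_MNT"
    run mount -t ldiskfs "$MDT_DEV" "$MDT_MNT"
    # Remove the test files directly from the backing file system ...
    run sh -c "rm -rf '$MDT_MNT'/ROOT/lfscktest/*"
    # ... and, per the workaround, the OI files as well (location assumed).
    run sh -c "rm -f '$MDT_MNT'/oi.16.*"
    run umount "$MDT_MNT"
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
    run lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
}
```

Calling simulate_mdt_loss with the default DRY_RUN=1 lists the seven steps without touching the MDT. Setting DRY_RUN=0 is destructive and only makes sense on a test file system.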

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: ndauchy Nathan Dauchy (Inactive)
            Votes: 0
            Watchers: 7
