Lustre / LU-8218

lfsck not able to recover files lost from MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3

    Description

      My understanding is that lfsck in lustre-2.7 should be able to handle lost file information on the MDT, as long as the objects are still on the OSTs. However, a simple test to simulate this is not recovering the files. Shouldn't it at least be able to put them into lost+found? Or am I misunderstanding the capabilities of lfsck? Or is the following test case invalid in some way?

      On the client, just create some test files...

      # cd /mnt/lustre/client/lfscktest
      # echo foo > foo
      # mkdir bar
      # echo baz > bar/baz
      
      # lfs getstripe foo bar/baz
      foo
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  9
          obdidx         objid         objid         group
               9            460962          0x708a2                 0
      
      bar/baz
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  12
          obdidx         objid         objid         group
              12            460866          0x70842                 0
      
      # sync
      

      On the MDS, simulate the MDT losing the information, such as could happen through restoring from a slightly outdated MDT backup...

      # umount /mnt/lustre/nbptest-mdt
      # mount -t ldiskfs /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      # cd /mnt/lustre/nbptest-mdt/ROOT
      
      # ls -ld lfscktest lfscktest/*
      drwxr-xr-x+ 3 root root 4096 May 30 08:15 lfscktest
      drwxr-xr-x+ 2 root root 4096 May 30 08:15 lfscktest/bar
      -rw-r--r--  1 root root    0 May 30 08:14 lfscktest/foo
      
      # rm -rf lfscktest/*
      
      # cd
      # umount /mnt/lustre/nbptest-mdt
      # mount -t lustre /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
      

      Now check the filesystem...

      # lctl clear
      # lctl debug_daemon start /var/log/lfsck.debug
      # lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
      Started LFSCK on the device nbptest-MDT0000: scrub layout namespace
      
      # lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status
      status: init
      status: completed
      
      # lctl debug_daemon stop
      # lctl debug_file /var/log/lfsck.debug | egrep -v " (NRS|RPC) " > /var/log/lfsck.log
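
      The oi_scrub parameter checked above reports only the OI scrub component. The layout and namespace components keep their own state under the mdd parameters (lfsck_layout and lfsck_namespace), so it can help to read those as well before concluding that the scan found nothing. The helper below is a minimal sketch: lfsck_field is a hypothetical name, and the "key: value" line format and the repaired_orphan counter name are assumptions about the parameter output that should be verified on the running version.

```shell
# Print the value of every "<key>: <value>" line on stdin whose key matches $1.
# Intended input: the text printed by `lctl get_param -n <param>`.
lfsck_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Hypothetical usage on the MDS (target name taken from this report):
#   lctl get_param -n mdd.nbptest-MDT0000.lfsck_layout    | lfsck_field status
#   lctl get_param -n mdd.nbptest-MDT0000.lfsck_namespace | lfsck_field status
#   lctl get_param -n mdd.nbptest-MDT0000.lfsck_layout    | lfsck_field repaired_orphan
```

      If the layout component is not in a finished state, or an orphan counter of this kind stays at zero, that narrows down whether the orphan OST objects were ever examined.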
      

      And look back on the client...

      # cd /mnt/lustre/client/         
      
      # ls -la lfscktest/
      total 8
      drwxr-xr-x+ 2 root root 4096 May 30 08:22 .
      drwxr-xr-x+ 9 root root 4096 May 30 08:14 ..
      
      # ls -la .lustre/lost+found/MDT0000
      total 8
      drwx------+ 3 root root 4096 May 27 10:44 .
      dr-x------+ 3 root root 4096 May 27 09:01 ..
      

      Notice that there is no sign of the files being restored anywhere. Nor do I find any mention of the object IDs in the lfsck.log file.

      Note that running lfsck_start with the "-t layout" option did not change the behaviour either.
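
      The search of lfsck.log for the object IDs can be made repeatable by saving the `lfs getstripe` output before the MDT entries are removed and turning it into a search pattern. This is only a sketch: stripe_objects and objid_pattern are hypothetical helper names, and whether the debug log prints object IDs in decimal or hexadecimal is an assumption, which is why the pattern includes both forms.

```shell
# Read `lfs getstripe` output on stdin and print one line per object:
#   <ost_index> <objid_decimal> <objid_hex>
# Object rows are recognized as four fields with a 0x... value in the third.
stripe_objects() {
    awk 'NF == 4 && $3 ~ /^0x[0-9a-fA-F]+$/ { print $1, $2, $3 }'
}

# Build an extended regex that matches any of the object IDs,
# in either decimal or hexadecimal form.
objid_pattern() {
    stripe_objects |
        awk '{ printf "%s%s|%s", sep, $2, $3; sep = "|" } END { print "" }'
}

# Hypothetical usage:
#   lfs getstripe foo bar/baz > /tmp/stripes.txt      # on the client, before the loss
#   grep -E -w "$(objid_pattern < /tmp/stripes.txt)" /var/log/lfsck.log
```

      For the two files in this report the pattern would cover 460962 (0x708a2) on OST index 9 and 460866 (0x70842) on OST index 12. An empty pattern matches every line, so check that the saved getstripe output is not empty before running the grep.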

          People

            yong.fan nasf (Inactive)
            ndauchy Nathan Dauchy (Inactive)