Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
Description
My understanding is that lfsck in lustre-2.7 should be able to handle lost file information on the MDT, as long as the objects are still on the OSTs. However, a simple test to simulate this is not recovering the files. Shouldn't it at least be able to put them into lost+found? Or am I misunderstanding the capabilities of lfsck? Or is the following test case invalid in some way?
On the client, just create some test files...
# cd /mnt/lustre/client/lfscktest
# echo foo > foo
# mkdir bar
# echo baz > bar/baz
# lfs getstripe foo bar/baz
foo
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  9
    obdidx     objid      objid     group
         9    460962    0x708a2         0

bar/baz
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  12
    obdidx     objid      objid     group
        12    460866    0x70842         0

# sync
On the MDS, simulate the MDT losing the information, such as could happen through restoring from a slightly outdated MDT backup...
# umount /mnt/lustre/nbptest-mdt
# mount -t ldiskfs /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
# cd /mnt/lustre/nbptest-mdt/ROOT
# ls -ld lfscktest lfscktest/*
drwxr-xr-x+ 3 root root 4096 May 30 08:15 lfscktest
drwxr-xr-x+ 2 root root 4096 May 30 08:15 lfscktest/bar
-rw-r--r--  1 root root    0 May 30 08:14 lfscktest/foo
# rm -rf lfscktest/*
# cd
# umount /mnt/lustre/nbptest-mdt
# mount -t lustre /dev/mapper/nbptest--vg-mdttest /mnt/lustre/nbptest-mdt
Now check the filesystem...
# lctl clear
# lctl debug_daemon start /var/log/lfsck.debug
# lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
Started LFSCK on the device nbptest-MDT0000: scrub layout namespace
# lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status
status: init
status: completed
# lctl debug_daemon stop
# lctl debug_file /var/log/lfsck.debug | egrep -v " (NRS|RPC) " > /var/log/lfsck.log
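The `oi_scrub` status checked above covers only the OI scrub component, not the layout or namespace checks. The following is a minimal sketch for reading the status of those components as well. It assumes the `mdd.*.lfsck_layout` and `mdd.*.lfsck_namespace` parameters described in the Lustre manual are present on this MDS and that their output contains a `status:` line, as the `oi_scrub` output does. The helper name `lfsck_component_status` is hypothetical.

```shell
# Hypothetical helper: print the value of every "status:" field found in
# LFSCK parameter output read from stdin.
lfsck_component_status() {
    awk '$1 == "status:" { print $2 }'
}

# Intended use on the MDS (assumes these parameters exist on the system):
#   lctl get_param -n mdd.nbptest-MDT0000.lfsck_layout    | lfsck_component_status
#   lctl get_param -n mdd.nbptest-MDT0000.lfsck_namespace | lfsck_component_status
```

Stopping the debug daemon only after both components report a finished state would make it more likely that the log covers the whole run.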
And look back on the client...
# cd /mnt/lustre/client/
# ls -la lfscktest/
total 8
drwxr-xr-x+ 2 root root 4096 May 30 08:22 .
drwxr-xr-x+ 9 root root 4096 May 30 08:14 ..
# ls -la .lustre/lost+found/MDT0000
total 8
drwx------+ 3 root root 4096 May 27 10:44 .
dr-x------+ 3 root root 4096 May 27 09:01 ..
Notice that there is no sign of the files being restored anywhere. Nor do I find any mention of the object IDs in the lfsck.log file.
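For reference, the log search can be sketched as below. Debug output may print an object ID in decimal or in hexadecimal (possibly embedded in a larger identifier), so this sketch looks for both forms. The helper name `grep_objid` is hypothetical, and the sketch assumes GNU grep.

```shell
# Hypothetical helper: print the lines of a log file that mention an object
# ID as a whole word, in either decimal or 0x-prefixed hexadecimal form.
grep_objid() {
    objid=$1
    logfile=$2
    hex=$(printf '%x' "$objid")      # e.g. 460962 -> 708a2
    grep -E -i -w -e "$objid" -e "0x$hex" "$logfile"
}

# Intended use with the object IDs from the lfs getstripe output:
#   grep_objid 460962 /var/log/lfsck.log
#   grep_objid 460866 /var/log/lfsck.log
```

The `-w` flag keeps a shorter ID from matching inside a longer number, and `-i` accepts either case for the hex digits.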
Note that running lfsck_start with the "-t layout" option did not change the behaviour either.
Fan Yong, thank you for the patch! I haven't had a chance to test with a new build yet, but did do a quick check of running lfsck after "rm -f oi.16.*" under ldiskfs. The lfsck then resulted in files like the following in ".lustre/lost+found/MDT0000/":
That is what we should expect, even with the patch, right? There is no way to determine the object's path once it is lost from the ROOT tree on the MDT?