Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
Description
If a name entry is corrupted and contains a bad inode number, then when osd_dirent_check_repair() verifies that name entry, it will use the wrong inode number stored in the entry to locate the target inode; such an inode may not exist at all, or may belong to a different name entry.
== sanity-lfsck test 23b: LFSCK can repair dangling name entry (2) == 01:46:53 (1471830413)
#####
The objectA has multiple hard links, one of them corresponding to the name entry_B. But there is something wrong for the name entry_B and cause entry_B to references non-exist object_C. In the first-stage scanning, the LFSCK will think the entry_B as dangling, and re-create the lost object_C. When the LFSCK comes to the second-stage scanning, it will find that the former re-creating object_C is not proper, and will try to replace the object_C with the real object_A.
#####
Inject failure stub on MDT0 to simulate dangling name entry
fail_loc=0x1621
fail_loc=0
'ls' should fail because of dangling name entry
Trigger namespace LFSCK to find out dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 32 secs for update
Updated after 5s: wanted 'completed' got 'completed'
 sanity-lfsck test_23b: @@@@@@ FAIL: (9) Fail to repair dangling name entry: 0
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4853:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3045:test_23b()
  = /usr/lib64/lustre/tests/test-framework.sh:5113:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5151:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4955:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3056:main()
Dumping lctl log to /tmp/test_logs/1471830385/sanity-lfsck.test_23b.*.1471830419.log
fre1234: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
fre1233: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
fre1236: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
Resetting fail_loc on all nodes...done.
FAIL 23b (10s)