[LU-8574] The logic of verifying FID-in-dirent may handle the dangling or corrupted name entry improperly Created: 01/Sep/16 Updated: 25/Oct/16 Resolved: 25/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
If a name entry is corrupted and contains a bad inode number, osd_dirent_check_repair() uses the wrong inode number stored in that name entry to locate the target inode when verifying the entry. Such an inode may not exist, or may belong to another object.

== sanity-lfsck test 23b: LFSCK can repair dangling name entry (2) == 01:46:53 (1471830413)
#####
The objectA has multiple hard links, one of them corresponding to the name entry_B. But there is something wrong for the name entry_B and cause entry_B to references non-exist object_C. In the first-stage scanning, the LFSCK will think the entry_B as dangling, and re-create the lost object_C. When the LFSCK comes to the second-stage scanning, it will find that the former re-creating object_C is not proper, and will try to replace the object_C with the real object_A.
#####
Inject failure stub on MDT0 to simulate dangling name entry
fail_loc=0x1621
fail_loc=0
'ls' should fail because of dangling name entry
Trigger namespace LFSCK to find out dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 32 secs for update
Updated after 5s: wanted 'completed' got 'completed'
sanity-lfsck test_23b: @@@@@@ FAIL: (9) Fail to repair dangling name entry: 0
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4853:error()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:3045:test_23b()
= /usr/lib64/lustre/tests/test-framework.sh:5113:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5151:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4955:run_test()
= /usr/lib64/lustre/tests/sanity-lfsck.sh:3056:main()
Dumping lctl log to /tmp/test_logs/1471830385/sanity-lfsck.test_23b.*.1471830419.log
fre1234: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
fre1233: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
fre1236: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.
Resetting fail_loc on all nodes...done.
FAIL 23b (10s) |
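The failure mode described above can be illustrated with a small, self-contained sketch. It is not the code from the patch and does not use the real OSD data structures: every type and function name here (`struct fid`, `struct inode_rec`, `struct dirent_rec`, `inode_lookup()`, `dirent_verify()`) is hypothetical. The sketch assumes each inode records its own FID, so an entry can be checked without trusting its inode number blindly. A missing inode is reported as dangling, and an inode that exists but carries a different FID is reported as belonging to another object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
struct fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

struct inode_rec {
    uint64_t   ino;       /* local inode number */
    bool       in_use;    /* false means the inode slot is free */
    struct fid self_fid;  /* FID the inode records for itself (assumption) */
};

struct dirent_rec {
    const char *name;
    uint64_t    ino;      /* inode number stored in the name entry */
    struct fid  fid;      /* FID stored in the name entry (FID-in-dirent) */
};

enum dirent_verdict {
    DIRENT_OK,        /* inode exists and its FID matches the entry */
    DIRENT_DANGLING,  /* inode number points at nothing */
    DIRENT_MISMATCH   /* inode exists but belongs to another object */
};

bool fid_eq(const struct fid *a, const struct fid *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Linear search over an in-memory table; returns NULL if not found. */
const struct inode_rec *inode_lookup(const struct inode_rec *table,
                                     size_t count, uint64_t ino)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].in_use && table[i].ino == ino)
            return &table[i];
    }
    return NULL;
}

/* Verify a name entry without assuming its inode number is valid. */
enum dirent_verdict dirent_verify(const struct inode_rec *table,
                                  size_t count,
                                  const struct dirent_rec *ent)
{
    const struct inode_rec *inode = inode_lookup(table, count, ent->ino);

    if (inode == NULL)
        return DIRENT_DANGLING;
    if (!fid_eq(&inode->self_fid, &ent->fid))
        return DIRENT_MISMATCH;
    return DIRENT_OK;
}
```

A caller would treat `DIRENT_DANGLING` and `DIRENT_MISMATCH` as separate cases. The first means the entry references nothing. The second means the inode number stored in the entry is not trustworthy, even though a lookup by that number succeeds.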
| Comments |
| Comment by Gerrit Updater [ 05/Sep/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/22310 |
| Comment by Gerrit Updater [ 25/Oct/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22310/ |