[LU-8574] The logic of verifying FID-in-dirent may handle the dangling or corrupted name entry improperly Created: 01/Sep/16  Updated: 25/Oct/16  Resolved: 25/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

If a name entry is corrupted and contains a bad inode number, then when osd_dirent_check_repair() verifies that entry, it uses the wrong inode number stored in the entry to locate the target inode. That inode may not exist, or it may belong to another object.
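
The failure mode can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions: it is not the osd-ldiskfs code and does not reproduce the merged patch. All types and functions here (toy_fid, toy_inode, toy_dirent, toy_iget, toy_dirent_check) are hypothetical stand-ins. The model assumes that each inode records its own FID, so that the inode number found in a name entry can be cross-checked against the FID stored in the same entry before the inode is trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre/ldiskfs types. */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct toy_inode {
	uint32_t ino;             /* inode number; 0 marks an unused slot */
	struct toy_fid self_fid;  /* FID recorded on the inode itself (assumed) */
};

struct toy_dirent {
	const char *name;
	uint32_t ino;             /* inode number stored in the name entry */
	struct toy_fid fid;       /* FID stored in the name entry */
};

enum toy_verdict {
	TOY_ENT_OK,            /* inode exists and its FID matches the entry */
	TOY_ENT_NO_INODE,      /* entry points at an inode that does not exist */
	TOY_ENT_FID_MISMATCH   /* inode exists but belongs to another object */
};

bool toy_fid_eq(const struct toy_fid *a, const struct toy_fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Look up an inode by number in a flat table; NULL if it is absent. */
const struct toy_inode *toy_iget(const struct toy_inode *table, size_t n,
				 uint32_t ino)
{
	assert(table != NULL || n == 0);
	if (ino == 0)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (table[i].ino == ino)
			return &table[i];
	}
	return NULL;
}

/*
 * Verify a name entry without blindly trusting its inode number:
 * a missing inode and a foreign inode are reported as distinct cases,
 * so a caller can choose a different repair for each.
 */
enum toy_verdict toy_dirent_check(const struct toy_inode *table, size_t n,
				  const struct toy_dirent *ent)
{
	const struct toy_inode *inode = toy_iget(table, n, ent->ino);

	if (inode == NULL)
		return TOY_ENT_NO_INODE;
	if (!toy_fid_eq(&inode->self_fid, &ent->fid))
		return TOY_ENT_FID_MISMATCH;
	return TOY_ENT_OK;
}
```

In this model, a checker that used only ent->ino would treat any existing inode as the target of the entry. Comparing the two FIDs is what separates "the inode does not exist" from "the inode belongs to another object", the two outcomes named in the description.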

== sanity-lfsck test 23b: LFSCK can repair dangling name entry (2) == 01:46:53 (1471830413)
#####
The objectA has multiple hard links, one of them corresponding
to the name entry_B. But there is something wrong for the name
entry_B and cause entry_B to references non-exist object_C.
In the first-stage scanning, the LFSCK will think the entry_B
as dangling, and re-create the lost object_C. When the LFSCK
comes to the second-stage scanning, it will find that the
former re-creating object_C is not proper, and will try to
replace the object_C with the real object_A.
#####
Inject failure stub on MDT0 to simulate dangling name entry
fail_loc=0x1621
fail_loc=0
'ls' should fail because of dangling name entry
Trigger namespace LFSCK to find out dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 32 secs for update
Updated after 5s: wanted 'completed' got 'completed'
 sanity-lfsck test_23b: @@@@@@ FAIL: (9) Fail to repair dangling name entry: 0 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4853:error()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3045:test_23b()
  = /usr/lib64/lustre/tests/test-framework.sh:5113:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5151:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4955:run_test()
  = /usr/lib64/lustre/tests/sanity-lfsck.sh:3056:main()
Dumping lctl log to /tmp/test_logs/1471830385/sanity-lfsck.test_23b.*.1471830419.log
fre1234: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.

fre1233: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.

fre1236: Warning: Permanently added 'fre1235,192.168.112.35' (RSA) to the list of known hosts.

Resetting fail_loc on all nodes...done.
FAIL 23b (10s)


 Comments   
Comment by Gerrit Updater [ 05/Sep/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/22310
Subject: LU-8574 osd-ldiskfs: fix FID-in-dirent properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 180347ea9aee22afc33f2adac3bcbc755d143a13

Comment by Gerrit Updater [ 25/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22310/
Subject: LU-8574 osd-ldiskfs: fix FID-in-dirent properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c3947b14e5fa88b25d4e2a8e1c44b27d6397d814
