[LU-11501] use the dcache properly with .lustre/fid Created: 10/Oct/18  Updated: 01/Apr/22

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Upstream

Type: Bug Priority: Minor
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9868 dcache/namei fixes for lustre Open
is related to LU-5008 bad negative dentry caching in .lustr... Open
is related to LU-8585 All Lustre test suites should pass wi... Open
is related to LU-9629 lfs migrate does not work as a non-ro... Resolved
is related to LU-11970 Using changelog reader causes fid2pat... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While addressing several bugs in how Lustre handles the dcache, a bug was exposed by sanity test 233. Testing with newer kernels showed the following:

VFS: Lookup of '[0x200000002:0x1:0x0]' in lustre lustre would have caused loop

On older kernels the result is an outright crash of the node. Debugging showed why. If a name in .lustre/fid refers to a directory, then that directory can have two names in the dcache: the name in .lustre/fid and the "real" name. The dcache does not permit this. You cannot have two dentries pointing to the same directory inode, because multiple hard links to directories are forbidden. Currently Lustre works around this rule by rolling its own dcache routines, but nothing guarantees that a future change to core dcache functionality will not break Lustre. With multiple hard links to a directory you can form loops like the one reported above, and if some core dcache code asserts on d_ancestor(), the node will crash. It is even possible that the dcache core will in future prune broken dentries from its cache.
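To make the aliasing problem concrete, here is a minimal userspace sketch of the check described above. It is a toy model only: toy_inode, toy_dentry, and toy_check_attach() are hypothetical names invented for illustration, not kernel or Lustre code. It is only loosely modeled on the idea of walking up the parent chain before attaching a directory inode to a new name.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel's struct inode/dentry. */
struct toy_inode {
    int is_dir;
};

struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;   /* NULL for the root */
    struct toy_inode *inode;
};

enum toy_result { TOY_OK, TOY_SECOND_ALIAS, TOY_LOOP };

/*
 * Decide what attaching `inode` to the new, not yet attached dentry `nd`
 * would mean. `all` lists the dentries that already exist in the toy cache.
 */
static enum toy_result toy_check_attach(const struct toy_dentry *nd,
                                        const struct toy_inode *inode,
                                        struct toy_dentry *const *all,
                                        size_t count)
{
    const struct toy_dentry *d;
    size_t i;

    if (!inode->is_dir)
        return TOY_OK;          /* regular files may have many names */

    /* Walk up from the new name: is the directory one of its own ancestors? */
    for (d = nd->parent; d != NULL; d = d->parent)
        if (d->inode == inode)
            return TOY_LOOP;

    /* Otherwise, does the directory already have a name somewhere else? */
    for (i = 0; i < count; i++)
        if (all[i]->inode == inode)
            return TOY_SECOND_ALIAS;

    return TOY_OK;
}
```

In this model, a new name under .lustre/fid that resolves to a directory already sitting on its own parent chain yields TOY_LOOP, which corresponds to the "would have caused loop" message. A name that resolves to a directory elsewhere in the tree yields TOY_SECOND_ALIAS, the two-names case that the dcache does not permit.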



 Comments   
Comment by Peter Jones [ 12/Oct/18 ]

James

Do I understand correctly that this is a longer-term task relating to the upstream kernel work?

Peter

Comment by James A Simmons [ 12/Oct/18 ]

Yes, it's a longer-term project. The way Lustre currently handles .lustre/fid in relation to the dcache is starting to cause problems that show up in tickets like LU-9735. I'm working around the issues, but we can expect more breakage in the future. I will look into other solutions.

Comment by Gerrit Updater [ 05/Sep/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44846
Subject: LU-11501 llite: use d_real for directories in fid cache.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 26e52756fe807e23d1c76ad5d257ae1f0c81e989
