[LU-10960] Improve negative dentry client caching for repeated stats of non-existent files Created: 26/Apr/18  Updated: 27/Apr/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.9.0, Lustre 2.10.3
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Nathan Dauchy (Inactive) Assignee: Oleg Drokin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS-6.9 servers, with 2.7 or 2.10.
SLES12 clients, with 2.9 and 2.10



 Description   

We have seen high load on the MDS, correlated with multi-node user jobs stat'ing one or more nonexistent files in a tight loop, often from multiple ranks at once. Similar unnecessary load may also occur when shells/loaders search PATH or LD_PRELOAD at job startup.
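A minimal sketch of the access pattern described above, for anyone who wants to reproduce it from a single client. The function name and the path are hypothetical, not taken from a real user job, and the per-call MDS round trip is the expected behavior on Lustre rather than something this script measures directly.

```python
import os
import time


def stat_missing(path, iterations):
    """Stat a path that is expected not to exist; return (misses, seconds)."""
    misses = 0
    start = time.perf_counter()
    for _ in range(iterations):
        try:
            os.stat(path)
        except FileNotFoundError:
            # ENOENT: on a Lustre client without a cached negative dentry,
            # each of these is expected to cost an MDS round trip.
            misses += 1
    return misses, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical path; point it at a missing file on a Lustre mount.
    misses, elapsed = stat_missing("/mnt/lustre/testdir/does-not-exist", 10000)
    print(f"{misses} ENOENT results in {elapsed:.2f}s")
```

Running this from many ranks at once, and comparing the elapsed time against the same loop on a local filesystem, should give a rough idea of how much of the cost is server-side.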

The client should cache a negative dentry to limit the impact. However, based on discussions at LUG'18, my rough understanding is that a client can only retain a negative dentry if it holds a lock (non-exclusive read is ok?) on the parent dir, which gives it the context in which to cache the entry. The client can fairly easily end up NOT holding that lock:
A) a full path to the nonexistent file is used, so the parent dir is never read
B) another client creates OR removes a file (or directory) within the parent dir, thereby revoking the dir read lock
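To make cases A and B concrete, here is a toy model of the behavior as described in this ticket. It is not Lustre code: the class and method names are invented for illustration, and it only encodes the assumption that a negative entry can be kept while a read lock on the parent dir is held.

```python
class ToyClientCache:
    """Toy model: negative entries are only valid under a parent-dir lock."""

    def __init__(self):
        self.locked_dirs = set()   # dirs this client holds a read lock on
        self.negative = set()      # (parent, name) pairs known not to exist
        self.mds_rpcs = 0          # lookups that had to go to the server

    def grant_dir_lock(self, parent):
        self.locked_dirs.add(parent)

    def revoke_dir_lock(self, parent):
        # Case B: another client changed the directory, so the lock and
        # every negative entry cached under it are dropped.
        self.locked_dirs.discard(parent)
        self.negative = {(p, n) for (p, n) in self.negative if p != parent}

    def lookup_missing(self, parent, name):
        """Look up a name that does not exist; return True on a cache hit."""
        if (parent, name) in self.negative:
            return True
        self.mds_rpcs += 1
        if parent in self.locked_dirs:
            self.negative.add((parent, name))
        # Case A: no lock on the parent, so nothing can be cached.
        return False
```

In this model, repeated lookups without the dir lock each count as an RPC, lookups under the lock cost one RPC and then hit the cache, and a revoke sends the next lookup back to the server.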

There may be ways to cache that negative dentry by taking a read lock on that path only, but the downside is that the MDT could have to track a huge number of additional locks. One possibility would be to do that only if the MDT sees the number of such requests exceed some threshold. Other approaches will require some thought.
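The threshold idea could look roughly like the sketch below. This is only an illustration of the heuristic, not a proposed MDT implementation: the class name, the sliding-window approach, and both parameters are assumptions, and nothing here corresponds to existing Lustre tunables.

```python
from collections import defaultdict, deque


class NegativeLookupTracker:
    """Count recent failed lookups per path within a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold      # misses needed before granting a lock
        self.window = window_seconds    # how far back misses are counted
        self.misses = defaultdict(deque)

    def record_miss(self, path, now):
        """Record one ENOENT lookup; return True if a lock should be granted."""
        times = self.misses[path]
        times.append(now)
        # Drop misses that have aged out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```

Note that the tracker itself keeps per-path state, so a real version would need to bound that table too, otherwise it only moves the memory concern from locks to counters.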

Please correct any details in the summary that I got wrong or missed.  Thanks!



 Comments   
Comment by Brad Hoagland (Inactive) [ 27/Apr/18 ]

IIUC, this came out of LUG discussions.
