Details
-
Improvement
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.7.0, Lustre 2.9.0, Lustre 2.10.3
-
None
-
CentOS-6.9 servers, with 2.7 or 2.10.
SLES12 clients, with 2.9 and 2.10
Description
We have seen high load on the MDS, correlated with multi-node user jobs stat'ing one or more nonexistent files in a tight loop, often from multiple ranks at once. Similar unnecessary load may also occur when shells or loaders search PATH or LD_PRELOAD at job startup.
The client should cache a negative dentry to limit the impact. However, based on discussions at LUG'18, my rough understanding is that a client can only retain a negative dentry while it holds a lock on the parent directory (is a non-exclusive read lock sufficient?), since that lock provides the context in which the entry is cached. The client can fairly easily end up without that lock:
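To make the access pattern concrete, here is a minimal sketch of what a single rank in such a job effectively does. The function name and the target path are hypothetical; to observe the MDS load described here, the path would need to point at a directory on a Lustre mount rather than a local temporary directory.

```python
import errno
import os
import tempfile
import time


def stat_missing_in_loop(path, iterations):
    """Call os.stat() on `path` repeatedly and count the ENOENT failures."""
    misses = 0
    start = time.perf_counter()
    for _ in range(iterations):
        try:
            os.stat(path)
        except OSError as exc:
            # Only "no such file" is expected; anything else is a real error.
            if exc.errno != errno.ENOENT:
                raise
            misses += 1
    elapsed = time.perf_counter() - start
    return misses, elapsed


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on a Lustre mount.
    with tempfile.TemporaryDirectory() as parent:
        missing = os.path.join(parent, "does-not-exist")
        misses, elapsed = stat_missing_in_loop(missing, 10000)
        print(f"{misses} failed stats in {elapsed:.3f}s")
```

Without a cached negative dentry, each iteration on a Lustre client would presumably turn into a lookup request to the MDS, multiplied by the number of ranks.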
A) a full path to the non-existing file is used, so the parent dir is never read
B) another client creates OR removes a file (or directory) within the parent dir, thereby revoking the dir read lock
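The two cases can be illustrated with a toy model. This is not Lustre code: the class and method names are hypothetical, and the model simply encodes the rough understanding stated above, namely that a negative entry is only usable while the client holds a lock on the parent directory.

```python
class ToyClient:
    """Toy model: negative entries are valid only while the parent dir lock is held."""

    def __init__(self):
        self.dir_locks = set()   # parent dirs this client holds a read lock on
        self.negative = {}       # parent dir -> names known not to exist
        self.mds_requests = 0    # lookups that had to go to the (imaginary) MDS

    def readdir(self, parent):
        # In this model, reading the directory is what grants the lock.
        self.dir_locks.add(parent)

    def lookup_missing(self, parent, name):
        """Look up a name that does not exist; return where the answer came from."""
        if parent in self.dir_locks and name in self.negative.get(parent, set()):
            return "cached"
        self.mds_requests += 1
        # The negative result can only be remembered under a held dir lock.
        if parent in self.dir_locks:
            self.negative.setdefault(parent, set()).add(name)
        return "mds"

    def revoke(self, parent):
        # Case B: another client created or removed an entry in `parent`.
        self.dir_locks.discard(parent)
        self.negative.pop(parent, None)
```

In case A the client never calls `readdir`, so every `lookup_missing` increments `mds_requests`. In case B the cached entry works until `revoke` is called, after which lookups go back to the MDS.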
There may be ways to cache that negative dentry by taking a read lock on that path only, but the downside is that the MDT could have to track a huge number of additional locks. One possibility would be to do that only once the MDT sees the number of requests exceed some threshold. Other approaches will require some thought.
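The threshold idea could look roughly like the sketch below. It is only an illustration of the heuristic, not a proposal for the actual MDT implementation: the class name, the sliding-window policy, and the parameters are all assumptions.

```python
import time
from collections import defaultdict, deque


class NegativeLookupThrottle:
    """Hypothetical server-side counter of failed lookups per path."""

    def __init__(self, threshold, window_seconds, clock=time.monotonic):
        self.threshold = threshold
        self.window = window_seconds
        self.clock = clock
        self.recent = defaultdict(deque)  # path -> timestamps of recent misses

    def should_grant_lock(self, path):
        """Record one failed lookup; return True once the path is hot enough."""
        now = self.clock()
        hits = self.recent[path]
        hits.append(now)
        # Drop misses that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.threshold
```

With `threshold=3` and a one-second window, the fourth failed lookup of the same path inside that window is the first to return `True`. Note that the per-path bookkeeping has its own memory cost, so a real design would also need to bound or expire the tracked paths.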
Please correct any details in the summary that I got wrong or missed. Thanks!