|
We have seen high load on the MDS, correlated with multi-node user jobs stat'ing one or more nonexistent files in a tight loop, often from multiple ranks at once. Similar unnecessary load may also occur when shells/loaders search along PATH or LD_PRELOAD at job startup.
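For illustration, the access pattern looks roughly like the sketch below. This is not taken from an actual user job; the function name `stat_missing_loop` and the file name are made up, and a temporary local directory stands in for a directory on the Lustre mount.

```python
import os
import tempfile


def stat_missing_loop(path, iterations):
    """Stat `path` repeatedly; return how many calls failed with ENOENT."""
    misses = 0
    for _ in range(iterations):
        try:
            os.stat(path)
        except FileNotFoundError:
            # Without a cached negative dentry on the client, each of
            # these misses may turn into a lookup sent to the MDS.
            misses += 1
    return misses


if __name__ == "__main__":
    # Hypothetical stand-in for a directory on the Lustre file system.
    with tempfile.TemporaryDirectory() as parent:
        missing = os.path.join(parent, "does_not_exist.flag")
        print(stat_missing_loop(missing, 1000))
```

Run from every rank of a multi-node job, a loop like this multiplies the lookup rate by the number of ranks.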
The client should cache a negative dentry to limit the impact. However, based on discussions at LUG'18, my rough understanding is that a client can only retain a negative dentry if it holds a lock on the parent dir (is a non-exclusive read lock OK?), which gives it the context in which to cache the entry. A client can therefore end up without that lock fairly easily, for example when:
A) a full path to the nonexistent file is used, so the parent dir is never read
B) another client creates or removes a file (or directory) within the parent dir, thereby revoking the dir read lock
It may be possible to cache the negative dentry by taking a read lock on that path only, but the downside is that the MDT could then have to track a huge number of additional locks. One possibility would be to do this only when the MDT sees the number of requests exceed some threshold. Other approaches will require some thought.
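To make the threshold idea concrete, here is a minimal sketch of the bookkeeping it might involve. It is a plain Python model, not Lustre code: the class `NegativeLookupTracker`, its parameters, and the `parent_fid`/`name` key are all assumptions made for illustration, and the actual lock grant is left out entirely.

```python
from collections import defaultdict, deque


class NegativeLookupTracker:
    """Count failed lookups per (parent, name) within a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # misses tolerated per window
        self.window_seconds = window_seconds  # length of the sliding window
        self._misses = defaultdict(deque)     # (parent_fid, name) -> timestamps

    def record_miss(self, parent_fid, name, now):
        """Record one failed lookup at time `now`.

        Returns True once the misses inside the window exceed the
        threshold, i.e. when a per-path lock might be worth granting.
        """
        times = self._misses[(parent_fid, name)]
        times.append(now)
        # Drop misses that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.threshold


if __name__ == "__main__":
    tracker = NegativeLookupTracker(threshold=3, window_seconds=1.0)
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, tracker.record_miss("dirA", "missing.so", t))
```

With a scheme along these lines, only paths that are actually being hammered would cost the MDT an extra lock. The tracking table itself would still need some bound or expiry, which this sketch does not address.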
Please correct any details in the summary that I got wrong or missed. Thanks!
|