Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
- Environment:
  - VMs + 2.12.8 + 3.10.0-1160.59.1
  - robinhood v3 + 2.12.8 + 3.10.0-1062
Description
This issue was observed with robinhood clients:
- robinhood becomes slower at syncing the filesystem from the changelogs over time
- robinhood becomes slower when the changelog reader falls behind (more negative entries are generated).
"strace" on reader threads reveal that the FID stats could take several seconds.
Writing 2 or 3 to /proc/sys/vm/drop_caches fixes the issue temporarily.
Reproducer
I was able to reproduce the issue with a "dumb" executable that generates a lot of "negative entries" by running parallel stats on "<fs>/.lustre/fid/<non_existent_fid>" (a minimal sketch is shown below).
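For illustration, here is a minimal sketch of such a reproducer, assuming a Lustre client mounted at /mnt/lustre, 16 threads and the bracketed "[seq:oid:ver]" FID syntax under .lustre/fid; it is not the actual fid_rand source, only a hedged approximation of what it does.

/* Hypothetical fid_rand-style reproducer: N threads stat() random,
 * non-existent FIDs under <fs>/.lustre/fid; every miss leaves a negative
 * dentry in that directory. Mount point, thread count and FID range are
 * assumptions. Build with: gcc -O2 -pthread -o fid_rand fid_rand.c */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

#define MNT      "/mnt/lustre"   /* assumed Lustre mount point */
#define NTHREADS 16
#define NLOOPS   1000000L

static void *stat_random_fids(void *arg)
{
        unsigned int seed = (unsigned int)(uintptr_t)arg;
        char path[256];
        struct stat st;
        long i;

        for (i = 0; i < NLOOPS; i++) {
                /* Build a FID that (almost certainly) does not exist. */
                snprintf(path, sizeof(path),
                         MNT "/.lustre/fid/[0x%llx:0x%x:0x0]",
                         0x200000400ULL + (rand_r(&seed) & 0xffff),
                         (unsigned int)rand_r(&seed));
                (void)stat(path, &st);   /* expected to fail with ENOENT */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, stat_random_fids,
                               (void *)(uintptr_t)(i + 1));
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        return 0;
}

Running it for a while fills the dcache of the .lustre/fid directory with negative dentries, after which FID stats slow down until the caches are dropped.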
The attached perf_fid_cont.svg is a flamegraph of the threads of the test process (fid_rand).
Most of the fid_rand threads wait on the i_mutex of the "<fs>/.lustre/fid" inode in lookup_slow():
static int lookup_slow(struct nameidata *nd, struct path *path)
{
        struct dentry *dentry, *parent;
        int err;

        parent = nd->path.dentry;
        BUG_ON(nd->inode != parent->d_inode);

        mutex_lock(&parent->d_inode->i_mutex);    <--- contention here
        dentry = __lookup_hash(&nd->last, parent, nd->flags);
        mutex_unlock(&parent->d_inode->i_mutex);
Workaround
- a crontab entry running "echo 2 > /proc/sys/vm/drop_caches"
- setting "/proc/sys/fs/negative-dentry-limit" on the 3.10.0-1160 kernel (see the sketch below)
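For completeness, a hedged sketch of how these two workarounds might be wired up; the cron schedule, file name and the negative-dentry-limit value are illustrative assumptions, not values taken from this ticket.

# /etc/cron.d/drop-caches (hypothetical): drop dentries/inodes every 10 minutes
*/10 * * * * root echo 2 > /proc/sys/vm/drop_caches

# 3.10.0-1160 kernels: cap the number of negative dentries; the exact value
# semantics (fraction of total memory) should be checked in the distribution docs.
echo 10 > /proc/sys/fs/negative-dentry-limit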