Details
Type: Bug
Resolution: Fixed
Priority: Major
Description
A change in https://review.whamcloud.com/#/c/30246/
(LU-10269 ldlm: fix the issues introduced by try bits)
introduced a major regression in open() performance, seen at least in the mdsrate open() benchmark.
On a real system, this mdsrate open benchmark drops from 35K opens/second to around 11K.
This is 10K opens per process across 64 processes, with each open targeting a random file from among 300000 existing files (created by mdsrate earlier):
aprun -n 64 /usr/lib64/lustre/tests/mdsrate -d /mnt/lustre/mdsrate --open --iters 10000 --nfile=300000
On a much smaller VM, I see a drop from 8K opens/second to 4K with this benchmark.
This is 8K opens per process across 4 processes, with each open targeting a random file from among 30000 existing files:
mpirun -n 4 /usr/lib64/lustre/tests/mdsrate -d /mnt/lustre/mdsrate --open --iters 8000 --nfile=30000
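To put the two measurements side by side, the following is a minimal arithmetic sketch, not part of the original report. It assumes the quoted opens/second figures are aggregate rates across all processes (the report does not say), and the function name `regression` is purely illustrative.

```python
def regression(before_ops, after_ops, procs, iters):
    """Summarize a throughput regression for an mdsrate-style open run.

    Assumes before_ops/after_ops are aggregate opens/second across all
    processes; this is an assumption, not stated in the report.
    """
    total = procs * iters  # total opens issued by the whole job
    return {
        "total_opens": total,
        "drop_pct": round(100 * (before_ops - after_ops) / before_ops, 1),
        "slowdown": round(before_ops / after_ops, 2),
        "secs_before": round(total / before_ops, 1),
        "secs_after": round(total / after_ops, 1),
    }

# Real system: 64 processes x 10000 iterations, 35K -> ~11K opens/second
real = regression(35_000, 11_000, procs=64, iters=10_000)

# Small VM: 4 processes x 8000 iterations, 8K -> 4K opens/second
vm = regression(8_000, 4_000, procs=4, iters=8_000)

print(real)
print(vm)
```

Under that assumption, the regression is proportionally larger on the real system than on the VM, which suggests the cost grows with client count.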
The specific cause is that the patch stopped attempting to grant the LOOKUP lock on opens.
Oleg has a patch in flight which reverses this change, but also includes a number of other attempted optimizations:
https://review.whamcloud.com/#/c/32156/
Andreas suggested I push a simpler patch that only reverts this change, so that it can land quickly while Oleg's further optimizations are considered separately.