Details
Improvement
Resolution: Unresolved
Minor
None
Lustre 2.16.0
3
Description
The lack of parallel operations in a single directory from one client is becoming a limiting factor in some workloads, as the number of CPU cores on clients is growing into the hundreds (e.g. DGX H100 has 256 cores today). The IO500 mdtest-hard-write test exercises exactly this code (multiple threads updating a single shared directory), both on a single client and on multiple clients. This benchmark is sometimes run with multiple mountpoints on a single client in order to work around the kernel VFS locking limitations. However, the limitation affects any multi-threaded workload that operates in a single directory, so it would be best (both for the benchmark and for real applications) to fix the VFS locking properly in the kernel.
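The shared-directory access pattern described above can be approximated with a small user-space test. The following is only a rough sketch, not mdtest itself: the function names `create_files` and `shared_dir_create_rate` and the file naming scheme are made up for illustration. It starts several threads that each create empty files in the same directory and reports the aggregate create rate. Python releases the GIL around the `open()` system call, so the threads can contend on the directory lock in the kernel.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def create_files(directory, worker_id, count):
    """Create `count` empty files in `directory`, named uniquely per worker."""
    for i in range(count):
        path = os.path.join(directory, f"f.{worker_id}.{i}")
        # O_EXCL makes each call a real directory insert (create, not reopen).
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    return count


def shared_dir_create_rate(directory, threads, files_per_thread):
    """Return (files_created, creates_per_second) for one shared directory."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(create_files, directory, t, files_per_thread)
            for t in range(threads)
        ]
        total = sum(f.result() for f in futures)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a path on the
    # filesystem under test (e.g. a Lustre mount) for meaningful numbers.
    with tempfile.TemporaryDirectory() as base:
        for n in (1, 4, 16):
            target = os.path.join(base, f"threads-{n}")
            os.mkdir(target)
            total, rate = shared_dir_create_rate(target, n, 1000)
            print(f"{n:3d} threads: {total} creates, {rate:.0f} creates/s")
```

If the create rate stays roughly flat as the thread count grows on a high-latency filesystem, that is consistent with the creates being serialized on the client for that directory. Interpreter overhead can also flatten the curve, so a C or MPI benchmark such as mdtest remains the better measurement tool.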
There was an RFC patch from neilb a couple of years ago (VFS: support parallel updates in the one directory) that added a prototype VFS parallel directory lock for NFS clients. After minor revisions, that patch showed significant performance improvements (400x) just due to concurrency of requests over a high-latency network, even though the NFS server itself could not do parallel operations. Unfortunately, that patch was never landed after the initial positive RFC.
The MDS and ldiskfs can already handle parallel locking on a single directory from multiple different clients, but the kernel VFS directory locking does not currently allow more than one thread at a time to modify the directory contents in any way (create, unlink, rename). Allowing parallel directory updates in the VFS should provide fairly substantial speedups to multi-threaded workloads on a single directory (up to the concurrency limit of the threads on the client and MDS, which could potentially be up to 100x faster).
It would be very useful to update Neil's patch for the latest kernels and then make the (hopefully minor) changes to the lustre/llite code under "#ifdef DCACHE_PAR_UPDATE" that would be needed to work with a kernel with this change, whether that is from patching the client kernel (which we haven't done in a long time) or because the patch is landed upstream.