Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Lustre 2.10.8, Lustre 2.12.3
Description
If multiple threads on a client call statfs() concurrently after the obd_statfs() cache has expired, each thread sends its own OST_STATFS RPC to every OST. With certain statfs-heavy workloads on many-core client nodes, this can result in thousands of needless RPCs being sent from each client every few seconds.
Since all of the callers funnel through obd_statfs(), and there is no benefit to receiving multiple OST_STATFS or MDS_STATFS replies from the same target (they return the same data, and all of the threads are blocked waiting for a reply), it makes sense to allow only one thread to send the statfs RPC and have the other threads wait (interruptibly) for it to complete.
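The "one sender, many waiters" idea can be sketched in user space with POSIX threads. This is a minimal illustration under stated assumptions, not the Lustre obd_statfs() implementation: it models a single target, and `struct statfs_cache`, `cached_statfs()`, `fake_statfs_rpc()` and `run_concurrent_statfs()` are hypothetical names. The first caller to find the cache expired marks an RPC as in flight and sends it without holding the lock; later callers sleep on a condition variable and take the reply from that RPC (tracked by a generation counter) instead of sending their own. Unlike the proposal, the sketch uses plain `pthread_cond_wait()`, so it does not model the interruptible wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-target statfs cache; not a real Lustre structure. */
struct statfs_cache {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	bool            in_flight;   /* a thread is currently sending the RPC */
	unsigned long   generation;  /* bumped each time an RPC completes */
	time_t          fetched_at;  /* when the data was fetched (0 = never) */
	long            free_blocks; /* stand-in for the statfs reply payload */
	int             rpcs_sent;   /* number of simulated RPCs, for the demo */
};

static struct statfs_cache cache = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.done = PTHREAD_COND_INITIALIZER,
};

/* Hypothetical stand-in for an OST_STATFS RPC: sleeps 50 ms as "latency". */
static long fake_statfs_rpc(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };
	nanosleep(&delay, NULL);
	return 12345;
}

/* Return cached data if younger than max_age seconds; otherwise only the
 * first caller sends the RPC and the others reuse its reply. */
static long cached_statfs(time_t max_age)
{
	long v;

	pthread_mutex_lock(&cache.lock);
	if (cache.fetched_at != 0 && time(NULL) - cache.fetched_at < max_age) {
		v = cache.free_blocks;              /* fresh: no RPC needed */
		pthread_mutex_unlock(&cache.lock);
		return v;
	}
	if (cache.in_flight) {
		/* Sleep until the in-flight RPC completes (not signal-interruptible). */
		unsigned long gen = cache.generation;
		while (cache.generation == gen)
			pthread_cond_wait(&cache.done, &cache.lock);
		v = cache.free_blocks;
		pthread_mutex_unlock(&cache.lock);
		return v;
	}
	cache.in_flight = true;
	cache.rpcs_sent++;
	pthread_mutex_unlock(&cache.lock);

	v = fake_statfs_rpc();                      /* sent without holding the lock */

	pthread_mutex_lock(&cache.lock);
	cache.free_blocks = v;
	cache.fetched_at = time(NULL);
	cache.in_flight = false;
	cache.generation++;
	pthread_cond_broadcast(&cache.done);        /* wake every waiter */
	pthread_mutex_unlock(&cache.lock);
	return v;
}

static void *worker(void *arg)
{
	*(long *)arg = cached_statfs(5);            /* hypothetical 5 s max age */
	return NULL;
}

/* Run nthreads (1..64) concurrent callers; return how many RPCs they
 * triggered, or -1 on thread-creation failure or mismatched replies. */
static int run_concurrent_statfs(int nthreads)
{
	pthread_t tids[64];
	long results[64];
	int before, after, started = 0;

	assert(nthreads > 0 && nthreads <= 64);
	pthread_mutex_lock(&cache.lock);
	before = cache.rpcs_sent;
	pthread_mutex_unlock(&cache.lock);

	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, worker, &results[started]) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	if (started != nthreads)
		return -1;
	for (int i = 1; i < nthreads; i++)
		if (results[i] != results[0])
			return -1;

	pthread_mutex_lock(&cache.lock);
	after = cache.rpcs_sent;
	pthread_mutex_unlock(&cache.lock);
	return after - before;
}
```

Starting from an empty cache, `run_concurrent_statfs(16)` should report a single RPC: one thread sends it, and the other fifteen either wait for that reply or find the freshly cached data. Sending the RPC outside the mutex keeps callers that only need a cache hit from blocking behind network latency.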
Issue Links
is related to:
- LU-13296 statfs isn't work properly with MDT statfs proxy (Resolved)
LU-12368
Testing is underway right now at the customer site, which is running ExaScaler 4.2 and DDN Lustre 2.10.7_ddn5. We noted that the default llite.*.statfs_max_age was set to 1 (i.e. 1 second). With this default age, we regularly observed slow application timesteps (about 1 in 10), in which each timestep made numerous statfs calls.
Increasing statfs_max_age to 30 seconds produced a marked improvement: we saw only 1 in 900 slow application timesteps.