Details
Improvement
Resolution: Fixed
Major
None
None
Description
During large-scale benchmarking on a KNL+Omni-Path system (8k nodes), we saw periodic OS noise from Lustre kernel threads. This noise greatly affected the performance of small-message MPI collectives (MPI_Barrier, MPI_Allreduce with small data sizes, etc.).
The request is for Lustre to use its kernel threads deterministically when the load is low. In our case, the ideal behavior would have been to use only the thread(s) in CPT 0, which can be set up to run only on KNL tile 0 (cores 0 and 1). Then, when no significant I/O is going on, a benchmark thread could be bound to any tile other than tile 0 and see no Lustre noise.
This is more than a benchmarking concern: allreduce is commonly a bottleneck, especially at scale, and HPC applications usually have long phases with no I/O.
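The application side of the requested arrangement (keeping benchmark threads off the tile reserved for Lustre's CPT 0 threads) can be sketched on Linux with `sched_getaffinity`/`sched_setaffinity`. This is a minimal, non-authoritative sketch that assumes, as described above, that tile 0 is CPUs 0 and 1; the macro names and the helper `bind_outside_reserved_tile` are hypothetical, and the Lustre-side CPT configuration is not shown.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical: CPUs 0 and 1 form KNL tile 0, reserved for Lustre's CPT 0 threads. */
#define RESERVED_CPU_FIRST 0
#define RESERVED_CPU_LAST  1

/*
 * Hypothetical helper: restrict the calling thread to the CPUs it is currently
 * allowed to run on, minus the reserved tile. Returns 0 on success and -1 on
 * failure (including when no CPU would remain); on failure the thread's
 * affinity is left unchanged.
 */
static int bind_outside_reserved_tile(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* Start from the current mask so cgroup/cpuset limits are respected. */
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    /* Drop the reserved tile's CPUs from the allowed set. */
    for (int cpu = RESERVED_CPU_FIRST; cpu <= RESERVED_CPU_LAST; cpu++)
        CPU_CLR(cpu, &set);

    /* Refuse to bind if nothing is left (e.g. only CPUs 0-1 were allowed). */
    if (CPU_COUNT(&set) == 0)
        return -1;

    /* pid 0 means the calling thread; the kernel migrates it if needed. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A benchmark would call `bind_outside_reserved_tile()` once in each thread before its timed loop. Starting from the current mask, rather than from a full CPU list, avoids requesting CPUs that a job launcher or cpuset has already excluded.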