Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
- Labels: None
- Affects Version/s: Lustre 2.14.0, Lustre 2.17.0
Description
In the IO500 benchmark, the "ior-hard-write" phase simulates many threads writing to a single large file (e.g. writing out regions of a very large array from memory), with a stonewall timer, after which all threads must continue writing until each thread has written as much data as the thread that reached the farthest write offset.

In the current implementation, some "early mover" jobs gain a large advantage when writing to the file because they are granted DLM locks for non-conflicting regions of the file and get far ahead of other writers that must contend for the DLM locks. This causes the "ior-hard-write" phase to take a long time due to a "long tail" in which the slower threads must "fill in" the large gaps in the file.

Having the NRS TBF request handler sort the RPCs by file offset (in addition to arrival time) and prioritize writes with smaller offsets over writes with larger offsets would slow down the faster writers and speed up the slower ones, until they are in lockstep. Processing the writes sequentially is also beneficial for managing the server cache and for merging IO requests before submission to the underlying filesystem, so this should improve aggregate performance even though some threads are deliberately slowed down.
The NRS ORR engine exists to do request ordering within an object, but a single NRS TBF policy is preferred, since ORR lacks much of TBF's functionality, and tiered request sorting is unlikely to produce an optimal result.
Issue Links
- is related to:
  - LU-8433 Maximizing Bandwidth utilization by TBF Rule with Dependency (Open)
  - LU-18179 Implementation of Round-Robin/Fair Share response with Token Bucket Filters (Open)
  - LU-18180 UID in req_buffer_history (Open)
  - LU-17296 NRS TBF default rules (Open)
  - LU-18192 TBF: generic combination TBF types with different granularities (Open)