Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
- Labels: None
- Affects Version/s: Lustre 2.14.0, Lustre 2.17.0
Description
In the IO500 benchmark, the "ior-hard-write" phase simulates many threads writing to a single large file (e.g. writing out regions of a very large array from memory), with a stonewall timer, after which all threads must continue writing until each thread has written as much data as the thread that reached the farthest write offset.

In the current implementation, some "early mover" jobs gain a large advantage when writing to the file because they are granted DLM locks for non-conflicting regions of the file and get far ahead of other writers that must contend for the DLM locks. This causes the "ior-hard-write" phase to take a long time due to a "long tail" in which the slower threads must "fill in" the large gaps in the file.

Having the NRS TBF request handler sort the RPCs by file offset (in addition to arrival time) and prioritize writes with smaller offsets over writes with larger offsets would slow down the faster writers and speed up the slower ones, until they are in lockstep. Processing the writes sequentially is also beneficial for managing the server cache and for merging IO requests before submission to the underlying filesystem, so this should improve aggregate performance even though some threads are deliberately slowed down.
The NRS ORR engine exists to do request ordering within an object, but a single NRS TBF policy is preferred, since ORR lacks much of TBF's functionality, and tiered request sorting is unlikely to produce an optimal result.
Issue Links
- is related to:
  - LU-8433 Maximizing Bandwidth utilization by TBF Rule with Dependency (Open)
  - LU-18179 Implementation of Round-Robin/Fair Share response with Token Bucket Filters (Open)
  - LU-18180 UID in req_buffer_history (Open)
  - LU-17296 NRS TBF default rules (Open)
  - LU-18192 TBF: generic combination TBF types with different granularities (Open)