Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Major
Description
It would be useful to have a default NRS TBF rule that batched IOs on the servers based on their JobID, in a manner similar to how ORR or CRR batch IOs based on the Object ID or client NID.
Aggregating and completing IOs from a specific job is beneficial for aggregate IO throughput optimization. While it is locally unfair for one job to be prioritized over another, this avoids both jobs being slowed down while their IOs compete with each other and are interleaved "fairly" to the storage. Prioritizing any one JobID would at least allow that job to finish its IO first and get on with computation (presumably no longer generating IO), after which the other JobIDs can complete their IO with less contention. The IO completion time would probably be comparable for the last JobID, but may even be improved if the reduction in contention allows the IO to be more efficient (i.e. a read- or write-only workload vs. mixed read/write from multiple jobs) and to have better allocation at the OSD level if there is less concurrency.
An implementation challenge would be ensuring that the same JobID is prioritized across all MDTs/OSTs, so that one job actually finishes its IO first and does not have uneven completion times across targets. Self-balancing systems might do something (arbitrary) like prioritizing IO based on the lowest JobID name, since this can be determined uniformly across targets without any central control. De-prioritized jobs would earn a "credit" toward a later priority boost (e.g. GIFT: A Coupon Based Throttle-and-Reward Mechanism for Fair and Efficient I/O Bandwidth Management on Parallel Storage Systems) so that overall the IO is fair and "cp" does not always have priority over "dd" when the JobID is "procname_uid".
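As a rough illustration of the GIFT-style coupon idea, the sketch below (hypothetical names, not existing Lustre code) keeps a per-JobID credit counter: a job that loses an arbitration round earns a coupon, accumulated coupons raise its effective priority in later rounds, and a winning job spends its credit, so the same JobID cannot win indefinitely.

```c
#include <stdint.h>

/* Hypothetical per-JobID accounting for a GIFT-style coupon scheme.
 * None of these names exist in Lustre; this is only a sketch of the
 * throttle-and-reward bookkeeping described above. */
struct tbf_job_credit {
	uint64_t tjc_credit;	/* coupons earned while deprioritized */
};

/* Effective priority: base priority plus earned credit, so a job that
 * has repeatedly lost arbitration eventually outranks the usual winner. */
static uint64_t tbf_job_prio(uint64_t base, const struct tbf_job_credit *jc)
{
	return base + jc->tjc_credit;
}

/* Called once per arbitration round on each job: the winner spends its
 * accumulated credit, each loser earns one coupon. */
static void tbf_job_account(struct tbf_job_credit *jc, int won)
{
	if (won)
		jc->tjc_credit = 0;
	else
		jc->tjc_credit++;
}
```

With equal base priorities, a job that loses one round immediately outranks the previous winner in the next round, which is the "overall fair" behavior the description asks for.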
Another approach would be to coordinate scheduling of the JobID based on the current time, which should be synchronized across at least server nodes with NTP. Something like hash(jobid) % 10 == ktime_get_real() % 10 to approximately distribute JobIDs uniformly across 10 time slices per second and all servers would prioritize the same JobID at the same time. This is not in itself sufficient to handle the general case, but gives some idea of a potential solution. It would likely need some other hash/modulus to order JobIDs within a time slice if there are hash collisions, so that the servers still schedule the same JobID at the same time, and backfill empty time slices in the same manner.
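The time-slice idea above can be sketched as follows. This is illustrative user-space code, not existing Lustre code: the hash functions and the 100 ms slice width are assumptions, chosen so that NTP-synchronized servers computing from the same wall-clock time and the same JobID string independently agree on which job is currently prioritized, with a second hash as a deterministic tiebreak within a slice.

```c
#include <stdint.h>

#define NRS_TBF_NSLICES	10	/* assumed: 10 time slices per second */

/* djb2 string hash; a stand-in for whatever hash would actually be used.
 * Any hash works as long as every server uses the same one. */
static uint32_t jobid_hash(const char *jobid)
{
	uint32_t h = 5381;

	while (*jobid)
		h = h * 33 + (unsigned char)*jobid++;
	return h;
}

/* Current slice index derived from wall-clock nanoseconds (e.g. from
 * ktime_get_real()), using 100 ms slices so that all NTP-synchronized
 * servers compute the same value at the same moment. */
static unsigned int current_slice(uint64_t now_ns)
{
	return (now_ns / 100000000ULL) % NRS_TBF_NSLICES;
}

/* A JobID is prioritized when its hash falls in the current slice,
 * uniformly distributing JobIDs across the slices. */
static int jobid_prioritized(const char *jobid, uint64_t now_ns)
{
	return jobid_hash(jobid) % NRS_TBF_NSLICES == current_slice(now_ns);
}

/* Secondary key (FNV-1a-style, assumed) to order colliding JobIDs within
 * one slice identically on every server, and to pick a backfill JobID
 * for otherwise-empty slices in the same deterministic manner. */
static uint32_t jobid_tiebreak(const char *jobid)
{
	uint32_t h = 2166136261u;

	while (*jobid)
		h = (h ^ (unsigned char)*jobid++) * 16777619u;
	return h;
}
```

Because every input is either globally agreed (wall-clock time) or carried in the request (JobID), no central coordination is needed; the open question remains how to handle slices whose JobIDs have no pending IO.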
Issue Links
- is related to
  - LU-18269 NRS TBF bucket prioritization (Open)
  - LU-18179 Implementation of Round-Robin/Fair Share response with Token Bucket Filters (Open)
  - LU-17296 NRS TBF default rules (Open)
  - LU-20090 Per-Rule Scheduling Class type for NRS TBF (Open)
  - LU-13031 store JobID of program that created file in inodes at create time (Resolved)
  - LU-17503 IO500: improve NRS TBF to sort requests by object offset for ior-hard-write (Open)
  - LU-17166 add NRS TBF rule for projid (Resolved)