[LU-17158] TBF rate should not be based on CPT - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- lad24dd
- tbf

Epic/Theme:
- QoS-TBF
Severity:
3
Rank (Obsolete):
9223372036854775807

Description

The TBF rate is enforced per Lustre CPU partition (CPT), which means that the same TBF class can exist on several CPTs. So the maximum rate with several clients is: rate * nbr_cpt

e.g:
If we apply on an OSS:

lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start rule_fio jobid={*} rate=3000"

With 1 node, the job rate will be limited to 3000 RPC/s. But if the OSS is configured with 4 CPTs and the job uses more than 4 nodes (or LNet routers, when the compute nodes are behind routers), the maximum rate is:
3000 * 4 = 12000 RPC/s.
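
The arithmetic above can be checked with a small model. This is a minimal sketch, not Lustre code: `aggregate_limit` is a hypothetical helper, and it assumes that each CPT enforces the rule rate independently and that the job's requests land on `busy_cpts` distinct partitions.

```python
def aggregate_limit(rule_rate: int, n_cpts: int, busy_cpts: int) -> int:
    """Upper bound on RPC/s for one TBF class when every CPT enforces
    rule_rate on its own (simplified model, not the Lustre implementation)."""
    if not 1 <= busy_cpts <= n_cpts:
        raise ValueError("busy_cpts must be between 1 and n_cpts")
    # Each CPT that receives requests for the class has its own bucket,
    # so the per-rule rate is multiplied by the number of CPTs in use.
    return rule_rate * busy_cpts


# Assumption: all requests from a single node land on one CPT.
print(aggregate_limit(3000, n_cpts=4, busy_cpts=1))  # 3000
# Enough nodes (or LNet routers) to reach all four CPTs.
print(aggregate_limit(3000, n_cpts=4, busy_cpts=4))  # 12000
```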

I think this was originally designed to keep the CPTs independent and avoid lock contention. But this can be mitigated by using an rhashtable with RCU locking to share tokens between CPTs.

Activity

Andreas Dilger added a comment - 22/Feb/25 9:39 AM

qian_wc, the current problem with NRS + CPT is that the rates specified for e.g. TBF rules are per-CPT, which is not what users expect, and they can be unevenly applied if the clients are not using each CPT uniformly. While I don't think we have to remove CPT from NRS completely, there are two changes that should be made to bring this behavior closer to what users expect:

- The specified rate/tokens for a rule should be the global rate across all CPTs, and divided by the number of CPTs when assigned to a bucket. This ensures that the total rate in one interval is the rate that was specified, and does not depend on the NUMA/CPT configuration on a server node.
- There should be balancing between the TBF buckets in different CPTs within an interval, so that all tokens can be used if needed. This balancing doesn't have to be perfect, but could definitely be improved significantly over not balancing at all. For example, if a CPT bucket runs out of tokens, it could check the buckets in other CPTs until it finds one with tokens, and take 1/2 of the remaining tokens.
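
The borrowing idea in the second point can be sketched as follows. This is an illustrative single-threaded model only, not the NRS implementation: `Bucket` and `take_token` are hypothetical names, the locking that a real per-CPT design would need is omitted, and rounding the borrowed half up (so that a single remaining token can still move) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: int  # tokens left in this CPT's bucket for the current interval


def take_token(buckets, cpt):
    """Consume one token on CPT index `cpt`; if its bucket is empty, borrow
    half of the first non-empty bucket found on another CPT."""
    own = buckets[cpt]
    if own.tokens == 0:
        for other in buckets:
            if other is not own and other.tokens > 0:
                borrowed = (other.tokens + 1) // 2  # half, rounded up (assumption)
                other.tokens -= borrowed
                own.tokens += borrowed
                break
    if own.tokens == 0:
        return False  # no tokens left on any CPT in this interval
    own.tokens -= 1
    return True


# Global rate of 3000 split over 4 CPTs, with all traffic arriving on CPT 0.
buckets = [Bucket(750) for _ in range(4)]
served = 0
while take_token(buckets, 0):
    served += 1
print(served)  # 3000
```

In this model a single busy CPT can still use the whole global budget, while the total across all CPTs never exceeds the specified rate.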

Qian Yingjin added a comment - 24/Dec/24 2:09 AM - edited

I do not think we can get rid of CPT, as the NRS scheduling framework is based on CPT.
If we want to make the TBF rate independent of CPT, this means a lot of restructuring of the NRS framework.

Or Etienne, could you please present your design idea in more detail?

Etienne Aujames added a comment - 23/Sep/24 9:30 AM

adilger, I am not planning to work on this now, so it can go on the Developer ticket list.
But note that this issue does not apply to the "tbf nid" policy, because NIDs are assigned to a CPT.

Andreas Dilger added a comment - 20/Sep/24 5:52 AM

eaujames, I see that this is assigned to you. Is it something you plan to start working on, or should it go into the LAD Developer Day candidate list?

Andreas Dilger added a comment - 29/Sep/23 3:38 PM

As a starting point it would make sense to divide the rate by the CPT count so that the total rate remains the same regardless of how many CPTs are configured. Being able to share tokens between CPTs is an improvement beyond that.
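
A minimal sketch of that first step, assuming integer rates: `per_cpt_rates` is a hypothetical helper, not an existing Lustre function, and spreading the remainder over the first CPTs is just one possible choice for rates that do not divide evenly.

```python
def per_cpt_rates(global_rate: int, n_cpts: int) -> list:
    """Split a rule's global rate into per-CPT shares whose sum is exactly
    global_rate, whatever the CPT count."""
    if n_cpts < 1:
        raise ValueError("n_cpts must be at least 1")
    base, extra = divmod(global_rate, n_cpts)
    # The first `extra` CPTs get one additional token so nothing is lost.
    return [base + 1 if i < extra else base for i in range(n_cpts)]


print(per_cpt_rates(3000, 4))  # [750, 750, 750, 750]
print(per_cpt_rates(1000, 3))  # [334, 333, 333]
```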

People

Assignee: Qian Yingjin

Reporter: Etienne Aujames

Votes: 0

Watchers: 8

Dates

Created: 29/Sep/23 3:28 PM

Updated: 22/Feb/25 9:39 AM