Description
The TBF rate is related to Lustre CPU partition, that mean that the same TBF class can exist on several CPTs. So the maximum rate with several clients is: rate * nbr_cpt
e.g:
If we apply on an OSS:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start rule_fio jobid={*} rate=3000
The job rate with 1 node will be limited 3000 RPC/s. But if the OSS is configured with 4 CPT and the job uses more than 4 nodes (or LNet routers when compute nodes are behind), the maximum rate is :
3000 * 4 = 12000 RPC/s.
I think this was originally design to keep the CPT independents and avoid lock contentions. But this can be mitigated by using rhashtable with rcu_lock to share tokens between CPT.