Lustre / LU-17158

TBF rate should not be based on CPT


      The TBF rate is applied per Lustre CPU partition (CPT), which means that the same TBF class can exist on several CPTs. So the maximum rate with several clients is: rate * nbr_cpt

      e.g.:
      If we apply the following rule on an OSS:

      lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start rule_fio jobid={*} rate=3000"
      

      With 1 node, the job rate will be limited to 3000 RPC/s. But if the OSS is configured with 4 CPTs and the job uses more than 4 nodes (or LNet routers, when the compute nodes are behind routers), the maximum rate is:
      3000 * 4 = 12000 RPC/s.
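
The arithmetic above can be reproduced with a small model. This is only a sketch, not Lustre code: the `TokenBucket` class and `served_in_one_second` function are hypothetical names, and the model assumes that each CPT owns an independent bucket for the class and that the RPCs from one node all land on a single CPT.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified token bucket (hypothetical model, not the Lustre NRS code)."""
    rate: float           # tokens (RPCs) added per second
    depth: float          # maximum number of tokens the bucket can hold
    tokens: float = 0.0   # tokens currently available

    def refill(self, elapsed: float) -> None:
        # Add rate * elapsed tokens, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed)

    def try_consume(self) -> bool:
        # Serve one RPC if a token is available.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def served_in_one_second(rate: int, nbr_cpt: int, active_cpts: int) -> int:
    """Count RPCs served in one second when every CPT has its own bucket.

    active_cpts is the number of CPTs that actually receive RPCs
    (assumption: one busy node or router maps to one CPT).
    """
    buckets = [TokenBucket(rate=rate, depth=rate) for _ in range(nbr_cpt)]
    served = 0
    for bucket in buckets[:active_cpts]:
        bucket.refill(1.0)
        while bucket.try_consume():
            served += 1
    return served
```

With `rate=3000` and 4 CPTs, one active CPT serves 3000 RPCs in the modelled second, while four active CPTs serve 12000, which is the rate * nbr_cpt ceiling described above.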

      I think this was originally designed to keep the CPTs independent and avoid lock contention. But this can be mitigated by using an rhashtable with RCU locking to share tokens between CPTs.
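
To illustrate the idea of sharing tokens between CPTs, here is a minimal user-space sketch. It is an assumption-laden stand-in, not the proposed kernel design: a plain `dict` plays the role of the shared class table, a `threading.Lock` replaces the rhashtable/RCU machinery, and `SharedTbfClass`, `cpt_worker` and `total_served` are hypothetical names.

```python
import threading


class SharedTbfClass:
    """One token pool per TBF class, shared by all CPTs (hypothetical model)."""

    def __init__(self, rate: int):
        self.rate = rate
        self.tokens = 0
        # Stand-in for the kernel-side synchronization discussed in the ticket.
        self._lock = threading.Lock()

    def refill(self, elapsed_s: float) -> None:
        # Add rate * elapsed tokens, capped at one second's worth.
        with self._lock:
            self.tokens = min(self.rate, self.tokens + int(self.rate * elapsed_s))

    def try_consume(self) -> bool:
        # Take one token from the shared pool if any is left.
        with self._lock:
            if self.tokens > 0:
                self.tokens -= 1
                return True
            return False


def cpt_worker(cls: SharedTbfClass, counts: list, idx: int) -> None:
    # Each simulated CPT serves RPCs until the shared pool is empty.
    while cls.try_consume():
        counts[idx] += 1


def total_served(rate: int, nbr_cpt: int) -> int:
    """Serve one second's worth of tokens from nbr_cpt concurrent workers."""
    table = {"rule_fio": SharedTbfClass(rate)}  # class table shared by all CPTs
    table["rule_fio"].refill(1.0)
    counts = [0] * nbr_cpt
    threads = [
        threading.Thread(target=cpt_worker, args=(table["rule_fio"], counts, i))
        for i in range(nbr_cpt)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

In this model `total_served(3000, 4)` stays at 3000 regardless of how the work is split across the four workers, because all of them draw from the same pool. The single lock is exactly the contention point the per-CPT design avoids, which is why the ticket suggests a lighter-weight shared structure instead.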

            qian_wc Qian Yingjin
            eaujames Etienne Aujames