Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17158

TBF rate should not be based on CPT

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      The TBF rate is related to Lustre CPU partition, that mean that the same TBF class can exist on several CPTs. So the maximum rate with several clients is: rate * nbr_cpt

      e.g:
      If we apply on an OSS:

      lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start rule_fio jobid={*} rate=3000
      

      The job rate with 1 node will be limited 3000 RPC/s. But if the OSS is configured with 4 CPT and the job uses more than 4 nodes (or LNet routers when compute nodes are behind), the maximum rate is :
      3000 * 4 = 12000 RPC/s.

      I think this was originally design to keep the CPT independents and avoid lock contentions. But this can be mitigated by using rhashtable with rcu_lock to share tokens between CPT.

      Attachments

        Activity

          People

            core-lustre-triage Core Lustre Triage
            eaujames Etienne Aujames
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: