Lustre / LU-15251

tbf gid rules ignored on MDS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version: Lustre 2.12.7
    • Environment: CentOS 7.9

      Hello! Today we enabled tbf gid on Oak storage, on both the MDS and the OSS, and noticed that the new rules on the MDS are not enforced. Only the rule with gid={0} and the default rule {*} appear to be used; all other GID-specific rules are ignored.

      We previously used "tbf uid" on this system. We disabled it by first switching back to "fifo" and then enabling "tbf gid", roughly as follows:

      lctl set_param mds.MDS.mdt.nrs_policies="tbf gid"
      lctl set_param mds.MDS.mdt_readpage.nrs_policies="tbf gid"
      
      lctl set_param mds.MDS.mdt.nrs_tbf_rule="start root gid={0} rate=10000"
      lctl set_param mds.MDS.mdt_readpage.nrs_tbf_rule="start root gid={0} rate=10000"
      
      lctl set_param mds.MDS.mdt.nrs_tbf_rule="change default rate=1000"
      
      ... then we added rules per GID (400+)...
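
      For reference, the per-GID step above could be scripted along these lines. This is only a sketch, not the script actually used on Oak: the helper name gen_tbf_gid_rules and the "name gid rate" input format are assumptions made for illustration, and the helper only prints the lctl commands (same syntax as the root rule above) rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the lctl commands that start one
# TBF rule per GID on the mdt and mdt_readpage services.
# Input on stdin: one "name gid rate" triple per line.
gen_tbf_gid_rules() {
    while read -r name gid rate; do
        # Skip blank lines and comment lines in the input.
        case "$name" in ''|'#'*) continue ;; esac
        for svc in mdt mdt_readpage; do
            printf 'lctl set_param mds.MDS.%s.nrs_tbf_rule="start %s gid={%s} rate=%s"\n' \
                "$svc" "$name" "$gid" "$rate"
        done
    done
}
```

      Printing the commands first makes it possible to review the 400+ generated rules before piping them to a shell on the MDS.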
      
      [root@oak-md1-s2 ~]# lctl get_param mds.MDS.mdt.nrs_tbf_rule
      mds.MDS.mdt.nrs_tbf_rule=
      regular_requests:
      CPT 0:
      scg_prj_mvp {7456} 803, ref 0
      scg_lab_twc {7122} 638, ref 0
      scg_lab_mg1 {9159} 607, ref 0
      scg_lab_irv {7152} 607, ref 0
      scg_prj_scgs {7458} 709, ref 0
      scg_prj_rttp {10137} 605, ref 0
      scg_prj_pcgp {7450} 610, ref 0
      ... many other rules with ref 0...
      ruthm {3199} 640, ref 0
      yiorgo {3367} 1800, ref 0
      root {0} 10000, ref 29                <<<
      default {*} 1000, ref 195            <<<
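
      With 400+ rules, the entries that have a non-zero ref count are easy to miss in this listing. A minimal sketch for filtering them out, assuming the "name {gid} rate, ref N" line format shown above (other Lustre versions may print it differently); the function name tbf_rules_in_use is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read nrs_tbf_rule output on stdin and print only the
# rules whose ref count is non-zero, as "name gid rate ref".
# Assumes lines of the form: name {gid} rate, ref N
tbf_rules_in_use() {
    awk '$4 == "ref" && $5 + 0 > 0 {
        gid = $2; gsub(/[{}]/, "", gid)   # "{3199}" -> "3199"
        rate = $3; sub(/,$/, "", rate)    # "640,"   -> "640"
        print $1, gid, rate, $5
    }'
}
```

      It can be fed directly, for example: lctl get_param mds.MDS.mdt.nrs_tbf_rule | tbf_rules_in_use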
       

       
      The policy is started:

        - name: tbf gid
          state: started
          fallback: no
          queued: 2                   
          active: 0 
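
      The same check can be scripted when comparing several services. A sketch, assuming the "- name: ... / state: ..." layout shown above; the function name nrs_policy_state is hypothetical, and it prints one line per matching entry found in the input:

```shell
#!/bin/sh
# Hypothetical helper: read nrs_policies output on stdin and print the state
# of the policy whose name (the text after "name:") equals $1.
nrs_policy_state() {
    awk -v want="$1" '
        $1 == "-" && $2 == "name:" {
            name = $0
            sub(/^[ \t]*- name:[ \t]*/, "", name)  # drop the "- name:" prefix
            sub(/[ \t]+$/, "", name)               # drop trailing whitespace
            hit = (name == want)
        }
        hit && $1 == "state:" { print $2; hit = 0 }
    '
}
```

      For example: lctl get_param mds.MDS.mdt.nrs_policies | nrs_policy_state "tbf gid"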
      

       

      A user covered by a per-GID rule (I tested with GID 3199) is limited by the default rule instead: when I lowered the default rule {*} from 1000 to 10 as a test, throttling was immediately noticeable. So the rule "ruthm {3199} 640, ref 0" above seems to be simply ignored.

      Per-GID rules are only defined for the mdt and mdt_readpage services in my case, not all of them.

      On the OSS, the configuration is similar for the ost and ost_io services and per-GID rules are working as expected.

      Servers and clients are running Lustre 2.12.7.

      Attaching rpctrace debug output on MDS as oak-md1-s2_rpctrace_tbf_gid.dk.log.gz

            Assignee: lixi_wc Li Xi
            Reporter: sthiell Stephane Thiell