[LU-15251] tbf gid rules ignored on MDS Created: 19/Nov/21 Updated: 19/Nov/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | Li Xi |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | CentOS 7.9 |
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hello! Today we enabled tbf gid on Oak storage, on both the MDS and the OSS, and noticed that new rules on the MDS are not enforced. Only the rule with gid={0} and the default rule {*} seem to be used; all other GID-specific rules are ignored.

We previously used "tbf uid" on this system. We disabled it by switching back to "fifo" first, and then enabled "tbf gid", roughly as follows:
lctl set_param mds.MDS.mdt.nrs_policies="tbf gid"
lctl set_param mds.MDS.mdt_readpage.nrs_policies="tbf gid"
lctl set_param mds.MDS.mdt.nrs_tbf_rule="start root gid={0} rate=10000"
lctl set_param mds.MDS.mdt_readpage.nrs_tbf_rule="start root gid={0} rate=10000"
lctl set_param mds.MDS.mdt.nrs_tbf_rule="change default rate=1000"
... then we added rules per GID (400+)...
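With 400+ groups, the per-GID rules are easiest to generate from a list. The following is a minimal sketch, not the exact script used on Oak: the function name `gen_tbf_gid_rules` and the "name gid rate" input format are assumptions made for illustration, and the rule syntax is copied from the `start root gid={0} rate=10000` command above. It only prints the commands, so they can be reviewed before being applied.

```shell
# Hypothetical helper: print one "start" rule command per input line.
# Input (stdin): lines of the form "name gid rate" (assumed format).
# $1: NRS service parameter prefix, e.g. mds.MDS.mdt
# Dry run only: nothing is sent to lctl by this function.
gen_tbf_gid_rules() {
    svc=$1
    while read -r name gid rate; do
        # Skip blank lines and comment lines in the input list.
        case "$name" in ''|'#'*) continue ;; esac
        printf 'lctl set_param %s.nrs_tbf_rule="start %s gid={%s} rate=%s"\n' \
            "$svc" "$name" "$gid" "$rate"
    done
}
```

For example, `gen_tbf_gid_rules mds.MDS.mdt < rules.txt` (with a hypothetical `rules.txt`) prints the commands for the mdt service; the same list can be replayed with `mds.MDS.mdt_readpage` as the prefix.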
[root@oak-md1-s2 ~]# lctl get_param mds.MDS.mdt.nrs_tbf_rule
mds.MDS.mdt.nrs_tbf_rule=
regular_requests:
CPT 0:
scg_prj_mvp {7456} 803, ref 0
scg_lab_twc {7122} 638, ref 0
scg_lab_mg1 {9159} 607, ref 0
scg_lab_irv {7152} 607, ref 0
scg_prj_scgs {7458} 709, ref 0
scg_prj_rttp {10137} 605, ref 0
scg_prj_pcgp {7450} 610, ref 0
... many other rules with ref 0...
ruthm {3199} 640, ref 0
yiorgo {3367} 1800, ref 0
root {0} 10000, ref 29 <<<
default {*} 1000, ref 195 <<<
- name: tbf gid
state: started
fallback: no
queued: 2
active: 0
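To spot rules that never match without reading the whole listing, the output above can be filtered for entries with "ref 0". This is a rough sketch that assumes the line layout shown above ("name {gid} rate, ref N"); the function name `tbf_unreferenced_rules` is hypothetical, and treating "ref 0" as "no requests are currently classified under this rule" is my reading of the output, not something confirmed here.

```shell
# Hypothetical helper: read nrs_tbf_rule output on stdin and print the
# name and GID expression of every rule line whose "ref" value is 0.
# Lines that do not look like "name {expr} rate, ref N" are ignored.
tbf_unreferenced_rules() {
    awk '$2 ~ /^[{].*[}]$/ && $4 == "ref" && $5 == 0 { print $1, $2 }'
}
```

It could be used as `lctl get_param mds.MDS.mdt.nrs_tbf_rule | tbf_unreferenced_rules`. On a server with several CPTs the same rule may be listed once per CPT.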
A user covered by a GID rule (I tested from GID 3199) is limited by the default rule instead: when I lowered the default rule {*} from 1000 to 10 for the test, I immediately noticed throttling. So the rule "ruthm {3199} 640, ref 0" above seems to be ignored.
Per-GID rules are defined only for the mdt and mdt_readpage services in my case, not for all services. On the OSS, the configuration is similar for the ost and ost_io services, and per-GID rules are working as expected.
Servers and clients are running Lustre 2.12.7. Attaching rpctrace debug output from the MDS as oak-md1-s2_rpctrace_tbf_gid.dk.log.gz |
| Comments |
| Comment by Peter Jones [ 19/Nov/21 ] |
|
Li Xi, could you please advise? Thanks, Peter |