[LU-11968] tbf QOS gid rules not being enforced Created: 13/Feb/19  Updated: 02/Mar/19  Resolved: 28/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Li Xi
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File tbf_debug.out.gz    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

TBF QoS is set up for ost_io as follows:

lctl set_param ost.OSS.ost_io.nrs_policies="tbf gid"
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start viz gid={1128} rate=10000"
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start css gid={1125} rate=10000"
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=100"

When running IOR as a user in group css, writes are correctly throttled by the css rule, but reads are throttled by the default rate.

Lowering the default rate decreases read bandwidth, and increasing the default rate increases it.

IOR gets 8 GB/s for writes and 1.4 GB/s for reads. It should get 8 GB/s for both reads and writes.
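
The symptom above is what one would expect if read RPCs fall through to the catch-all rule instead of matching css. Below is a minimal Python sketch of gid-based rule selection. It assumes rules are checked in the listed order and the first match wins. `Rule` and `select_rule` are illustrative names, not Lustre code, and the rates are simply the values from the rules above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    name: str
    gids: Optional[FrozenSet[int]]  # None stands for the wildcard {*}
    rate: int                       # rate value as given in the rule


# Rules as configured in this ticket; the ordering is an assumption.
RULES = (
    Rule("css", frozenset({1125}), 10000),
    Rule("viz", frozenset({1128}), 10000),
    Rule("default", None, 100),
)


def select_rule(gid: int, rules: Sequence[Rule] = RULES) -> Rule:
    """Return the first rule whose gid set contains gid, else the wildcard."""
    for rule in rules:
        if rule.gids is None or gid in rule.gids:
            return rule
    raise LookupError("no wildcard rule configured")


if __name__ == "__main__":
    for gid in (1125, 1128, 0):
        rule = select_rule(gid)
        print(f"gid {gid} -> rule {rule.name}, rate {rule.rate}")
```

In this model a request is limited by the default rate whenever the gid seen by the server is not one of the configured gids, regardless of which group the user belongs to on the client.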

 srv2 /sys/kernel/debug/lustre/ost/OSS/ost_io # cat nrs_tbf_rule
regular_requests:
CPT 0:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
CPT 1:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
CPT 2:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
CPT 3:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
CPT 4:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
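
To rule out a per-CPT configuration difference, the listing above can be checked mechanically. The sketch below parses text in the format shown and confirms that every CPT carries the same rule names, match expressions and rates. The function names are illustrative, and the `ref` column is stored as-is without interpreting it.

```python
import re

# Matches lines such as: css {1125} 10000, ref 1
RULE_LINE = re.compile(r"^(\S+) \{(.+?)\} (\d+), ref (\d+)$")


def parse_tbf_rules(text):
    """Return {cpt_number: {rule_name: (match_expr, rate, ref)}}."""
    cpts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CPT ") and line.endswith(":"):
            current = int(line[4:-1])
            cpts[current] = {}
            continue
        match = RULE_LINE.match(line)
        if match and current is not None:
            name, expr, rate, ref = match.groups()
            cpts[current][name] = (expr, int(rate), int(ref))
    return cpts


def rules_consistent(cpts):
    """True when all CPTs agree on rule names, match expressions and rates."""
    views = [
        {name: (expr, rate) for name, (expr, rate, _ref) in rules.items()}
        for rules in cpts.values()
    ]
    return all(view == views[0] for view in views)


SAMPLE = """regular_requests:
CPT 0:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
CPT 1:
css {1125} 10000, ref 1
viz {1128} 10000, ref 0
default {*} 100, ref 1
"""

if __name__ == "__main__":
    parsed = parse_tbf_rules(SAMPLE)
    print(parsed[0])
    print("consistent:", rules_consistent(parsed))
```

For the output in this ticket all five CPTs are identical, so the rule table itself does not explain the read/write difference.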


 Comments   
Comment by Mahmoud Hanafi [ 13/Feb/19 ]

I have attached debug logs that show the switch from write to read and how the wrong rule is picked.

Comment by Peter Jones [ 14/Feb/19 ]

Li Xi

Could you please investigate?

Thanks

Peter

Comment by Li Xi [ 14/Feb/19 ]
osc_build_rpc
 cl_req_attr_set
  coo_req_attr_set
   vvp_req_attr_set
    obdo_from_inode
     dst->o_gid = from_kgid(&init_user_ns, src->i_gid);
    
osc_brw_prep_request
 body->oa.o_uid = oa->o_uid;
 body->oa.o_gid = oa->o_gid;

nrs_tbf_id_cli_set
 ost_tbf_id_cli_set
  id->ti_uid = body->oa.o_uid;
  id->ti_gid = body->oa.o_gid;

obdo_from_inode
 dst->o_uid = from_kuid(&init_user_ns, src->i_uid);
 obdo_from_la

I am trying to reproduce this and am writing a debug patch.
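
The call chain traced above amounts to the uid/gid being copied from the client-side inode into the obdo, from there into the RPC body, and finally into the TBF id on the server. Below is a toy Python model of that data flow. The class and function names mirror the identifiers in the trace purely for readability; this is not Lustre code, and the uid value is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    i_uid: int
    i_gid: int


@dataclass
class Obdo:
    o_uid: int = 0
    o_gid: int = 0


@dataclass
class TbfId:
    ti_uid: int = 0
    ti_gid: int = 0


def obdo_from_inode(src: Inode) -> Obdo:
    """Client side: fill the obdo from the inode's owner and group."""
    return Obdo(o_uid=src.i_uid, o_gid=src.i_gid)


def osc_brw_prep_request(oa: Obdo) -> Obdo:
    """Client side: copy the ids into the obdo carried in the RPC body."""
    return Obdo(o_uid=oa.o_uid, o_gid=oa.o_gid)


def ost_tbf_id_cli_set(body_oa: Obdo) -> TbfId:
    """Server side: derive the TBF id from the ids in the RPC body."""
    return TbfId(ti_uid=body_oa.o_uid, ti_gid=body_oa.o_gid)


if __name__ == "__main__":
    inode = Inode(i_uid=500, i_gid=1125)  # uid 500 is a made-up value
    tbf_id = ost_tbf_id_cli_set(osc_brw_prep_request(obdo_from_inode(inode)))
    print(tbf_id)
```

If any stage in this chain does not carry the gid forward, the id seen by the rule matcher differs from the one the user expects, which is why the debug patch prints the gid at each step.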

Comment by Gerrit Updater [ 14/Feb/19 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/34257
Subject: LU-11968 ptlrpc: debug
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f23cf308512a0d9e3ef82ccac67fde23d7e9cfc3

Comment by Li Xi [ 14/Feb/19 ]

Hi Mahmoud, would you mind applying patch 34257 and checking whether the rule is matched as expected? As far as I can tell from my testing, nothing looks wrong in my environment:

# lctl set_param ost.OSS.ost_io.nrs_policies="tbf gid"
# lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start viz gid={1000} rate=10000"
# su - test
$ dd if=/mnt/lustre/file of=/dev/null bs=1048576
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14327:0:(vvp_object.c:218:vvp_req_attr_set()) cra_type 0, gid 1000
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14327:0:(osc_request.c:1383:osc_brw_prep_request()) gid 1000
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14431:0:(nrs_tbf.c:1534:ost_tbf_id_cli_set()) ost_tbf_id_cli_set gid: 1000
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14431:0:(nrs_tbf.c:263:nrs_tbf_rule_match()) rule [viz] matches ID [1000
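
When comparing a run like the one above against a failing setup, it can help to pull the gid out of each debug line and see at which stage it changes. Below is a small Python sketch that does this for lines in the format shown. It assumes the message layout of the debug output above; `gids_by_stage` is an illustrative helper name.

```python
import re

# Captures (file, line, function) and the gid value printed after it.
STAGE = re.compile(r"\(([\w.]+):(\d+):(\w+)\(\)\)\s.*?\bgid:?\s+(\d+)")


def gids_by_stage(log_text):
    """Return [(function_name, gid), ...] in the order the lines appear."""
    stages = []
    for line in log_text.splitlines():
        match = STAGE.search(line)
        if match:
            stages.append((match.group(3), int(match.group(4))))
    return stages


LOG = """\
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14327:0:(vvp_object.c:218:vvp_req_attr_set()) cra_type 0, gid 1000
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14327:0:(osc_request.c:1383:osc_brw_prep_request()) gid 1000
Feb 14 23:07:43 server17-el7-vm1 kernel: LustreError: 14431:0:(nrs_tbf.c:1534:ost_tbf_id_cli_set()) ost_tbf_id_cli_set gid: 1000
"""

if __name__ == "__main__":
    stages = gids_by_stage(LOG)
    for func, gid in stages:
        print(f"{func}: gid {gid}")
    if len({gid for _, gid in stages}) > 1:
        print("gid changes between stages")
```

In the working case the same gid appears at every stage. A different value at `ost_tbf_id_cli_set` than at `vvp_req_attr_set` would point at the client-to-server hand-off.
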

Comment by Mahmoud Hanafi [ 28/Feb/19 ]

This is not a bug and can be closed. I was using a 2.11 client.
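
Since the cause turned out to be a client older than the 2.12.0 server, a quick version comparison is a reasonable first check before debugging rule matching. Below is a minimal Python sketch. It assumes version strings of the form major.minor[.patch], for example as reported by `lctl get_param version` on each node. The helper names are illustrative, and using the server version as the threshold is an assumption rather than a documented requirement.

```python
import re


def parse_version(text):
    """Turn '2.11' or '2.12.0' into a comparable (major, minor, patch) tuple."""
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def client_older_than_server(client, server):
    """True when the client version sorts before the server version."""
    return parse_version(client) < parse_version(server)


if __name__ == "__main__":
    if client_older_than_server("2.11", "2.12.0"):
        print("client is older than server; check feature support first")
```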

 

Comment by Peter Jones [ 28/Feb/19 ]

ok thanks
