[LU-13037] print tbf stats Created: 30/Nov/19 Updated: 01/Apr/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Li Xi |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
We would like a way to dump current tbf stats. |
| Comments |
| Comment by Peter Jones [ 30/Nov/19 ] |
|
Li Xi, how possible is this with the current design of TBF? Peter |
| Comment by Li Xi [ 06/Dec/19 ] |
|
I am working on a patch that works similarly to jobstats, but it is going to take a while. |
| Comment by Li Xi [ 07/Dec/19 ] |
|
Two entries are added to each service: nrs_tbf_stats_reg and nrs_tbf_stats_hp. The first is for regular requests, the second for high-priority requests. TBF information for all client classifications is dumped from each entry. The following is an example of the dumped information:

# cat /sys/kernel/debug/lustre/ost/OSS/ost_io/nrs_tbf_stats_hp
- key: _10.0.1.253@tcp_10_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: _10.0.1.253@tcp_4_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: dd.0_10.0.1.253@tcp_10_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: dd.0_10.0.1.253@tcp_4_0_0
  refs: 9
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3

# cat /sys/kernel/debug/lustre/ost/OSS/ost_io/nrs_tbf_stats_reg
- key: _10.0.1.253@tcp_10_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: _10.0.1.253@tcp_4_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: dd.0_10.0.1.253@tcp_10_0_0
  refs: 0
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3
- key: dd.0_10.0.1.253@tcp_4_0_0
  refs: 9
  rule: default
  rpc_rate: 10000
  ntoken: 2
  token_depth: 3 |
| Comment by Gerrit Updater [ 07/Dec/19 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/36950 |
| Comment by Li Xi [ 07/Dec/19 ] |
|
mhanafi Please feel free to let me know whether the dumped information is what you need. |
| Comment by Peter Jones [ 14/Dec/19 ] |
|
mhanafi what do you think? |
| Comment by Mahmoud Hanafi [ 26/May/20 ] |
|
Why do we get more than one stats entry for a specific cpt and queue_type? Here, for uid 929411059, we see two entries for cpt=0 queue_type=reg and two for cpt=0 queue_type=hp:

nbp13-srv1 /sys/kernel/debug/lustre/ost/OSS/ost_io # cat /sys/kernel/debug/lustre/ost/OSS/ost_io/nrs_tbf_stats | grep -A 4 929411059
- uid: 929411059
  cpt: 0
  queue_type: hp
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 0
  queue_type: hp
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 0
  queue_type: reg
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 0
  queue_type: reg
  refs: 1
  rule: default
--

Later, for cpt=8, we see only one for hp and two for reg:

- uid: 929411059
  cpt: 8
  queue_type: hp
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 8
  queue_type: reg
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 8
  queue_type: reg
  refs: 1
  rule: default |
| Comment by Peter Jones [ 08/Aug/20 ] |
|
mhanafi I noticed this week that you are carrying this patch in your distribution. Sorry that we missed your question above. Have you had any other questions/comments about using this change? Do you think that we should proceed with landing it in its current form or is more work required? |
| Comment by Li Xi [ 10/Aug/20 ] |
|
Sorry for the late reply.
Understood; compared to this, a single stats entry would be easier to understand. However, this is determined by the internal design and implementation of request handling in Lustre, which has good reasons of its own, and TBF has no choice but to use and depend on it. Lustre separates requests into two types: regular (reg) requests and high-priority (hp) requests. Handling of these two types is kept separate to make sure high-priority requests are not blocked behind many regular requests. Lustre also divides CPUs into different partitions (CPTs), and each partition handles RPC requests independently. Because of these existing mechanisms, TBF has to set the RPC rate limits separately (possibly with the same values), and the stats are likewise separated per CPT and request type. |
| Comment by Mahmoud Hanafi [ 10/Aug/20 ] |
|
I understand that we have hp and reg, but we get two entries for the same cpt and queue_type:

- uid: 929411059
  cpt: 0
  queue_type: reg
  refs: 1
  rule: default
--
- uid: 929411059
  cpt: 0
  queue_type: reg
  refs: 1
  rule: default
--

Yes, Peter, we would like to get this landed. |
| Comment by Li Xi [ 10/Aug/20 ] |
|
mhanafi Sorry for the misunderstanding. I found a bug in the patch; I am not sure whether it is the cause of the duplicated output. The patch will be refreshed soon in any case. |
| Comment by Qian Yingjin [ 13/Aug/20 ] |
|
Hi Mahmoud, Thanks, |
| Comment by Jay Lan (Inactive) [ 13/Aug/20 ] |
|
I had a compilation error in lustre-2.12.4 against the CentOS 7.7 kernel: Making all in . I do not know how I got those "unrecognized command line option" errors. Before applying this patch, it compiled fine. |
| Comment by Qian Yingjin [ 14/Aug/20 ] |
|
I built it based on the latest master branch. To make it pass the build for 2.12.4, you just need to modify it to:
seq_printf(p, "%llu\n", cli->tc_rpc_rate);
Regards, |
| Comment by Peter Jones [ 05/Sep/20 ] |
|
jaylan any updates on testing this patch? |
| Comment by Peter Jones [ 01/Apr/21 ] |
|
mhanafi jaylan you have not provided any feedback as to whether this patch meets your requirements. However, rumour has it that you are carrying this patch - does this mean that you can now provide us some feedback as to whether this patch is useful and should proceed with landing? |
| Comment by Mahmoud Hanafi [ 01/Apr/21 ] |
|
We have been using the patch, but I think it needs additional work to be more useful. We will need to think about how it could be improved. |