[LU-14976] Changing tbf policy induces high CPU load Created: 01/Sep/21 Updated: 24/May/23 Resolved: 22/Apr/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Etienne Aujames | Assignee: | Etienne Aujames |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | CentOS 7 VMs on Lustre 2.14 |
| Attachments: |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Reproducer: initial NRS policy state on the MDT (tbf is started with one request queued):
mds.MDS.mdt.nrs_policies=
regular_requests:
- name: fifo
state: started
fallback: yes
queued: 0
active: 0
- name: crrn
state: stopped
fallback: no
queued: 0
active: 0
- name: tbf
state: started
fallback: no
queued: 1
active: 0
- name: delay
state: stopped
fallback: no
queued: 0
active: 0
When we try to change the policy to fifo, the tbf policy gets stuck in the "stopping" state:
mds.MDS.mdt.nrs_policies=
regular_requests:
- name: fifo
state: started
fallback: yes
queued: 0
active: 0
- name: crrn
state: stopped
fallback: no
queued: 0
active: 0
- name: tbf
state: stopping
fallback: no
queued: 1
active: 0
- name: delay
state: stopped
fallback: no
queued: 0
active: 0
Analysis: it seems that when we change the tbf policy ("tbf gid" -> "tbf"), the old RPCs queued inside "tbf gid" become inaccessible to the ptlrpc threads. ptlrpc_wait_event wakes the thread up when an RPC is available to be handled, but in this case the ptlrpc thread is unable to get the request, so it wakes up all the time (causing the CPU load):
00000100:00000001:1.0:1630509978.890060:0:4749:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1630509978.890060:0:5580:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
00000100:00000001:2.0:1630509978.890061:0:5653:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:2.0:1630509978.890061:0:5653:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1630509978.890061:0:4749:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1630509978.890061:0:5580:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1630509978.890061:0:5580:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1630509978.890062:0:4749:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
00000100:00000001:1.0:1630509978.890062:0:4749:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
00000100:00000001:2.0:1630509978.890063:0:5653:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
00000100:00000001:2.0:1630509978.890063:0:5653:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
00000100:00000001:1.0:1630509978.890063:0:4749:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:0.0:1630509978.890063:0:5580:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
00000100:00000001:2.0:1630509978.890064:0:5653:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
On my VM, ptlrpc_server_handle_request() is called at a ~300 kHz rate for a single mdt thread (doing nothing). A minimal sketch of this busy loop is given below. |
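For illustration, here is a minimal userspace C sketch of the wake-up pattern described above (hypothetical, simplified names such as request_get() and request_pending(); this is not the actual ptlrpc code): one request is still accounted as queued on the old "tbf gid" policy, but nothing can be dequeued from the active policy, so the wait condition is immediately true again after every wake-up and the thread spins doing nothing.

/*
 * Hypothetical model of the busy loop (names invented, not the real
 * ptlrpc code): one request is still accounted as queued on the old
 * "tbf gid" policy, but the active policy has nothing to hand out, so
 * the service thread keeps waking up without ever handling anything.
 */
#include <stdio.h>
#include <stddef.h>

struct fake_request { int xid; };

static int queued_on_old_policy = 1;    /* matches "queued: 1" above */

/* Equivalent of ptlrpc_server_request_pending(): "is there work to do?" */
static int request_pending(void)
{
    return queued_on_old_policy > 0;
}

/* The dequeue step: the stale request is unreachable, so always NULL. */
static struct fake_request *request_get(void)
{
    return NULL;
}

int main(void)
{
    unsigned long wakeups = 0;

    /* Bounded so the example terminates; the real thread loops forever. */
    while (request_pending() && wakeups < 1000000) {
        struct fake_request *req = request_get();

        if (req == NULL) {
            /*
             * Nothing was dequeued, yet "queued" is still 1, so the
             * wait condition is immediately true again: a busy loop.
             */
            wakeups++;
            continue;
        }
        /* never reached in this scenario */
    }
    printf("woke up %lu times without handling a request\n", wakeups);
    return 0;
}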
| Comments |
| Comment by Andreas Dilger [ 02/Sep/21 ] |
|
My guess is that the RPCs are only connected to the old NRS type, and then "fetching" RPCs to process with a new NRS type returns nothing. What needs to happen in the very rare case that the NRS type is changed at runtime is either:
1. RPCs are fetched from all NRS policies (including inactive/stopping ones), so requests still queued on the old policy are drained as the service threads look for work; or
2. when the NRS type is changed, the RPCs still queued on the old policy are moved over to the new (or fallback) policy at that time.
My preference would be #2, because this only adds overhead on the rare case when the NRS type is changed, rather than adding overhead for fetching every RPC from the queue. However, looking at ptlrpc_server_request_get->ptlrpc_nrs_req_get_nolock0() it would appear that #1 is supposed to be handling this case properly:
/**
* Always try to drain requests from all NRS polices even if they are
* inactive, because the user can change policy status at runtime.
*/
list_for_each_entry(policy, &nrs->nrs_policy_queued, pol_list_queued) {
nrq = nrs_request_get(policy, peek, force);
but that doesn't seem to be working properly (only nrs_tbf_req_get() appears in the flame graph). It may be that the "nrs gid" queue internal to the TBF policy itself is not making those RPCs available? eaujames, to step back a minute, what is the reason to change the NRS policy type while the system is in use? Is this just something you hit during benchmarking? The NRS policy type should basically never change during the lifetime of a system. |
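For what it is worth, here is a minimal userspace sketch of what option #2 could look like (invented names such as nrs_policy_model and policy_stop_and_migrate(); this is not the patch that eventually landed): when a policy is stopped, any requests still queued on it are handed over to the fallback policy so that service threads can reach them again.

/*
 * Hypothetical sketch of option #2, in plain userspace C (names invented
 * for illustration): when a policy is being stopped, splice its still-queued
 * requests onto the fallback policy so that nothing becomes unreachable.
 */
#include <stdio.h>
#include <stdlib.h>

struct req_node {
    int rq_xid;
    struct req_node *next;
};

struct nrs_policy_model {
    const char *name;
    struct req_node *queue;    /* singly linked FIFO of queued requests */
};

/* Append a request to the tail of a policy's queue. */
static void policy_enqueue(struct nrs_policy_model *pol, int xid)
{
    struct req_node *req = calloc(1, sizeof(*req));
    struct req_node **tail = &pol->queue;

    req->rq_xid = xid;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = req;
}

/* Pop the oldest request, or NULL if the queue is empty. */
static struct req_node *policy_dequeue(struct nrs_policy_model *pol)
{
    struct req_node *req = pol->queue;

    if (req != NULL)
        pol->queue = req->next;
    return req;
}

/*
 * Option #2: when stopping a policy, move every request still queued on it
 * to the fallback policy, so that service threads can still find it.
 */
static void policy_stop_and_migrate(struct nrs_policy_model *old,
                                    struct nrs_policy_model *fallback)
{
    struct req_node *req;

    while ((req = policy_dequeue(old)) != NULL) {
        printf("migrating xid %d from %s to %s\n",
               req->rq_xid, old->name, fallback->name);
        policy_enqueue(fallback, req->rq_xid);
        free(req);
    }
}

int main(void)
{
    struct nrs_policy_model tbf  = { .name = "tbf gid" };
    struct nrs_policy_model fifo = { .name = "fifo" };
    struct req_node *req;

    policy_enqueue(&tbf, 42);            /* the request that would get stuck */
    policy_stop_and_migrate(&tbf, &fifo);

    /* A service thread looking at the fallback policy now finds it. */
    while ((req = policy_dequeue(&fifo)) != NULL) {
        printf("handled xid %d\n", req->rq_xid);
        free(req);
    }
    return 0;
}

In this sketch the per-change cost lives entirely in policy_stop_and_migrate(), which matches the argument above that option #2 only adds overhead when the NRS type is actually changed.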
| Comment by Gerrit Updater [ 02/Sep/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44817 |
| Comment by Andreas Dilger [ 02/Sep/21 ] |
|
My above patch does not attempt to fix this problem; it just cleans up the code for the "nrs_policies" parameter and other parameters in this file, so that this code is easier to find. |
| Comment by Etienne Aujames [ 02/Sep/21 ] |
|
This issue occurred on a filesystem in production. Here is the context: |
| Comment by Gerrit Updater [ 27/Oct/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44817/ |
| Comment by Gerrit Updater [ 12/Sep/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48523 |
| Comment by Gerrit Updater [ 22/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48523/ |
| Comment by Peter Jones [ 22/Apr/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 24/May/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51118 |
| Comment by Gerrit Updater [ 24/May/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51119 |