Details
- Bug
- Resolution: Unresolved
- Critical
- None
- Lustre 2.16.0, Lustre 2.17.0, Lustre 2.16.1, Lustre 2.15.7
- None
- 3
- 9223372036854775807
Description
[Sat Nov 23 23:41:34 UTC 2024] Call trace:
[Sat Nov 23 23:41:34 UTC 2024] kthread_should_stop+0x18/0x40
[Sat Nov 23 23:41:34 UTC 2024] obd_get_mod_rpc_slot+0x10c/0x43c [obdclass]
[Sat Nov 23 23:41:34 UTC 2024] ptlrpc_get_mod_rpc_slot+0x38/0x60 [ptlrpc]
[Sat Nov 23 23:41:34 UTC 2024] mdc_close+0x21c/0xe64 [mdc]
[Sat Nov 23 23:41:34 UTC 2024] lmv_close+0x1a8/0x480 [lmv]
[Sat Nov 23 23:41:34 UTC 2024] ll_close_inode_openhandle+0x404/0xcc8 [lustre]
[Sat Nov 23 23:41:34 UTC 2024] ll_md_real_close+0xa4/0x280 [lustre]
[Sat Nov 23 23:41:34 UTC 2024] ll_clear_inode+0x1a0/0x7e0 [lustre]
[Sat Nov 23 23:41:34 UTC 2024] ll_delete_inode+0x70/0x260 [lustre]
Root Cause Analysis
- The kernel tries to start a new kthread from kthreadd, allocating its stack via alloc_thread_stack_node
- The allocation runs out of memory, so the kernel tries to clean up the inode cache
- In obd_get_mod_rpc_slot, the in-flight modify RPC slots happen to be full, so the caller tries to sleep using wait_woken
- In kthread_should_stop -> to_kthread, the kernel reads set_child_tid, but it is NULL. This is expected, since the task_struct here is kthreadd, which is still the parent, because the allocation has not yet completed
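The condition described in the last step can be sketched as a small userspace model. This is only an illustration, not kernel code: the model_ types and model_wait_would_oops are hypothetical names, and the sketch assumes (as the analysis above describes) that to_kthread() returns the pointer stored in set_child_tid and that the wait path only calls kthread_should_stop() for tasks with PF_KTHREAD set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel flag (PF_KTHREAD is 0x00200000). */
#define MODEL_PF_KTHREAD 0x00200000u

/* Hypothetical, reduced view of task_struct: only the two fields
 * that matter for this bug. */
struct model_task {
	unsigned int flags;
	void *set_child_tid; /* assumed to hold the struct kthread pointer */
};

/* Returns true when sleeping through wait_woken() would make
 * kthread_should_stop() dereference a NULL struct kthread. */
bool model_wait_would_oops(const struct model_task *t)
{
	return (t->flags & MODEL_PF_KTHREAD) && t->set_child_tid == NULL;
}
```

With the values from the repro (flags 0x208840, set_child_tid NULL) the predicate is true. It is false for a kthread with a non-NULL set_child_tid and for an ordinary user task.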
Repro
- Create a dummy obd device
- Modify the current task_struct: set_child_tid: 0000000000000000, current->flags: 208840
- Set max_in_flight_mod_rpcs to 0 so that the caller is always put to sleep
- Call obd_get_mod_rpc_slot and get a kernel panic
[ 1621.881498] Call trace:
[ 1621.881915] kthread_should_stop+0x18/0x40
[ 1621.882627] obd_get_mod_rpc_slot+0x10c/0x43c [obdclass]
[ 1621.883502] test_obd_rpc_slot+0xdc/0x270 [task_mod]
[ 1621.884317] task_mod_init+0x70/0x1000 [task_mod]
Proposed fix
Skip the wait in obd_get_mod_rpc_slot in this case, since we know it will cause a kernel panic.
In the normal flow, the process is put to sleep and is woken up by claim_mod_rpc_function:
avail = cli->cl_mod_rpcs_in_flight < cli->cl_max_mod_rpcs_in_flight ||
        (close_req && cli->cl_close_rpcs_in_flight == 0);
if (avail) {
        cli->cl_mod_rpcs_in_flight++;
        if (close_req)
                cli->cl_close_rpcs_in_flight++;
        ret = woken_wake_function(wq_entry, mode, flags, key);
        w->woken = true;
} else if (cli->cl_close_rpcs_in_flight)
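The availability check in this excerpt can be exercised in isolation with a userspace sketch. The struct and function names below are hypothetical stand-ins that mirror only the three counters used in the excerpt. The wakeup itself (woken_wake_function, w->woken) is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the client counters used in the excerpt. */
struct model_client {
	unsigned int mod_rpcs_in_flight;
	unsigned int max_mod_rpcs_in_flight;
	unsigned int close_rpcs_in_flight;
};

/* Mirrors the "avail" test: a slot is free if we are below the
 * maximum, or if this is a close request and no close is in flight. */
bool model_try_claim_slot(struct model_client *cli, bool close_req)
{
	bool avail = cli->mod_rpcs_in_flight < cli->max_mod_rpcs_in_flight ||
		     (close_req && cli->close_rpcs_in_flight == 0);

	if (avail) {
		cli->mod_rpcs_in_flight++;
		if (close_req)
			cli->close_rpcs_in_flight++;
	}
	return avail;
}
```

With a maximum of 1, a first non-close request gets the slot and a second one does not. A close request is still granted while no other close is in flight, which takes the in-flight count to 2, one above the configured maximum.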
In this special case the process is kthreadd, which triggers a kernel panic if it is put to sleep. The modified logic therefore looks similar to the close_req case, which simply gets a guaranteed slot via
if (close_req)
        cli->cl_close_rpcs_in_flight++;
The fix follows a similar approach, except that it also skips the whole enqueue step.
if ((current->flags & PF_KTHREAD) && !current->set_child_tid) {
        // kthread with a NULL set_child_tid (kthreadd here):
        // skip wait_woken as it will cause kernel panic. Grant it a slot.
        cli->cl_mod_rpcs_in_flight++;
} else {
        // ... enqueue and wait_woken logic
}
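The effect of such a bypass on the slot accounting can be sketched in userspace. This is a minimal model under stated assumptions, not the actual patch: all sketch_ names are hypothetical, the bypass condition is taken to be "current is a kernel thread whose set_child_tid is NULL" as described in the root cause analysis, and the bypass is assumed to apply only after the normal availability check has failed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PF_KTHREAD 0x00200000u /* same value as the kernel's PF_KTHREAD */

struct sketch_task {
	unsigned int flags;
	void *set_child_tid;
};

struct sketch_client {
	unsigned int in_flight;
	unsigned int max_in_flight;
};

enum sketch_result {
	SKETCH_GRANTED,         /* free slot, normal path */
	SKETCH_GRANTED_NO_WAIT, /* bypass: slot granted without sleeping */
	SKETCH_MUST_WAIT        /* caller would enqueue and sleep */
};

enum sketch_result sketch_get_slot(struct sketch_client *cli,
				   const struct sketch_task *cur)
{
	if (cli->in_flight < cli->max_in_flight) {
		cli->in_flight++;
		return SKETCH_GRANTED;
	}
	/* Assumed bypass condition: sleeping here would dereference NULL. */
	if ((cur->flags & SKETCH_PF_KTHREAD) && cur->set_child_tid == NULL) {
		cli->in_flight++;
		return SKETCH_GRANTED_NO_WAIT;
	}
	return SKETCH_MUST_WAIT;
}
```

In this model the bypass lets in_flight exceed max_in_flight, in the same way the guaranteed close slot does, so the release path still has to decrement the counter for the bypassed request.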