Details
- Bug
- Resolution: Unresolved
- Medium
Description
Feb 26 05:30:00 oleg245-client kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:289
Feb 26 05:30:00 oleg245-client kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 37, name: kworker/u8:2
Feb 26 05:30:00 oleg245-client kernel: CPU: 0 PID: 37 Comm: kworker/u8:2 Kdump: loaded Tainted: G O -------- - - 4.18.0rh8.10-debug #2
Feb 26 05:30:00 oleg245-client kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-8.fc42 06/10/2025
Feb 26 05:30:00 oleg245-client kernel: Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
Feb 26 05:30:00 oleg245-client kernel: Call Trace:
Feb 26 05:30:00 oleg245-client kernel: ? dump_stack+0xbb/0x10e
Feb 26 05:30:00 oleg245-client kernel: ? ___might_sleep.cold.92+0xd9/0x107
Feb 26 05:30:00 oleg245-client kernel: ? __might_sleep+0x59/0xc0
Feb 26 05:30:00 oleg245-client kernel: ? mutex_lock+0x24/0x70
Feb 26 05:30:00 oleg245-client kernel: ? lnet_peerni_by_nid_locked+0x7f/0x1c0 [lnet]
Feb 26 05:30:00 oleg245-client kernel: ? LNetPeerDiscovered+0x78/0x460 [lnet]
Feb 26 05:30:00 oleg245-client kernel: ? import_select_connection+0x2ad/0xed0 [ptlrpc]
Feb 26 05:30:00 oleg245-client kernel: ? ptlrpc_connect_import_locked+0x49c/0x1070 [ptlrpc]
Feb 26 05:30:00 oleg245-client kernel: ? rpc_make_runnable+0xb5/0xd0
Feb 26 05:30:00 oleg245-client kernel: ? inet_recvmsg+0x81/0x180
Feb 26 05:30:00 oleg245-client kernel: ? update_load_avg+0x9f/0xa40
Feb 26 05:30:00 oleg245-client kernel: ? xs_poll_check_readable+0x38/0xb0
Feb 26 05:30:00 oleg245-client kernel: ? ptlrpc_pinger_main+0x709/0xf20 [ptlrpc]
Feb 26 05:30:00 oleg245-client kernel: ? process_one_work+0x2c8/0x700
Feb 26 05:30:00 oleg245-client kernel: ? worker_thread+0x296/0x6e0
Feb 26 05:30:00 oleg245-client kernel: ? rescuer_thread+0x570/0x570
Feb 26 05:30:00 oleg245-client kernel: ? kthread+0x1d1/0x200
Feb 26 05:30:00 oleg245-client kernel: ? set_kthread_struct+0x70/0x70
Feb 26 05:30:00 oleg245-client kernel: ? ret_from_fork+0x1f/0x30
This happens in import_select_connection(), inside the spinlock-protected loop over the import's connections, when it calls LNetPeerDiscovered(). The latter was changed in the past to use lnet_peerni_by_nid_locked(), which may sleep when it takes a mutex in its slow path. Sleeping is not allowed while a spinlock is held, hence the warning. The code needs to be changed so that it does not refresh a connection's up-to-date state inside the loop.
A possible way to fix this:
- do the first pass over the loop without entering the slow path,
- if all up-to-date connections have already been tried, refresh the status of the remaining non-up-to-date connections via the slow path, without imp_lock held
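The two steps above can be sketched as a small userspace model in C. This is only an illustration of the locking pattern, not the Lustre code: every name here (`struct conn`, `struct import`, `slow_peer_discovered()`, `select_connection()`, `MAX_CONNS`) is hypothetical, the lock is modelled by a plain flag instead of a real spinlock, and the `assert()` in the slow path plays the role that `might_sleep()` plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONNS 16 /* arbitrary bound for this sketch */

/* Hypothetical stand-ins; these are not the real Lustre structures. */
struct conn {
    bool uptodate; /* cached "peer is up to date" state */
    bool tried;    /* already attempted in this round */
    bool peer_ok;  /* what the slow-path query would report */
};

struct import {
    bool lock_held; /* models imp_lock, a spinlock in the real code */
    struct conn *conns;
    size_t nconns;
    int slow_calls; /* counts slow-path queries, for illustration only */
};

static void imp_lock(struct import *imp)
{
    assert(!imp->lock_held);
    imp->lock_held = true;
}

static void imp_unlock(struct import *imp)
{
    assert(imp->lock_held);
    imp->lock_held = false;
}

/* Models a query that may sleep; it must never run with the lock held. */
static bool slow_peer_discovered(struct import *imp, const struct conn *c)
{
    assert(!imp->lock_held); /* analogous to might_sleep() */
    imp->slow_calls++;
    return c->peer_ok;
}

/* Fast path: use only the cached state, so it is safe under the lock. */
static struct conn *pick_cached(struct import *imp)
{
    for (size_t i = 0; i < imp->nconns; i++) {
        struct conn *c = &imp->conns[i];
        if (c->uptodate && !c->tried)
            return c;
    }
    return NULL;
}

struct conn *select_connection(struct import *imp)
{
    bool fresh[MAX_CONNS];
    struct conn *c;

    assert(imp->nconns <= MAX_CONNS);

    /* Step 1: first pass under the lock, no slow path. */
    imp_lock(imp);
    c = pick_cached(imp);
    if (c == NULL) {
        /* Step 2: drop the lock and refresh the stale entries. */
        imp_unlock(imp);
        for (size_t i = 0; i < imp->nconns; i++) {
            fresh[i] = imp->conns[i].uptodate;
            if (!fresh[i])
                fresh[i] = slow_peer_discovered(imp, &imp->conns[i]);
        }
        /* Retake the lock, publish the results, and retry the fast path.
         * Real code would also have to revalidate the connection list,
         * since it may have changed while the lock was dropped. */
        imp_lock(imp);
        for (size_t i = 0; i < imp->nconns; i++)
            imp->conns[i].uptodate = fresh[i];
        c = pick_cached(imp);
    }
    if (c != NULL)
        c->tried = true;
    imp_unlock(imp);
    return c;
}
```

In this model the slow path is reached only after every connection with a cached up-to-date state has been tried, and it always runs with the lock released, so the "sleeping function called from invalid context" condition cannot be triggered from the selection loop.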
Issue Links
- is related to LU-19437 LNet discovery may remove peer added by Lustre (Resolved)