Details
-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
Lustre 2.17.0, Lustre 2.16.1
-
None
-
3
Description
It looks like recent LNet landings (I think LU-15135 and LU-18555) broke some things in interop.
Assertion failure in sanity-lnet test 256:
[23676.200412] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Disable routing on trevis-107vm11
[23676.516990] Lustre: DEBUG MARKER: Disable routing on trevis-107vm11
[23676.752477] Lustre: DEBUG MARKER: /usr/sbin/lnetctl set routing 0
[23676.935020] LNetError: 817662:0:(lib-move.c:1199:lnet_return_rx_credits_locked()) ASSERTION( msg->msg_kiov != ((void *)0) ) failed:
[23676.935070] LNetError: 817662:0:(lib-move.c:1199:lnet_return_rx_credits_locked()) LBUG
[23676.935079] CPU: 0 PID: 817662 Comm: lnetctl Kdump: loaded Tainted: G OE ------- --- 5.14.0-427.31.1_lustre.el9.x86_64 #1
[23676.935082] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[23676.935087] Call Trace:
[23676.935100] <TASK>
[23676.935103] dump_stack_lvl+0x34/0x48
[23676.935154] lbug_with_loc.cold+0x5/0x58 [libcfs]
[23676.935202] lnet_return_rx_credits_locked+0x437/0x600 [lnet]
[23676.935260] lnet_msg_decommit_rx+0xcb/0x2c0 [lnet]
[23676.935299] lnet_msg_decommit+0x79/0x200 [lnet]
[23676.935336] lnet_complete_msg_locked+0x33/0x240 [lnet]
[23676.935372] lnet_finalize+0xff/0x260 [lnet]
[23676.935409] lnet_drop_routed_msgs_locked+0xa6/0xd0 [lnet]
[23676.935447] lnet_rtrpool_free_bufs+0xaa/0x160 [lnet]
[23676.935492] lnet_rtrpools_free.part.0+0x46/0x90 [lnet]
[23676.935531] LNetCtl+0xc23/0x1d00 [lnet]
[23676.935570] ? mutex_lock+0xe/0x30
[23676.935599] ? LNetNIInit+0x237/0x600 [lnet]
[23676.935632] lnet_ioctl+0x23e/0x300 [lnet]
[23676.935671] ? lnet_ioctl_getdata+0x147/0x790 [lnet]
[23676.935709] lnet_psdev_ioctl+0x353/0x4c0 [lnet]
[23676.935748] __x64_sys_ioctl+0x8a/0xc0
[23676.935780] do_syscall_64+0x5c/0x90
[23676.935790] ? exc_page_fault+0x62/0x150
[23676.935793] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[23676.935804] RIP: 0033:0x7f6a1490357b
[23676.935862] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 68 0f 00 f7 d8 64 89 01 48
[23676.935864] RSP: 002b:00007ffd92712068 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[23676.935877] RAX: ffffffffffffffda RBX: 00007f6a14bf2fc0 RCX: 00007f6a1490357b
[23676.935878] RDX: 00007ffd92712120 RSI: 00000000c0b86557 RDI: 0000000000000003
[23676.935879] RBP: 00000000c0b86557 R08: 00007ffd92712120 R09: 0000000000000000
[23676.935880] R10: 00007f6a14811d78 R11: 0000000000000202 R12: 00007ffd92712120
[23676.935881] R13: 00007f6a14bd136e R14: 00007ffd927120a0 R15: 00007f6a14c31000
[23676.935884] </TASK>
https://testing.whamcloud.com/test_sets/2a64bccf-e4b9-43f5-b384-c17ef711e31e
https://testing.whamcloud.com/test_sets/080a66f5-c061-40ad-8ac7-59cdece7416b
https://testing.whamcloud.com/test_sets/20637603-a12a-4e3c-a5ab-9baa1a02f02a
A similar crash in 2.15 sanity-lnet test 257:
[ 2484.224447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Disable routing on onyx-150vm12
[ 2484.758406] Lustre: DEBUG MARKER: Disable routing on onyx-150vm12
[ 2485.065750] Lustre: DEBUG MARKER: /usr/sbin/lnetctl set routing 0
[ 2485.270108] LNetError: 255122:0:(lib-move.c:1205:lnet_return_rx_credits_locked()) ASSERTION( msg->msg_kiov != ((void *)0) ) failed:
[ 2485.272442] LNetError: 255122:0:(lib-move.c:1205:lnet_return_rx_credits_locked()) LBUG
[ 2485.273980] Pid: 255122, comm: lnetctl 4.18.0-553.53.1.el8_lustre.x86_64 #1 SMP Tue Jun 10 20:38:47 UTC 2025
[ 2485.275744] Call Trace TBD:
[ 2485.276401] [<0>] libcfs_call_trace+0x63/0x90 [libcfs]
[ 2485.277346] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[ 2485.278204] [<0>] lnet_return_rx_credits_locked+0x3c7/0x570 [lnet]
[ 2485.279372] [<0>] lnet_msg_decommit+0x288/0x760 [lnet]
[ 2485.280309] [<0>] lnet_finalize+0x3b2/0xa70 [lnet]
[ 2485.281182] [<0>] lnet_drop_routed_msgs_locked+0xac/0xe0 [lnet]
[ 2485.282236] [<0>] lnet_rtrpool_free_bufs+0xaa/0x160 [lnet]
[ 2485.283227] [<0>] lnet_rtrpools_free+0x52/0x90 [lnet]
[ 2485.284141] [<0>] LNetCtl+0x13d1/0x1bb0 [lnet]
[ 2485.284963] [<0>] lnet_ioctl+0xa8/0x260 [lnet]
[ 2485.285796] [<0>] notifier_call_chain+0x47/0x70
[ 2485.286652] [<0>] blocking_notifier_call_chain+0x42/0x60
[ 2485.287615] [<0>] libcfs_psdev_ioctl+0x34a/0x590 [libcfs]
[ 2485.288579] [<0>] do_vfs_ioctl+0xa4/0x690
[ 2485.289329] [<0>] ksys_ioctl+0x64/0xa0
[ 2485.290023] [<0>] __x64_sys_ioctl+0x16/0x20
[ 2485.290791] [<0>] do_syscall_64+0x5b/0x1a0
[ 2485.291551] [<0>] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 2485.292547] Kernel panic - not syncing: LBUG
https://testing.whamcloud.com/test_sets/c34290ae-f8b9-40c4-a3d0-1bf43b57ece6