Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.15.6
-
3
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/cee7992f-b0ae-414f-b3b0-0a419a46c1d9
test_226 failed with the following error:
onyx-45vm6 crashed during sanity-lnet test_226
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4581 - 4.18.0-553.16.1.el8_10.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/94 - 4.18.0-553.5.1.el8_lustre.x86_64
Console log on MDS:
Lustre: DEBUG MARKER: output="$(/usr/sbin/lnetctl route show --net tcp --gateway 10.240.23.245@tcp1 2>/dev/null)"; if [[ -n "${output}" ]]; then echo "Delete route to tcp via 10.240.23.245@tcp1"; /usr/sbin/lnetctl route del --net tcp --gateway 10.240.23.245@tcp1; e
LNetError: 1348624:0:(peer.c:2227:lnet_destroy_peer_ni_locked()) ASSERTION( list_empty(&lpni->lpni_peer_nis) ) failed:
LNetError: 1348624:0:(peer.c:2227:lnet_destroy_peer_ni_locked()) LBUG
Pid: 1348624, comm: socknal_sd00_01 4.18.0-553.5.1.el8_lustre.x86_64 #1 SMP Fri Jun 28 18:44:24 UTC 2024
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] lnet_destroy_peer_ni_locked+0x446/0x4e0 [lnet]
[<0>] lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
[<0>] lnet_select_pathway+0xb95/0x16c0 [lnet]
[<0>] lnet_send+0x6d/0x1e0 [lnet]
[<0>] lnet_parse_local+0x3ef/0xde0 [lnet]
[<0>] lnet_parse+0xd78/0x1480 [lnet]
[<0>] ksocknal_process_receive+0x4dc/0xdb0 [ksocklnd]
[<0>] ksocknal_scheduler+0x188/0x17c0 [ksocklnd]
[<0>] kthread+0x134/0x150
[<0>] ret_from_fork+0x35/0x40
Kernel panic - not syncing: LBUG
CPU: 1 PID: 1348624 Comm: socknal_sd00_01 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.5.1.el8_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
dump_stack+0x41/0x60
panic+0xe7/0x2ac
? ret_from_fork+0x35/0x40
lbug_with_loc.cold.8+0x18/0x18 [libcfs]
lnet_destroy_peer_ni_locked+0x446/0x4e0 [lnet]
lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
lnet_select_pathway+0xb95/0x16c0 [lnet]
? lnet_try_match_md+0x337/0x630 [lnet]
lnet_send+0x6d/0x1e0 [lnet]
lnet_parse_local+0x3ef/0xde0 [lnet]
lnet_parse+0xd78/0x1480 [lnet]
ksocknal_process_receive+0x4dc/0xdb0 [ksocklnd]
ksocknal_scheduler+0x188/0x17c0 [ksocklnd]
? finish_wait+0x80/0x80
? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
kthread+0x134/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x35/0x40
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_226 - onyx-45vm6 crashed during sanity-lnet test_226
Issue Links
- is related to
- LU-17062 Prevent use after free following *_decref_locked() usage (Resolved)
- LU-17440 after move from 2.14 to 2.15: LNetError: 31941:0:(peer.c:2194:lnet_destroy_peer_ni_locked()) ASSERTION( list_empty(&lpni->lpni_peer_nis) ) (Resolved)
ssmirnov, I ask because the timelines for those two changes (the patch landing on b2_15 and the patch landing on master) are independent. Also, the version check patch on master would skip testing with other versions older than 2.15.6, which would not have the fix patch in any case.
Note that there should similarly be a patch on b2_15 that skips this test when run with any version older than 2.15.5.1 or so, since those versions will not have the fix patch either.
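To illustrate the kind of version gating discussed above, here is a minimal self-contained bash sketch. It is not the actual patch: `ver_code` and `should_skip_226` are hypothetical helper names invented for this example, the packing scheme is an assumption, and version strings are assumed to be purely numeric dotted values with each component below 256.

```shell
#!/bin/bash
# Hypothetical helper (not from the Lustre test framework): pack a dotted
# version such as "2.15.5.1" into one integer so versions can be compared
# numerically. Missing components are treated as 0.
ver_code() {
    local IFS=.
    local -a parts=($1)
    echo $(( (${parts[0]:-0} << 24) | (${parts[1]:-0} << 16) |
             (${parts[2]:-0} << 8) | ${parts[3]:-0} ))
}

# Hypothetical check: succeeds (exit status 0) when the peer version is older
# than the minimum version assumed to carry the fix, i.e. the test should be
# skipped. The default minimum of 2.15.6 follows the master-side check
# mentioned in the comments.
should_skip_226() {
    local peer_version=$1
    local min_version=${2:-2.15.6}
    (( $(ver_code "$peer_version") < $(ver_code "$min_version") ))
}
```

On master the check would be called with the default minimum, for example `should_skip_226 "$peer_version" && echo "skip test_226"`, while a b2_15-side check could pass `2.15.5.1` as the second argument. The exact cutoff on b2_15 is only stated as "2.15.5.1 or so" above, so that value is an assumption as well.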