Details
Type: Bug
Resolution: Fixed
Priority: Major
Description
LNet teardown could retry cleanup before asserting.
We see this assertion fire in the lnet_discovery thread during sanity-lnet test 220, shortly after lustre_rmmod starts unloading the modules.
Excerpted from https://testing.whamcloud.com/test_logs/087d6d3d-deca-4831-9337-30fae7338f25/show_text
[17841.535068] Lustre: DEBUG MARKER: == sanity-lnet test 220: Add routes w/default options - check aliveness ========================================================== 23:19:06 (1706570346)
[17841.835785] Lustre: DEBUG MARKER: /usr/sbin/lustre_rmmod
[17842.279424] Key type lgssc unregistered
[17842.319629] LNetError: 6049:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed:
[17842.320935] LNetError: 6049:0:(lib-md.c:281:lnet_assert_handler_unused()) LBUG
[17842.321757] Pid: 6049, comm: lnet_discovery 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023
[17842.322978] Call Trace TBD:
[17842.323365] Kernel panic - not syncing: LBUG
[17842.323894] CPU: 0 PID: 6049 Comm: lnet_discovery Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.30.1.el9_2.x86_64 #1
[17842.325176] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[17842.325838] Call Trace:
[17842.326178] <TASK>
[17842.326492] dump_stack_lvl+0x34/0x48
[17842.326997] panic+0xf4/0x2c6
[17842.327399] ? lnet_discovery_event_reply+0xbc0/0xbc0 [lnet]
[17842.328223] lbug_with_loc.cold+0x18/0x18 [libcfs]
[17842.328869] lnet_assert_handler_unused+0x9c/0xd0 [lnet]
[17842.329506] lnet_peer_discovery+0x997/0xaf0 [lnet]
[17842.330111] ? cpuacct_percpu_seq_show+0x10/0x10
[17842.330680] ? lnet_peer_data_present+0x580/0x580 [lnet]
[17842.331323] kthread+0xd9/0x100
[17842.331734] ? kthread_complete_and_exit+0x20/0x20
[17842.332298] ret_from_fork+0x22/0x30
[17842.332769] </TASK>
One option is to retry the cleanup pass a couple of times and assert only if the handler is still in use after the final attempt.
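A rough userspace sketch of that retry idea follows. It is not the actual LNet patch: struct md_sketch, handler_in_use(), retry_until_handler_unused(), unlink_one_md() and demo_handler() are all hypothetical names made up for illustration, and a real kernel version would presumably sleep or yield between passes rather than loop immediately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LNet memory descriptor (not the real struct). */
struct md_sketch {
	void (*md_handler)(void *ev);
};

/* Hypothetical callback type: performs one cleanup pass over the table. */
typedef void (*cleanup_pass_fn)(struct md_sketch *mds, size_t n,
				void (*handler)(void *));

/* Placeholder event handler used only to give the sketch something to match. */
static void demo_handler(void *ev)
{
	(void)ev;
}

/* Return true if any descriptor still points at the given handler. */
static bool handler_in_use(const struct md_sketch *mds, size_t n,
			   void (*handler)(void *))
{
	for (size_t i = 0; i < n; i++)
		if (mds[i].md_handler == handler)
			return true;
	return false;
}

/*
 * Illustrative cleanup pass: releases at most one matching descriptor,
 * mimicking cleanup that only makes partial progress per pass.
 */
static void unlink_one_md(struct md_sketch *mds, size_t n,
			  void (*handler)(void *))
{
	for (size_t i = 0; i < n; i++) {
		if (mds[i].md_handler == handler) {
			mds[i].md_handler = NULL;
			return;
		}
	}
}

/*
 * Check whether the handler is unused, running up to max_retries cleanup
 * passes in between checks. Returns the number of passes that were needed,
 * or -1 if the handler is still referenced after the last check. Only in
 * the -1 case would the caller go on to assert.
 */
static int retry_until_handler_unused(struct md_sketch *mds, size_t n,
				      void (*handler)(void *),
				      cleanup_pass_fn cleanup, int max_retries)
{
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		if (!handler_in_use(mds, n, handler))
			return attempt;
		if (attempt < max_retries)
			cleanup(mds, n, handler);
	}
	return -1;
}
```

With this shape, the existing assertion would move behind the -1 return, so a descriptor that is merely slow to be unlinked during teardown no longer triggers an LBUG on the first check.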