Details
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.15.3
Labels: None
Environment: CentOS 7.9 3.10.0-1160.90.1.el7_lustre.pl1.x86_64
Severity: 3
Rank (Obsolete): 9223372036854775807
Description
With Lustre 2.15.3 on both clients (Sherlock) and servers (Fir), we hit this assertion on a Lustre OSS (fir-io2-s1) while redeploying LNet routers from 2.15.2 to 2.15.3. We did the redeployment slowly, but I believe it still put some stress on LNet and triggered this particular condition. A vmcore is available upon request.
[1927361.489487] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) ASSERTION( conn->ibc_nsends_posted == 0 ) failed:
[1927361.500446] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) LBUG
[1927361.507427] Pid: 26687, comm: kiblnd_connd 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 SMP Tue Jun 20 15:47:49 PDT 2023
[1927361.518344] Call Trace:
[1927361.520991] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[1927361.526319] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[1927361.531308] [<0>] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd]
[1927361.537253] [<0>] kiblnd_connd+0xfa/0xcb0 [ko2iblnd]
[1927361.542393] [<0>] kthread+0xd1/0xe0
[1927361.546067] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[1927361.551328] [<0>] 0xfffffffffffffffe
[1927361.555087] Kernel panic - not syncing: LBUG
[1927361.559528] CPU: 59 PID: 26687 Comm: kiblnd_connd Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1
[1927361.572724] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.11.3 02/24/2023
[1927361.580548] Call Trace:
[1927361.583167] [<ffffffff985b1bec>] dump_stack+0x19/0x1f
[1927361.588478] [<ffffffff985ab708>] panic+0xe8/0x21f
[1927361.593445] [<ffffffffc092f5eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[1927361.599796] [<ffffffffc0a0ca36>] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd]
[1927361.607016] [<ffffffffc0a1e72a>] kiblnd_connd+0xfa/0xcb0 [ko2iblnd]
[1927361.613541] [<ffffffff97ecc790>] ? wake_up_atomic_t+0x40/0x40
[1927361.619546] [<ffffffffc0a1e630>] ? kiblnd_cm_callback+0x2140/0x2140 [ko2iblnd]
[1927361.627023] [<ffffffff97ecb621>] kthread+0xd1/0xe0
[1927361.632074] [<ffffffff97ecb550>] ? insert_kthread_work+0x40/0x40
[1927361.638774] [<ffffffff985c51dd>] ret_from_fork_nospec_begin+0x7/0x21
[1927361.645387] [<ffffffff97ecb550>] ? insert_kthread_work+0x40/0x40
Attaching vmcore-dmesg.txt as fir-io2-s1-20230724-vmcore-dmesg.txt