Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.12.3
None
CentOS 7.6
2
9223372036854775807
Description
Running Lustre 2.12.3 on both servers and clients, we have hit this LBUG once on an OSS and once on a client:
[197803.220678] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) ASSERTION( rspt->rspt_cpt == cpt ) failed:
[197803.220682] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) LBUG
[197803.220684] Pid: 34981, comm: kiblnd_sd_01_00 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019
[197803.220684] Call Trace:
[197803.220706] [<ffffffffc0e777cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[197803.220712] [<ffffffffc0e7787c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[197803.220730] [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
[197803.220739] [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
[197803.220748] [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
[197803.220757] [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
[197803.220765] [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
[197803.220772] [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
[197803.220783] [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[197803.220789] [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[197803.220793] [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
[197803.220797] [<ffffffff8db76c37>] ret_from_fork_nospec_end+0x0/0x39
[197803.220822] [<ffffffffffffffff>] 0xffffffffffffffff
[197803.220823] Kernel panic - not syncing: LBUG
[197803.220826] CPU: 37 PID: 34981 Comm: kiblnd_sd_01_00 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[197803.220827] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018
[197803.220828] Call Trace:
[197803.220833] [<ffffffff8db64147>] dump_stack+0x19/0x1b
[197803.220835] [<ffffffff8db5d850>] panic+0xe8/0x21f
[197803.220842] [<ffffffffc0e778cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[197803.220851] [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
[197803.220860] [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
[197803.220863] [<ffffffff8d502372>] ? ktime_get_ts64+0x52/0xf0
[197803.220872] [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
[197803.220880] [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
[197803.220888] [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
[197803.220896] [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
[197803.220901] [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[197803.220906] [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[197803.220908] [<ffffffff8d4e220e>] ? dequeue_task_fair+0x41e/0x660
[197803.220911] [<ffffffff8d42a59e>] ? __switch_to+0xce/0x580
[197803.220913] [<ffffffff8d4d7c40>] ? wake_up_state+0x20/0x20
[197803.220919] [<ffffffffc14d3270>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
[197803.220920] [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
[197803.220923] [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
[197803.220925] [<ffffffff8db76c37>] ret_from_fork_nospec_begin+0x21/0x21
[197803.220927] [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
We're using Mellanox OFED 4.7 on the servers, the routers, and this client (not all clients have been upgraded yet).
We can provide crash dumps upon request. This seems to be a new problem, either with Lustre 2.12.3 (vs. 2.12.0) or with Mellanox OFED 4.7 (vs. 4.5).
Thanks!
Stephane