Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.12.3
None
CentOS 7.6
2
Description
Using 2.12.3 servers and clients, we hit this bug twice: once on an OSS and once on a client:
[197803.220678] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) ASSERTION( rspt->rspt_cpt == cpt ) failed:
[197803.220682] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) LBUG
[197803.220684] Pid: 34981, comm: kiblnd_sd_01_00 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019
[197803.220684] Call Trace:
[197803.220706] [<ffffffffc0e777cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[197803.220712] [<ffffffffc0e7787c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[197803.220730] [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
[197803.220739] [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
[197803.220748] [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
[197803.220757] [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
[197803.220765] [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
[197803.220772] [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
[197803.220783] [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[197803.220789] [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[197803.220793] [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
[197803.220797] [<ffffffff8db76c37>] ret_from_fork_nospec_end+0x0/0x39
[197803.220822] [<ffffffffffffffff>] 0xffffffffffffffff
[197803.220823] Kernel panic - not syncing: LBUG
[197803.220826] CPU: 37 PID: 34981 Comm: kiblnd_sd_01_00 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[197803.220827] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018
[197803.220828] Call Trace:
[197803.220833] [<ffffffff8db64147>] dump_stack+0x19/0x1b
[197803.220835] [<ffffffff8db5d850>] panic+0xe8/0x21f
[197803.220842] [<ffffffffc0e778cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[197803.220851] [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
[197803.220860] [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
[197803.220863] [<ffffffff8d502372>] ? ktime_get_ts64+0x52/0xf0
[197803.220872] [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
[197803.220880] [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
[197803.220888] [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
[197803.220896] [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
[197803.220901] [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[197803.220906] [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[197803.220908] [<ffffffff8d4e220e>] ? dequeue_task_fair+0x41e/0x660
[197803.220911] [<ffffffff8d42a59e>] ? __switch_to+0xce/0x580
[197803.220913] [<ffffffff8d4d7c40>] ? wake_up_state+0x20/0x20
[197803.220919] [<ffffffffc14d3270>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
[197803.220920] [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
[197803.220923] [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
[197803.220925] [<ffffffff8db76c37>] ret_from_fork_nospec_begin+0x21/0x21
[197803.220927] [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
We're using Mellanox OFED 4.7 on the servers, the routers, and this client (not all clients have been upgraded yet).
We can provide crash dumps on request. This appears to be a new problem, introduced either by Lustre 2.12.3 (vs. 2.12.0) or by Mellanox OFED 4.7 (vs. 4.5).
Thanks!
Stephane
Peter, that sounds good to me. We haven't hit this issue since we added the two patches above. Thanks!