Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.13.0, Lustre 2.12.3
None
3
Description
A message with msg_rx_committed set can reach the resend block in lnet_health_check() and trip the assertion on msg->msg_tx_committed there, as shown in the log below. The assertion should be replaced with an if-statement that simply returns -1, so that the message is finalized instead of triggering an LBUG.
[846578.191198] LustreError: 65272:0:(brw_test.c:389:brw_server_rpc_done()) Skipped 12 previous similar messages
[846578.191789] LNetError: 44030:0:(lib-msg.c:735:lnet_health_check()) ASSERTION( msg->msg_tx_committed ) failed:
[846578.191793] LNetError: 44030:0:(lib-msg.c:735:lnet_health_check()) LBUG
[846578.191795] Pid: 44030, comm: kiblnd_sd_01_00 3.10.0-693.21.1.x3.2.152.x86_64 #1 SMP Mon Feb 25 06:44:43 PST 2019
[846578.191795] Call Trace:
[846578.191824] [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
[846578.191856] [<ffffffffc0a3f7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[846578.191868] [<ffffffffc0a3f87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[846578.191915] [<ffffffffc0ad2c7e>] lnet_health_check+0x9ae/0x9e0 [lnet]
[846578.191930] [<ffffffffc0ad2dc5>] lnet_finalize+0x115/0x9c0 [lnet]
[846578.191949] [<ffffffffc0b8278d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[846578.191958] [<ffffffffc0b8dc5d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[846578.191965] [<ffffffff810b4031>] kthread+0xd1/0xe0
[846578.191972] [<ffffffff816c455d>] ret_from_fork+0x5d/0xb0
[846578.192031] [<ffffffffffffffff>] 0xffffffffffffffff
[846578.192032] Kernel panic - not syncing: LBUG
[846578.192037] CPU: 13 PID: 44030 Comm: kiblnd_sd_01_00 Tainted: P OE ------------ 3.10.0-693.21.1.x3.2.152.x86_64 #1
[846578.192038] Hardware name: Seagate SATI-TL/Type2 - Board Product Sati2, BIOS SATI-TL.v0046.0002 01/13/2015
[846578.192040] Call Trace:
[846578.192053] [<ffffffff816b17c8>] dump_stack+0x19/0x1b
[846578.192057] [<ffffffff816ab634>] panic+0xe8/0x21f
[846578.192072] [<ffffffffc0a3f8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[846578.192087] [<ffffffffc0ad2c7e>] lnet_health_check+0x9ae/0x9e0 [lnet]
[846578.192094] [<ffffffff810eced2>] ? ktime_get_ts64+0x52/0xf0
[846578.192110] [<ffffffffc0ad2dc5>] lnet_finalize+0x115/0x9c0 [lnet]
[846578.192119] [<ffffffffc0b78b52>] ? kiblnd_pool_free_node+0x82/0x170 [ko2iblnd]
[846578.192126] [<ffffffffc0b8278d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[846578.192135] [<ffffffffc0b8dc5d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
[846578.192142] [<ffffffff810cb0c5>] ? sched_clock_cpu+0x85/0xc0
[846578.192146] [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
[846578.192149] [<ffffffff810c7c80>] ? wake_up_state+0x20/0x20
[846578.192156] [<ffffffffc0b8d3c0>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
[846578.192159] [<ffffffff810b4031>] kthread+0xd1/0xe0
[846578.192161] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[846578.192164] [<ffffffff816c455d>] ret_from_fork+0x5d/0xb0
[846578.192167] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
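A minimal C sketch of the change proposed in the description, assuming a heavily simplified message type. `struct sketch_msg`, its `resend_count` field, and `sketch_health_check_resend()` are hypothetical stand-ins, not the real `struct lnet_msg` or the actual lnet_health_check() code. The return convention (0 = message queued for resend, -1 = caller should finalize) is also an assumption; only the meaning of -1 comes from this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an LNet message: only the two
 * commit flags named in this ticket are modelled. */
struct sketch_msg {
	bool msg_tx_committed;
	bool msg_rx_committed;
	int resend_count; /* hypothetical: number of resends queued */
};

/* Hypothetical stand-in for the resend block of lnet_health_check().
 * Assumed convention: 0 = queued for resend (caller stops),
 * -1 = caller should finalize the message. */
static int sketch_health_check_resend(struct sketch_msg *msg)
{
	/* A NULL message is still a programming error, so keep asserting. */
	assert(msg != NULL);

	/* Previously this was the assertion reported in the log as
	 * "ASSERTION( msg->msg_tx_committed ) failed". Only tx-committed
	 * messages can be resent; a message that arrives here with only
	 * msg_rx_committed set is handled by telling the caller to
	 * finalize it instead of crashing the node. */
	if (!msg->msg_tx_committed)
		return -1;

	msg->resend_count++; /* hypothetical: queue the resend */
	return 0;
}
```

The design choice is to keep asserting only on conditions that indicate a coding error (a NULL pointer), while a message reaching the resend path with only msg_rx_committed set is a reachable runtime state and is therefore handled by returning -1.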