Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.4.3
- Lustre-2.4.3
- 3
- 14153
Description
The MDS failed over, and once MDS recovery finished, many OSSes crashed due to the following assertion failure.
2014-05-30 17:39:07 Lustre: Skipped 3 previous similar messages
2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) ASSERTION( atomic_read(&obd->obd_req_replay_clients) == 0 ) failed:
2014-05-30 17:39:07 LustreError: 18967:0:(ldlm_lib.c:1851:target_next_replay_req()) LBUG
2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov
2014-05-30 17:39:07
2014-05-30 17:39:07 Call Trace:
2014-05-30 17:39:07 [<ffffffffa0353895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2014-05-30 17:39:07 [<ffffffffa0353e97>] lbug_with_loc+0x47/0xb0 [libcfs]
2014-05-30 17:39:07 [<ffffffffa066f48c>] target_recovery_thread+0x14ac/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0ca>] child_rip+0xa/0x20
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2014-05-30 17:39:07
2014-05-30 17:39:07 Kernel panic - not syncing: LBUG
2014-05-30 17:39:07 Pid: 18967, comm: tgt_recov Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
2014-05-30 17:39:07 Call Trace:
2014-05-30 17:39:07 [<ffffffff8150de58>] ? panic+0xa7/0x16f
2014-05-30 17:39:07 [<ffffffffa0353eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2014-05-30 17:39:07 [<ffffffffa066f48c>] ? target_recovery_thread+0x14ac/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffffa066dfe0>] ? target_recovery_thread+0x0/0x1970 [ptlrpc]
2014-05-30 17:39:07 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
LU-1522 and LU-2397 reported similar problems, but the patches for both have already been merged into b2_4.