Details
Bug
Resolution: Cannot Reproduce
Critical
None
Lustre 2.1.2
3
4365
Description
Our production MDS got stuck in a crash/reboot loop. It hit the assertion named in the summary this morning, then hit it again on each reboot during recovery. We finally aborted recovery, after which the MDS stabilized. We have several crash dumps.
Backtrace below and console log attached.
LustreError: 3411:0:(mdt_handler.c:2511:mdt_req_handle()) ASSERTION(h->mh_act != NULL) failed
LustreError: 3411:0:(mdt_handler.c:2511:mdt_req_handle()) LBUG
PID: 7364 TASK: ffff88079b637540 CPU: 14 COMMAND: "mdt_221"
#0 [ffff88077e3abb98] machine_kexec at ffffffff8103216b
#1 [ffff88077e3abbf8] crash_kexec at ffffffff810b8d12
#2 [ffff88077e3abcc8] panic at ffffffff814ee999
#3 [ffff88077e3abd48] lbug_with_loc at ffffffffa0456e1b [libcfs]
#4 [ffff88077e3abd68] libcfs_assertion_failed at ffffffffa046042d [libcfs]
#5 [ffff88077e3abd88] mdt_handle_common at ffffffffa0c162d9 [mdt]
#6 [ffff88077e3abdd8] mdt_regular_handle at ffffffffa0c163f5 [mdt]
#7 [ffff88077e3abde8] ptlrpc_main at ffffffffa0717d64 [ptlrpc]
#8 [ffff88077e3abf48] kernel_thread at ffffffff8100c14a
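Since there are several crash dumps, it may help to check whether they all fail along the same call path. Below is a minimal sketch in Python that parses backtrace text like the one above into frames and reduces it to an address-independent signature. The names `parse_backtrace` and `signature` are hypothetical helpers, not part of Lustre or the crash utility, and labelling frames that carry no `[module]` tag as "kernel" is an assumption made for illustration.

```python
import re

# Matches one frame of the backtrace shown above, e.g.
#   "#5 [ffff88077e3abd88] mdt_handle_common at ffffffffa0c162d9 [mdt]"
# The trailing "[module]" tag is optional (core kernel frames have none).
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<symbol>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>\w+)\])?"
)


def parse_backtrace(text):
    """Return a list of frame dicts found in backtrace text (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "index": int(m.group("index")),
            "symbol": m.group("symbol"),
            # Assumption: a frame without a module tag belongs to the core kernel.
            "module": m.group("module") or "kernel",
            "addr": m.group("addr"),
        })
    return frames


def signature(frames):
    """Reduce frames to (module, symbol) pairs so dumps can be compared
    without regard to stack or text addresses."""
    return tuple((f["module"], f["symbol"]) for f in frames)


# The backtrace from this report, used as sample input.
SAMPLE = """
#0 [ffff88077e3abb98] machine_kexec at ffffffff8103216b
#1 [ffff88077e3abbf8] crash_kexec at ffffffff810b8d12
#2 [ffff88077e3abcc8] panic at ffffffff814ee999
#3 [ffff88077e3abd48] lbug_with_loc at ffffffffa0456e1b [libcfs]
#4 [ffff88077e3abd68] libcfs_assertion_failed at ffffffffa046042d [libcfs]
#5 [ffff88077e3abd88] mdt_handle_common at ffffffffa0c162d9 [mdt]
#6 [ffff88077e3abdd8] mdt_regular_handle at ffffffffa0c163f5 [mdt]
#7 [ffff88077e3abde8] ptlrpc_main at ffffffffa0717d64 [ptlrpc]
#8 [ffff88077e3abf48] kernel_thread at ffffffff8100c14a
"""

if __name__ == "__main__":
    for module, symbol in signature(parse_backtrace(SAMPLE)):
        print(f"{module:8s} {symbol}")
```

Running the same two functions over the backtrace from each dump and comparing the resulting signatures would show whether every crash went through `mdt_regular_handle` and `mdt_handle_common`, or whether some dumps took a different path.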
LLNL-bugzilla-ID: 1836