Details
Bug
Resolution: Fixed
Blocker
Lustre 2.9.0
lola
build: https://build.hpdd.intel.com/job/lustre-master/3431/ tag 2.8.57 for el6.7
3
9223372036854775807
Description
The error occurred during soak testing of build '20160902' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160902).
The configuration is:
4 MDSs with 1 MDT per MDS, formatted with ldiskfs and configured pairwise in an active-active HA configuration
6 OSSs with 4 OSTs per OSS, formatted with ldiskfs and configured pairwise in an active-active HA configuration
DNE is enabled
Sequence of events
- 2016-09-06 02:51:28,201:fsmgmt.fsmgmt:INFO triggering fault mds_failover ( lola-8 (mdt-0) --> lola-9)
- 2016-09-06 03:41:33,479:fsmgmt.fsmgmt:INFO mds_failover just completed (mdt-0 failed back to lola-8)
- 2016-09-06 03:41:17 MDS lola-11 crashed with error message:
<0>LustreError: 6666:0:(lu_object.h:716:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed:
<0>LustreError: 6666:0:(lu_object.h:716:lu_object_get()) LBUG
<4>Pid: 6666, comm: mdt03_002
<4>
<4>Call Trace:
<4> [<ffffffffa07f0875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa07f0e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa1203071>] mdt_remote_object_lock+0x491/0x4a0 [mdt]
<4> [<ffffffffa12298a0>] mdt_reint_open+0x2b90/0x3180 [mdt]
<4> [<ffffffffa1211ead>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa11fd5db>] mdt_reint_internal+0x62b/0xa50 [mdt]
<4> [<ffffffffa11fdbf6>] mdt_intent_reint+0x1f6/0x440 [mdt]
<4> [<ffffffffa11fb8be>] mdt_intent_policy+0x4be/0xc70 [mdt]
<4> [<ffffffffa0ab77c7>] ldlm_lock_enqueue+0x127/0x990 [ptlrpc]
<4> [<ffffffffa0ae2c27>] ldlm_handle_enqueue0+0x807/0x14d0 [ptlrpc]
<4> [<ffffffffa0b68b21>] tgt_enqueue+0x61/0x230 [ptlrpc]
<4> [<ffffffffa0b69ccc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [<ffffffffa0b16501>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
<4> [<ffffffffa0b157d0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [<ffffffff810a138e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a12f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 6666, comm: mdt03_002 Tainted: P -- ------------ 2.6.32-573.26.1.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81539407>] ? panic+0xa7/0x16f
<4> [<ffffffffa07f0ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa1203071>] ? mdt_remote_object_lock+0x491/0x4a0 [mdt]
<4> [<ffffffffa12298a0>] ? mdt_reint_open+0x2b90/0x3180 [mdt]
<4> [<ffffffffa1211ead>] ? mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa11fd5db>] ? mdt_reint_internal+0x62b/0xa50 [mdt]
<4> [<ffffffffa11fdbf6>] ? mdt_intent_reint+0x1f6/0x440 [mdt]
<4> [<ffffffffa11fb8be>] ? mdt_intent_policy+0x4be/0xc70 [mdt]
<4> [<ffffffffa0ab77c7>] ? ldlm_lock_enqueue+0x127/0x990 [ptlrpc]
<4> [<ffffffffa0ae2c27>] ? ldlm_handle_enqueue0+0x807/0x14d0 [ptlrpc]
<4> [<ffffffffa0b68b21>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
<4> [<ffffffffa0b69ccc>] ? tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [<ffffffffa0b16501>] ? ptlrpc_main+0xd31/0x1800 [ptlrpc]
<4> [<ffffffffa0b157d0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [<ffffffff810a138e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff810a12f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
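For triage, a console trace like the one above can be split into individual frames with a short script, whether the frames are pasted one per line or run together. The following is only a minimal sketch: the names `FRAME_RE` and `parse_frames` are hypothetical helpers made up for this illustration, the label "kernel" for frames without a module suffix is a convention chosen here, and the sample string is a small excerpt of the frames reported in this ticket.

```python
import re

# One frame looks like:
#   [<ffffffffa1203071>] ? mdt_remote_object_lock+0x491/0x4a0 [mdt]
# The "?" (optional) marks a frame the kernel unwinder is not sure about,
# and the trailing "[module]" is absent for symbols built into the kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (function, module, reliable) tuples in the order they appear."""
    frames = []
    for m in FRAME_RE.finditer(text):
        module = m.group("module") or "kernel"  # assumption: no suffix means core kernel
        reliable = m.group("unsure") is None
        frames.append((m.group("func"), module, reliable))
    return frames


# Excerpt of the trace from this report, kept flattened on purpose.
sample = (
    "<4> [<ffffffffa07f0e77>] lbug_with_loc+0x47/0xb0 [libcfs] "
    "<4> [<ffffffffa1203071>] mdt_remote_object_lock+0x491/0x4a0 [mdt] "
    "<4> [<ffffffffa12298a0>] mdt_reint_open+0x2b90/0x3180 [mdt] "
    "<4> [<ffffffffa0b157d0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc] "
    "<4> [<ffffffff810a138e>] kthread+0x9e/0xc0"
)

frames = parse_frames(sample)
for func, module, reliable in frames:
    marker = " " if reliable else "?"
    print(f"{marker} {module:8s} {func}")
```

Filtering on the `reliable` flag leaves the call chain the first trace reports directly: `mdt_reint_open` calling `mdt_remote_object_lock`, which is where the `lu_object_get()` assertion fired.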
Attached files:
console and message logs of all MDS nodes; vmcore-dmesg.txt of lola-11.
A crash dump is available.