Details
-
Bug
-
Resolution: Duplicate
-
Blocker
-
None
-
lola
build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
-
3
Description
The error occurred during soak testing of master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled. The MDSes are configured in an active-active failover configuration.
Sequence of events:
- 2015-11-26 10:32 Failover resources (mdt-0,1) lola-8 --> lola-9 started
- 2015-11-26 11:40 Failback resources (mdt-0,1) lola-9 --> lola-8 completed successfully
- 2015-11-26 11:44 LBUG on lola-9. See the following message.
Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) ASSERTION( fmt != ((void *)(long)0x5a5a5a5a5a5a5a5a) ) failed:
Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) LBUG
Nov 26 11:44:53 lola-9 kernel: Pid: 7588, comm: mdt02_000
Nov 26 11:44:53 lola-9 kernel:
Nov 26 11:44:53 lola-9 kernel: Call Trace:
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0ad2f52>] ldlm_cli_enqueue_fini+0x1d2/0xe30 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0af55b4>] ? ptlrpc_set_destroy+0x414/0x570 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad3f71>] ldlm_cli_enqueue+0x3c1/0x870 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad9010>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122c2f0>] ? mdt_remote_blocking_ast+0x0/0x210 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa14105c5>] osp_md_object_lock+0x185/0x240 [osp]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa131d557>] lod_object_lock+0x147/0x860 [lod]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08dfa0f>] ? lu_object_find_try+0x9f/0x260 [obdclass]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa139f92b>] mdd_object_lock+0x3b/0xd0 [mdd]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239b2a>] mdt_remote_object_lock+0x14a/0x310 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b05925>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239e19>] mdt_object_lock_internal+0x129/0x2d0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123a081>] mdt_object_lock+0x11/0x20 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa124bb2a>] mdt_reint_create+0x6fa/0xcc0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08fb870>] ? lu_ucred+0x20/0x30 [obdclass]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122b675>] ? mdt_ucred+0x15/0x20 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa12448dc>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff81294a3a>] ? strlcpy+0x4a/0x60
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1248a1d>] mdt_reint_rec+0x5d/0x200 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123477b>] mdt_reint_internal+0x62b/0xb80 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123516b>] mdt_reint+0x6b/0x120 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b70e1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b18711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b178d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov 26 11:44:54 lola-9 kernel:
Nov 26 11:44:54 lola-9 kernel: LustreError: dumping log to /tmp/lustre-log.1448567094.7588
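For triage, it can help to reduce the trace above to the frames the kernel unwinder considers part of the actual call chain. Frames marked with "?" are addresses found on the stack that could not be verified, so they are usually skipped when reading the path that led to the LBUG. The following is a minimal Python sketch of that filtering, assuming the syslog prefix and frame layout seen in this ticket; the names `split_entries`, `reliable_frames`, and `FLAT_LOG` are illustrative and not part of any Lustre tooling, and the sample input is a shortened excerpt of the log.

```python
import re

# Shortened excerpt of the log from this ticket, deliberately joined into
# one string to show that the splitter copes with flattened input.
FLAT_LOG = (
    "Nov 26 11:44:53 lola-9 kernel: Call Trace: "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1e77>] lbug_with_loc+0x47/0xb0 [libcfs] "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc] "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass] "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc] "
    "Nov 26 11:44:54 lola-9 kernel: [<ffffffff81294a3a>] ? strlcpy+0x4a/0x60"
)

# Assumed syslog prefix: "Mon DD HH:MM:SS host kernel:"
ENTRY_START = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ kernel:")

# Assumed frame layout: "[<addr>] [? ]symbol+0xoff/0xsize [module]"
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def split_entries(text):
    """Split syslog text into one string per entry, even if it was flattened."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def reliable_frames(text):
    """Return (symbol, module) for frames that are not marked with '?'."""
    frames = []
    for entry in split_entries(text):
        m = FRAME.search(entry)
        if m and not m.group("guess"):
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


if __name__ == "__main__":
    for symbol, module in reliable_frames(FLAT_LOG):
        print(f"{module:10s} {symbol}")
```

Applied to the full trace, this kind of filter leaves the chain from `mdt_reint_create` through `mdt_remote_object_lock`, `osp_md_object_lock`, and `ldlm_cli_enqueue_fini` down to the failing `__req_capsule_get`.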
Attached are the messages and console log files of the MDS (lola-9) and the debug log file mentioned in the LBUG error message. Lustre messages from the client nodes were also extracted and attached to the ticket. No errors occurred on the OSS nodes.
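To match the attached debug log against the syslog entries, the dump file name can be decoded. The sketch below assumes the name follows the pattern `lustre-log.<epoch seconds>.<pid>`, which is consistent with the PID 7588 reported in the LBUG message; `parse_dump_name` is a hypothetical helper written for this illustration.

```python
import re
from datetime import datetime, timezone


def parse_dump_name(path):
    """Return (UTC timestamp, pid) from a 'lustre-log.<epoch>.<pid>' file name.

    The naming pattern is an assumption based on the file name in this ticket.
    """
    m = re.search(r"lustre-log\.(\d+)\.(\d+)$", path)
    if m is None:
        raise ValueError(f"unexpected dump file name: {path!r}")
    epoch, pid = int(m.group(1)), int(m.group(2))
    return datetime.fromtimestamp(epoch, tz=timezone.utc), pid


if __name__ == "__main__":
    when, pid = parse_dump_name("/tmp/lustre-log.1448567094.7588")
    print(when.isoformat(), pid)  # 2015-11-26T19:44:54+00:00 7588
```

The decoded time, 19:44:54 UTC, lines up with the 11:44:54 syslog timestamp if lola-9 logs in a UTC-8 local time zone; that offset is an inference and is not stated in the ticket.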