Details
-
Bug
-
Resolution: Duplicate
-
Blocker
-
None
-
lola
build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
-
3
-
9223372036854775807
Description
The error occurred during soak testing of master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled. The MDSes are configured in an active-active failover configuration.
Sequence of events:
- 2015-11-26 10:32 Failover resources (mdt-0,1) lola-8 --> lola-9 started
- 2015-11-26 11:40 Failback resources (mdt-0,1) lola-9 --> lola-8 completed successfully
- 2015-11-26 11:44 LBUG on lola-9. See the following message.
Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) ASSERTION( fmt != ((void *)(long)0x5a5a5a5a5a5a5a5a) ) failed:
Nov 26 11:44:53 lola-9 kernel: LustreError: 7588:0:(layout.c:1989:__req_capsule_get()) LBUG
Nov 26 11:44:53 lola-9 kernel: Pid: 7588, comm: mdt02_000
Nov 26 11:44:53 lola-9 kernel:
Nov 26 11:44:53 lola-9 kernel: Call Trace:
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa07c1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc]
Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0ad2f52>] ldlm_cli_enqueue_fini+0x1d2/0xe30 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0af55b4>] ? ptlrpc_set_destroy+0x414/0x570 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad3f71>] ldlm_cli_enqueue+0x3c1/0x870 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0ad9010>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122c2f0>] ? mdt_remote_blocking_ast+0x0/0x210 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa14105c5>] osp_md_object_lock+0x185/0x240 [osp]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa131d557>] lod_object_lock+0x147/0x860 [lod]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08dfa0f>] ? lu_object_find_try+0x9f/0x260 [obdclass]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa139f92b>] mdd_object_lock+0x3b/0xd0 [mdd]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239b2a>] mdt_remote_object_lock+0x14a/0x310 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b05925>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1239e19>] mdt_object_lock_internal+0x129/0x2d0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123a081>] mdt_object_lock+0x11/0x20 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa124bb2a>] mdt_reint_create+0x6fa/0xcc0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa08fb870>] ? lu_ucred+0x20/0x30 [obdclass]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa122b675>] ? mdt_ucred+0x15/0x20 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa12448dc>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b2ea22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff81294a3a>] ? strlcpy+0x4a/0x60
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa1248a1d>] mdt_reint_rec+0x5d/0x200 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123477b>] mdt_reint_internal+0x62b/0xb80 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa123516b>] mdt_reint+0x6b/0x120 [mdt]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b70e1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b18711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
Nov 26 11:44:54 lola-9 kernel: [<ffffffffa0b178d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Nov 26 11:44:54 lola-9 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov 26 11:44:54 lola-9 kernel:
Nov 26 11:44:54 lola-9 kernel: LustreError: dumping log to /tmp/lustre-log.1448567094.7588
The messages and console log files of the MDS (lola-9) and the debug log file mentioned in the LBUG error message are attached. Lustre messages extracted from the client nodes are attached to the ticket as well. No errors occurred on the OSS nodes.
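For anyone triaging the attached messages file, the call trace above can be reduced to its reliable frames with a short script. The following is only a sketch and not part of the original report: the helper names (`parse_frames`, `reliable_path`) and the embedded sample are illustrative assumptions, and it relies on the usual kernel convention that frames marked with `?` are stack leftovers rather than confirmed callers.

```python
import re

# A few frames copied from the trace above. A real run would read the
# attached messages file instead (hypothetical usage, not shown here).
SAMPLE = (
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2eed7>] __req_capsule_get+0x617/0x6e0 [ptlrpc] "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa08bb595>] ? class_handle2object+0x95/0x190 [obdclass] "
    "Nov 26 11:44:53 lola-9 kernel: [<ffffffffa0b2f0a8>] req_capsule_server_get+0x18/0x20 [ptlrpc] "
    "Nov 26 11:44:54 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0"
)

# One stack frame: [<address>] [? ]function+0xoffset/0xsize [module]
# The module suffix is absent for symbols built into the kernel image.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return every stack frame found in text, in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "func": m.group("func"),
            "module": m.group("module") or "kernel",
            "reliable": m.group("unreliable") is None,
        })
    return frames


def reliable_path(text):
    """Return 'module:function' for the frames not marked with '?'."""
    return [f"{f['module']}:{f['func']}" for f in parse_frames(text) if f["reliable"]]


if __name__ == "__main__":
    for entry in reliable_path(SAMPLE):
        print(entry)
```

Because the pattern matches frames anywhere in the text, it works whether the log is stored one entry per line or collapsed onto a single line.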
Attachments
Issue Links
Activity
Resolution | New: Duplicate [ 3 ]
Status | Original: Open [ 1 ] | New: Resolved [ 5 ]
Fix Version/s | New: Lustre 2.8.0 [ 11113 ]
Link | New: This issue is related to JFC-14 [ JFC-14 ]
Assignee | Original: DevOps Triage [ devops-triage ] | New: Di Wang [ di.wang ]
Attachment | New: messages-lola-9.log.bz2 [ 19739 ]
Attachment | New: console-lola-9.log.bz2 [ 19736 ]
Attachment | New: lola-9-lbug-client-messages.txt.bz2 [ 19737 ]
Attachment | New: lustre-log.1448567094.7588.bz2 [ 19738 ]