Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.8.0
-
lola
build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
-
3
Description
The error occurred during soak testing of master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled. The MDSes are configured in an active-active failover configuration.
Sequence of events:
- 2015-11-26 10:32 Failover resources (mdt-0,1) lola-8 --> lola-9 started
- 2015-11-26 11:40 Failback resources (mdt-0,1) lola-9 --> lola-8 completed successfully
- 2015-11-26 11:44 LBUG on lola-8. See the following message.
Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:692:out_tx_write_exec()) read record [0x240089779:0x1:0x0] tail_pos 173122472 rc -53 index 50635 size 172659608
Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:693:out_tx_write_exec()) LBUG
Nov 26 11:44:54 lola-8 kernel: Pid: 8491, comm: mdt_out03_004
Nov 26 11:44:54 lola-8 kernel:
Nov 26 11:44:54 lola-8 kernel: Call Trace:
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fb875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fbe77>] lbug_with_loc+0x47/0xb0 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb60a0>] out_tx_write_exec+0x500/0x7a0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb934b>] ? out_tx_xattr_set_exec+0xeb/0x680 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bae13a>] out_tx_end+0xda/0x5d0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb3726>] out_handle+0xbd6/0x1890 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0afa4e0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0baae1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b52711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b518d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov 26 11:44:54 lola-8 kernel:
Nov 26 11:44:54 lola-8 kernel: LustreError: dumping log to /tmp/lustre-log.1448567093.8491
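For triage, the numeric fields of the first LustreError line can be pulled out mechanically. The following is only a sketch: the regular expression, the function name parse_write_exec_error and the field names are illustrative and not part of Lustre. It assumes the bracketed triple is a FID in sequence:object-id:version form, and it hardcodes the Linux errno name for 53 (EBADR) rather than relying on the local platform's table. It reports how far tail_pos lies beyond size, without interpreting what that means for the update log.

```python
import re

# Error line copied verbatim from the lola-8 messages above.
LINE = ("Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:"
        "(out_lib.c:692:out_tx_write_exec()) read record "
        "[0x240089779:0x1:0x0] tail_pos 173122472 rc -53 "
        "index 50635 size 172659608")

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"read record \[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+):"
    r"(?P<ver>0x[0-9a-f]+)\] "
    r"tail_pos (?P<tail_pos>\d+) rc (?P<rc>-?\d+) "
    r"index (?P<index>\d+) size (?P<size>\d+)"
)

# Linux errno names, hardcoded so the result does not depend on the
# platform running this script (errno 53 differs on non-Linux systems).
LINUX_ERRNO = {53: "EBADR"}


def parse_write_exec_error(line):
    """Return the fields of the 'read record' error as a dict, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("pid", "line", "tail_pos", "rc", "index", "size"):
        record[key] = int(record[key])
    record["errno_name"] = LINUX_ERRNO.get(-record["rc"], "unknown")
    # Plain arithmetic on the logged values; no interpretation implied.
    record["tail_beyond_size"] = record["tail_pos"] - record["size"]
    return record


record = parse_write_exec_error(LINE)
print(record["func"], record["rc"], record["errno_name"],
      record["tail_beyond_size"])
```

For the line above, the logged tail_pos (173122472) is 462864 larger than the logged size (172659608), and rc -53 corresponds to EBADR on Linux.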
This event is most likely related to LU-7488, which occurred at almost the same time on the HA failover partner (lola-9).
Attached are the console and messages log files of the MDS (lola-8), the kernel debug log file mentioned in the LBUG error message, and the error messages extracted from the Lustre client nodes' messages files that showed up at the same time.
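To match the attached debug log to the LBUG, the dump file name itself can be decoded. This is a minimal sketch that assumes the name follows the pattern lustre-log.<epoch seconds>.<pid>; the helper name decode_dump_name is hypothetical. The timestamp is printed in UTC because the node's time zone is not stated in this ticket.

```python
import re
from datetime import datetime, timezone

# Path copied from the "dumping log to" message on lola-8.
DUMP_PATH = "/tmp/lustre-log.1448567093.8491"


def decode_dump_name(path):
    """Split a dump file name into (UTC datetime, pid), or return None.

    Assumes the name is lustre-log.<epoch seconds>.<pid>.
    """
    match = re.search(r"lustre-log\.(\d+)\.(\d+)$", path)
    if match is None:
        return None
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return stamp, int(match.group(2))


stamp, pid = decode_dump_name(DUMP_PATH)
print(stamp.isoformat(), pid)  # 2015-11-26T19:44:53+00:00 8491
```

The decoded pid agrees with "Pid: 8491" in the stack trace. The UTC time of 19:44:53 would line up with the 11:44:54 syslog entries if the node's clock runs at UTC-8, which is an assumption here.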
Activity
Resolution | New: Fixed [ 1 ]
Status | Original: Open [ 1 ] | New: Resolved [ 5 ]
End date | New: 07/Dec/15
Start date | New: 27/Nov/15
Link | New: This issue is related to JFC-14 [ JFC-14 ]
Fix Version/s | New: Lustre 2.8.0 [ 11113 ]
Affects Version/s | New: Lustre 2.8.0 [ 11113 ]
Assignee | Original: WC Triage [ wc-triage ] | New: Di Wang [ di.wang ]
Attachment | New: console-lola-8.log.bz2 [ 19742 ]
Attachment | New: lola-8-lbug-client-messages.txt.bz2 [ 19743 ]
Attachment | New: lustre-log.1448567093.8491.bz2 [ 19744 ]
Attachment | New: messages-lola-8.log.bz2 [ 19745 ]