Details
Bug
Resolution: Fixed
Blocker
Lustre 2.8.0
lola
build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches
3
Description
The error occurred during soak testing of master branch build '20151122' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151122). DNE is enabled. The MDSes are configured for active-active failover.
Sequence of events:
- 2015-11-26 10:32 Failover of resources (mdt-0,1) lola-8 --> lola-9 started
- 2015-11-26 11:40 Failback of resources (mdt-0,1) lola-9 --> lola-8 completed successfully
- 2015-11-26 11:44 LBUG on lola-8. See the following messages.
Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:692:out_tx_write_exec()) read record [0x240089779:0x1:0x0] tail_pos 173122472 rc -53 index 50635 size 172659608
Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:693:out_tx_write_exec()) LBUG
Nov 26 11:44:54 lola-8 kernel: Pid: 8491, comm: mdt_out03_004
Nov 26 11:44:54 lola-8 kernel:
Nov 26 11:44:54 lola-8 kernel: Call Trace:
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fb875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa07fbe77>] lbug_with_loc+0x47/0xb0 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb60a0>] out_tx_write_exec+0x500/0x7a0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb934b>] ? out_tx_xattr_set_exec+0xeb/0x680 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bae13a>] out_tx_end+0xda/0x5d0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0bb3726>] out_handle+0xbd6/0x1890 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0afa4e0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0baae1c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b52711>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
Nov 26 11:44:54 lola-8 kernel: [<ffffffffa0b518d0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Nov 26 11:44:54 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov 26 11:44:54 lola-8 kernel:
Nov 26 11:44:54 lola-8 kernel: LustreError: dumping log to /tmp/lustre-log.1448567093.8491
This event is most likely related to LU-7488, which occurred at almost the same time on the HA failover partner (lola-9).
Attached are the console and messages log files of the MDS (lola-8), the kernel debug log file named in the LBUG error message, and the error messages that appeared at the same time in the messages files of the Lustre client nodes.