Details
- Type: Bug
- Resolution: Duplicate
- Priority: Blocker
- Version: Lustre 2.8.0
Description
This issue was created by maloo for wangdi <di.wang@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f670098e-6a05-11e5-9d0a-5254006e85c2.
The sub-test test_116b failed with the following error:
10:13:07:Lustre: DEBUG MARKER: == replay-single test 116b: large update log slave MDT recovery == 10:12:46 (1443867166)
10:13:07:Lustre: DEBUG MARKER: sync; sync; sync
10:13:07:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
10:13:07:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 readonly
10:13:07:Turning device dm-1 (0xfd00001) read-only
10:13:07:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
10:13:07:Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
10:13:07:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80001702
10:13:07:Lustre: DEBUG MARKER: grep -c /mnt/mds2' ' /proc/mounts
10:13:07:Lustre: DEBUG MARKER: umount -d /mnt/mds2
10:13:07:Removing read-only on unknown block (0xfd00001)
10:13:07:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
10:13:07:Lustre: DEBUG MARKER: hostname
10:13:07:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P2
10:13:07:Lustre: DEBUG MARKER: mkdir -p /mnt/mds2; mount -t lustre /dev/lvm-Role_MDS/P2 /mnt/mds2
10:13:07:LDISKFS-fs (dm-1): recovery complete
10:13:07:LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
10:13:07:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
10:13:07:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
10:13:07:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null
10:13:07:LustreError: 30838:0:(tgt_handler.c:621:process_req_last_xid()) @@@ Unexpected xid 560f8bf556714 vs. last_xid 560f8bf55672b
10:13:07: req@ffff880059530c50 x1513996361754388/t0(158913790082) o1000->lustre-MDT0000-mdtlov_UUID@10.1.5.33@tcp:666/0 lens 4168/0 e 0 to 0 dl 1443867196 ref 1 fl Interpret:/4/ffffffff rc 0/-1
10:13:07:LustreError: 30838:0:(tgt_handler.c:636:process_req_last_xid()) LBUG
10:13:07:Pid: 30838, comm: mdt_out00_000
10:13:07:
10:13:07:Call Trace:
10:13:07: [<ffffffffa049b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
10:13:07: [<ffffffffa049be77>] lbug_with_loc+0x47/0xb0 [libcfs]
10:13:07: [<ffffffffa08859bd>] tgt_request_handle+0x105d/0x13f0 [ptlrpc]
10:13:07: [<ffffffffa082cd61>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
10:13:07: [<ffffffffa082bf20>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
10:13:07: [<ffffffff810a101e>] kthread+0x9e/0xc0
10:13:07: [<ffffffff8100c28a>] child_rip+0xa/0x20
10:13:07: [<ffffffff810a0f80>] ? kthread+0x0/0xc0
10:13:07: [<ffffffff8100c280>] ? child_rip+0x0/0x20
10:13:07:
10:13:07:Kernel panic - not syncing: LBUG
10:13:07:Pid: 30838, comm: mdt_out00_000 Not tainted 2.6.32-573.3.1.el6_lustre.g00880a0.x86_64 #1
10:13:07:Call Trace:
10:13:07: [<ffffffff815384e4>] ? panic+0xa7/0x16f
10:13:07: [<ffffffffa049becb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
10:13:07: [<ffffffffa08859bd>] ? tgt_request_handle+0x105d/0x13f0 [ptlrpc]
10:13:07: [<ffffffffa082cd61>] ? ptlrpc_main+0xe41/0x1910 [ptlrpc]
10:13:07: [<ffffffffa082bf20>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
10:13:07: [<ffffffff810a101e>] ? kthread+0x9e/0xc0
10:13:07: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
10:13:07: [<ffffffff810a0f80>] ? kthread+0x0/0xc0
10:13:07: [<ffffffff8100c280>] ? child_rip+0x0/0x20
10:13:07:Initializing cgroup subsys cpuset
10:13:07:Initializing cgroup subsys cpu
Please provide additional information about the failure here.
Info required for matching: replay-single 116b
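For context, the LBUG above fires in an xid-ordering sanity check on the server side: the target remembers the highest xid it has already processed for a client export and refuses a new (non-resent, non-replay) request whose xid does not advance past it. Below is a minimal C sketch of that kind of check, using simplified, hypothetical names; it is not the actual tgt_handler.c implementation.

/*
 * Illustrative sketch only (hypothetical names, simplified types); not the
 * actual Lustre tgt_handler.c code. It shows the kind of last_xid ordering
 * check whose failure is reported above just before LBUG().
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct client_export {
	uint64_t last_xid;	/* highest xid already processed for this client */
};

static void check_req_last_xid(struct client_export *exp, uint64_t req_xid,
			       bool resent_or_replay)
{
	/* Resent/replayed requests legitimately reuse an old xid. */
	if (resent_or_replay)
		return;

	if (req_xid <= exp->last_xid) {
		fprintf(stderr, "Unexpected xid %#jx vs. last_xid %#jx\n",
			(uintmax_t)req_xid, (uintmax_t)exp->last_xid);
		abort();	/* stands in for LBUG() */
	}

	exp->last_xid = req_xid;
}

int main(void)
{
	struct client_export exp = { .last_xid = 0x560f8bf55672b };

	/* A request arriving with an xid below last_xid trips the check,
	 * as in the log above. */
	check_req_last_xid(&exp, 0x560f8bf556714, false);
	return 0;
}

In the failing run the rejected request appears to be an o1000 (OUT update) RPC carrying a transno, i.e. one being replayed during MDT0001 recovery, which is what the rq_allow_replay discussion in the comment below is about.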
Attachments
Issue Links
- is related to: LU-5951 sanity test_39k: mtime is lost on close (Resolved)
Comments
rq_allow_replay is only for read-only requests (which do not need to be replayed), which means such a request can be sent even before the import is FULL (i.e. during the recovery process). Hmm, did you see a replay request with rq_allow_replay set?
Btw: even with that LBUG section commented out, the problem does not seem to be resolved. See https://testing.hpdd.intel.com/test_sets/15508dae-6bde-11e5-ae15-5254006e85c2
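As a rough illustration of the rq_allow_replay behaviour described in the comment above (a read-only request being allowed out before the import reaches FULL), here is a small sketch; apart from the rq_allow_replay flag and the FULL state it uses hypothetical names, and it is not the real ptlrpc send path.

/*
 * Illustrative sketch only; not the real ptlrpc code. It models the idea
 * that a request flagged rq_allow_replay (a read-only request that never
 * needs to be replayed itself) may be sent while the import is still
 * recovering, whereas ordinary requests must wait for IMP_FULL.
 */
#include <stdbool.h>

enum imp_state {
	IMP_DISCON,
	IMP_REPLAY,		/* recovery in progress */
	IMP_FULL,		/* import fully connected/recovered */
};

struct sketch_request {
	bool rq_allow_replay;	/* read-only request, may bypass the wait */
};

static bool may_send_during_recovery(const struct sketch_request *req,
				     enum imp_state state)
{
	if (state == IMP_FULL)
		return true;			/* normal case */

	/* Only requests explicitly marked rq_allow_replay are let through
	 * before recovery completes. */
	return req->rq_allow_replay;
}

The question in the comment is whether an actual replay request (one carrying a transno, as in the log) ever shows up with this flag set, which per the comment it should not need.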