Details
- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- Lustre 2.7.0
- 3
- 16588
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/93f10028-6bb0-11e4-88ff-5254006e85c2.
The sub-test test_10 failed with the following error:
import is not in FULL state
MDS console:
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 10: resending a replayed unlink == 04:24:27 \(1415708667\)
Lustre: DEBUG MARKER: == replay-dual test 10: resending a replayed unlink == 04:24:27 (1415708667)
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
LustreError: 20635:0:(osd_handler.c:1402:osd_ro()) *** setting lustre-MDT0000 read-only ***
Turning device dm-0 (0xfd00000) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000119
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds1
Lustre: Failing over lustre-MDT0000
Lustre: Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.2.4.100@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 16 previous similar messages
Lustre: 20823:0:(client.c:1947:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415708669/real 1415708669] req@ffff88007b2d3800 x1484473792234212/t0(0) o251->MGC10.2.4.99@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1415708675 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 20823:0:(client.c:1947:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Removing read-only on unknown block (0xfd00000)
Lustre: server umount lustre-MDT0000 complete
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/mds1
LDISKFS-fs (dm-0): recovery complete
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Lustre: *** cfs_fail_loc=119, val=2147483648***
LustreError: 21089:0:(ldlm_lib.c:2389:target_send_reply_msg()) @@@ dropping reply req@ffff880064eee400 x1484473741963360/t47244640260(47244640260) o36->250c9384-2350-36d9-cb7a-d96794f57fdd@10.2.4.94@tcp:0/0 lens 520/448 e 0 to 0 dl 1415709342 ref 1 fl Complete:/4/0 rc 0/0
Lustre: lustre-MDT0000: Denying connection for new client lustre-MDT0000-lwp-OST0001_UUID (at 10.2.4.100@tcp), waiting for all 4 known clients (0 recovered, 4 in progress, and 0 evicted) to recover in 10:21
Lustre: Skipped 771 previous similar messages
INFO: task tgt_recov:21089 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tgt_recov D 0000000000000000 0 21089 2 0x00000080
ffff88006987dda0 0000000000000046 ffff88006987dd00 ffff88007b8aff67
ffffc90001fb7370 ffff88006921153c 0000000000000004 ffff8800692111b8
ffff880055108638 ffff88006987dfd8 000000000000fbc8 ffff880055108638
Call Trace:
[<ffffffffa07e0f10>] ? check_for_next_transno+0x0/0x590 [ptlrpc]
[<ffffffffa07ddf4d>] target_recovery_overseer+0x9d/0x230 [ptlrpc]
[<ffffffffa07dc630>] ? exp_req_replay_healthy+0x0/0x30 [ptlrpc]
[<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa07e521e>] target_recovery_thread+0x9ae/0x1a10 [ptlrpc]
[<ffffffff81061d12>] ? default_wake_function+0x12/0x20
[<ffffffffa07e4870>] ? target_recovery_thread+0x0/0x1a10 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task tgt_recov:21089 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Info required for matching: replay-dual 10
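When reading the console log, it can help to see how the injected `fail_loc=0x80000119` relates to the later message `cfs_fail_loc=119, val=2147483648`. The sketch below only does the bit arithmetic. It assumes that bit 31 (0x80000000) acts as a "fail once" style flag and that the remaining bits identify the injection point; the constant and function names are hypothetical and are not taken from the Lustre source.

```python
# Assumption: bit 31 of fail_loc is a flag bit and the rest is the location id.
# FAIL_ONCE_FLAG and split_fail_loc are illustrative names, not Lustre APIs.
FAIL_ONCE_FLAG = 0x80000000


def split_fail_loc(fail_loc):
    """Split a 32-bit fail_loc value into (flag_bits, location_id)."""
    flags = fail_loc & FAIL_ONCE_FLAG
    loc_id = fail_loc & ~FAIL_ONCE_FLAG & 0xFFFFFFFF
    return flags, loc_id


if __name__ == "__main__":
    flags, loc_id = split_fail_loc(0x80000119)
    # The location id is printed in hex in the log ("cfs_fail_loc=119"),
    # while the flag value is printed in decimal ("val=2147483648").
    print(hex(loc_id), flags)
```

Under that assumption, the value set by the test and the two numbers in the console message describe the same injection point (0x119) and the same high bit (2147483648 is 0x80000000 in decimal).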