Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.7.0
-
3
-
16588
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/93f10028-6bb0-11e4-88ff-5254006e85c2.
The sub-test test_10 failed with the following error:
import is not in FULL state
MDS console:
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 10: resending a replayed unlink == 04:24:27 \(1415708667\)
Lustre: DEBUG MARKER: == replay-dual test 10: resending a replayed unlink == 04:24:27 (1415708667)
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
LustreError: 20635:0:(osd_handler.c:1402:osd_ro()) *** setting lustre-MDT0000 read-only ***
Turning device dm-0 (0xfd00000) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000119
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds1
Lustre: Failing over lustre-MDT0000
Lustre: Skipped 1 previous similar message
LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.2.4.100@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 16 previous similar messages
Lustre: 20823:0:(client.c:1947:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415708669/real 1415708669] req@ffff88007b2d3800 x1484473792234212/t0(0) o251->MGC10.2.4.99@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1415708675 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 20823:0:(client.c:1947:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Removing read-only on unknown block (0xfd00000)
Lustre: server umount lustre-MDT0000 complete
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/mds1
LDISKFS-fs (dm-0): recovery complete
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: Skipped 1 previous similar message
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Lustre: *** cfs_fail_loc=119, val=2147483648***
LustreError: 21089:0:(ldlm_lib.c:2389:target_send_reply_msg()) @@@ dropping reply req@ffff880064eee400 x1484473741963360/t47244640260(47244640260) o36->250c9384-2350-36d9-cb7a-d96794f57fdd@10.2.4.94@tcp:0/0 lens 520/448 e 0 to 0 dl 1415709342 ref 1 fl Complete:/4/0 rc 0/0
Lustre: lustre-MDT0000: Denying connection for new client lustre-MDT0000-lwp-OST0001_UUID (at 10.2.4.100@tcp), waiting for all 4 known clients (0 recovered, 4 in progress, and 0 evicted) to recover in 10:21
Lustre: Skipped 771 previous similar messages
INFO: task tgt_recov:21089 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tgt_recov D 0000000000000000 0 21089 2 0x00000080
ffff88006987dda0 0000000000000046 ffff88006987dd00 ffff88007b8aff67
ffffc90001fb7370 ffff88006921153c 0000000000000004 ffff8800692111b8
ffff880055108638 ffff88006987dfd8 000000000000fbc8 ffff880055108638
Call Trace:
[<ffffffffa07e0f10>] ? check_for_next_transno+0x0/0x590 [ptlrpc]
[<ffffffffa07ddf4d>] target_recovery_overseer+0x9d/0x230 [ptlrpc]
[<ffffffffa07dc630>] ? exp_req_replay_healthy+0x0/0x30 [ptlrpc]
[<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa07e521e>] target_recovery_thread+0x9ae/0x1a10 [ptlrpc]
[<ffffffff81061d12>] ? default_wake_function+0x12/0x20
[<ffffffffa07e4870>] ? target_recovery_thread+0x0/0x1a10 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task tgt_recov:21089 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
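The MDS console log above includes a "Denying connection" message reporting recovery progress (4 known clients, 0 recovered, 4 in progress, 0 evicted, 10:21 remaining), which fits the "import is not in FULL state" failure. The following is a minimal Python sketch for pulling those counts out of such a line when triaging similar logs. The regular expression is inferred only from the single message quoted in this ticket, so other Lustre versions may word it differently, and the names `RECOVERY_RE` and `parse_recovery_status` are hypothetical helpers, not part of Lustre or its test framework.

```python
import re

# Pattern inferred from the one "Denying connection" message in this ticket;
# it is an assumption that other status lines use the same wording.
RECOVERY_RE = re.compile(
    r"waiting for all (?P<known>\d+) known clients "
    r"\((?P<recovered>\d+) recovered, (?P<in_progress>\d+) in progress, "
    r"and (?P<evicted>\d+) evicted\) to recover in (?P<minutes>\d+):(?P<seconds>\d+)"
)


def parse_recovery_status(line):
    """Return client counts and remaining seconds from a status line, or None."""
    match = RECOVERY_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Collapse the "M:SS" countdown into a single number of seconds.
    fields["remaining_seconds"] = fields.pop("minutes") * 60 + fields.pop("seconds")
    return fields


# The message as it appears in the MDS console log of this ticket.
line = (
    "Lustre: lustre-MDT0000: Denying connection for new client "
    "lustre-MDT0000-lwp-OST0001_UUID (at 10.2.4.100@tcp), waiting for all 4 "
    "known clients (0 recovered, 4 in progress, and 0 evicted) to recover in 10:21"
)
status = parse_recovery_status(line)
print(status)
```

A result with `recovered` at 0 and `in_progress` equal to `known`, as here, indicates that no client had finished replay when the message was logged.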
Info required for matching: replay-dual 10
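The console log shows the test setting `fail_loc=0x80000119`, and the MDS later printing `cfs_fail_loc=119, val=2147483648`. The short Python sketch below only illustrates the arithmetic linking the two: masking off the top bit of 0x80000119 leaves 0x119, and the top bit on its own is 2147483648 in decimal. Reading that top bit as a flag separate from the fail-site id is an assumption based on those two log lines, and `FLAG_MASK` and `split_fail_loc` are hypothetical names, not Lustre identifiers.

```python
# Assumed flag bit (hypothetical name); chosen because 0x80000000 == 2147483648,
# the "val" printed in the console log.
FLAG_MASK = 0x80000000


def split_fail_loc(fail_loc):
    """Split a fail_loc value into (flag bit, remaining site id)."""
    return fail_loc & FLAG_MASK, fail_loc & ~FLAG_MASK


flag, site = split_fail_loc(0x80000119)
print(f"flag={flag} (0x{flag:x}), site=0x{site:x}")
```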