Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
lustre-master build # 1837 RHEL6 zfs
-
3
-
12460
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ed6584da-847f-11e3-9133-52540035b04c.
The sub-test test_failover_ost failed with the following error:
test failed to respond and timed out
OST dmesg:
Lustre: DEBUG MARKER: zpool list -H lustre-ost2 >/dev/null 2>&1 || zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost2 Lustre: DEBUG MARKER: mkdir -p /mnt/ost2; mount -t lustre lustre-ost2/ost2 /mnt/ost2 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291785/real 1390291785] req@ffff88006c21d800 x1457826572533816/t0(0) o38->lustre-MDT0000-lwp-OST0000@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291790 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 LustreError: 13a-8: Failed to get MGS log params and no local copy. Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291791/real 1390291791] req@ffff88006b122c00 x1457826572533912/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291796 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 3 clients reconnect LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.10.4.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291801/real 1390291801] req@ffff88006c21dc00 x1457826572533940/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291811 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: lustre-OST0000: Recovery over after 0:07, of 3 clients 3 recovered and 0 were evicted. Lustre: lustre-OST0000: deleting orphan objects from 0x0:833 to 0x0:833 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291816/real 1390291816] req@ffff880037b0ec00 x1457826572533968/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291831 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.10.4.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 18 previous similar messages Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291836/real 1390291836] req@ffff8800680a7000 x1457826572534004/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291856 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1390291866/real 1390291866] req@ffff880067d01000 x1457826572534060/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291891 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) Skipped 1 previous similar message LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.10.4.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 14 previous similar messages Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291871/real 1390291871] req@ffff88006addb800 x1457826572534072/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291896 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.10.4.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 14 previous similar messages Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1390291931/real 1390291931] req@ffff88006addbc00 x1457826572534196/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390291956 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) Skipped 6 previous similar messages LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.10.4.200@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 35 previous similar messages INFO: task mount.lustre:3972 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.lustre D 0000000000000001 0 3972 3971 0x00000080 ffff88006af93718 0000000000000086 ffff88006af936a8 ffffffff81065c75 000000106af936b8 ffff88007e4b2ad8 ffff8800022167a8 ffff880002216740 ffff88006a935098 ffff88006af93fd8 000000000000fb88 ffff88006a935098 Call Trace: [<ffffffff81065c75>] ? enqueue_entity+0x125/0x410 [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0 [<ffffffff8150f475>] schedule_timeout+0x215/0x2e0 [<ffffffff81065c75>] ? enqueue_entity+0x125/0x410 [<ffffffff810572f4>] ? check_preempt_wakeup+0x1a4/0x260 [<ffffffff8106605b>] ? enqueue_task_fair+0xfb/0x100 [<ffffffff8150f0f3>] wait_for_common+0x123/0x180 [<ffffffff81063990>] ? default_wake_function+0x0/0x20 [<ffffffff8150f20d>] wait_for_completion+0x1d/0x20 [<ffffffffa0858c03>] llog_process_or_fork+0x353/0x5f0 [obdclass] [<ffffffffa0858eb4>] llog_process+0x14/0x20 [obdclass] [<ffffffffa088d444>] class_config_parse_llog+0x1e4/0x330 [obdclass] [<ffffffffa105c2d2>] mgc_process_log+0xd22/0x18e0 [mgc] [<ffffffffa1056360>] ? mgc_blocking_ast+0x0/0x810 [mgc] [<ffffffffa0aaf2a0>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc] [<ffffffffa105e4f5>] mgc_process_config+0x645/0x11d0 [mgc] [<ffffffffa089d476>] lustre_process_log+0x256/0xa70 [obdclass] [<ffffffffa086f832>] ? class_name2dev+0x42/0xe0 [obdclass] [<ffffffff81168043>] ? kmem_cache_alloc_trace+0x1a3/0x1b0 [<ffffffffa086f8de>] ? class_name2obd+0xe/0x30 [obdclass] [<ffffffffa08ce71c>] server_start_targets+0x1c4c/0x1e00 [obdclass] [<ffffffffa08a0a7b>] ? lustre_start_mgc+0x48b/0x1e60 [obdclass] [<ffffffffa08989e0>] ? class_config_llog_handler+0x0/0x1880 [obdclass] [<ffffffffa08d3708>] server_fill_super+0xb98/0x1a64 [obdclass] [<ffffffffa08a2628>] lustre_fill_super+0x1d8/0x530 [obdclass] [<ffffffffa08a2450>] ? lustre_fill_super+0x0/0x530 [obdclass] [<ffffffff811845df>] get_sb_nodev+0x5f/0xa0 [<ffffffffa089a365>] lustre_get_sb+0x25/0x30 [obdclass] [<ffffffff81183c1b>] vfs_kern_mount+0x7b/0x1b0 [<ffffffff81183dc2>] do_kern_mount+0x52/0x130 [<ffffffff81195382>] ? vfs_ioctl+0x22/0xa0 [<ffffffff811a3f82>] do_mount+0x2d2/0x8d0 [<ffffffff811a4610>] sys_mount+0x90/0xe0 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1390291996/real 1390291996] req@ffff88006adf3400 x1457826572534332/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.10.4.198@tcp:12/10 lens 400/544 e 0 to 1 dl 1390292021 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 3388:0:(client.c:1903:ptlrpc_expire_one_request()) Skipped 7 previous similar messages LustreError: 137-5: lustre-OST0003_UUID: not available for connect from 10.10.4.202@tcp (no target). If you are running an HA pair check that the target is mounted on the other server. LustreError: Skipped 124 previous similar messages INFO: task mount.lustre:3972 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.lustre D 0000000000000001 0 3972 3971 0x00000080
Attachments
Issue Links
- is related to
-
LU-2059 mgc to backup configuration on osd-based llogs
- Resolved