Details
- Bug
- Resolution: Low Priority
- Minor
- None
- Lustre 2.4.0
- None
- 3
- 7728
Description
This issue was created by maloo for Li Wei <liwei@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/cb34d710-a15b-11e2-b429-52540035b04c.
The sub-test test_3 failed with the following error:
test failed to respond and timed out
Info required for matching: replay-ost-single 3
From the test log:
== replay-ost-single test 3: Fail OST during write, with verification == 10:40:41 (1365529241)
Failing ost1 on wtm-13vm8
CMD: wtm-13vm8 grep -c /mnt/ost1' ' /proc/mounts
Stopping /mnt/ost1 (opts:) on wtm-13vm8
CMD: wtm-13vm8 umount -d /mnt/ost1
CMD: wtm-13vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
reboot facets: ost1
Failover ost1 to wtm-13vm8
10:41:01 (1365529261) waiting for wtm-13vm8 network 900 secs ...
10:41:01 (1365529261) network interface is UP
CMD: wtm-13vm8 hostname
mount facets: ost1
CMD: wtm-13vm8 test -b /dev/lvm-OSS/P1
Starting ost1: /dev/lvm-OSS/P1 /mnt/ost1
CMD: wtm-13vm8 mkdir -p /mnt/ost1; mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1
CMD: wtm-13vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"0x33f0404\" \" 0xffb7e3ff\" 32
CMD: wtm-13vm8 e2label /dev/lvm-OSS/P1 2>/dev/null
Started lustre-OST0000
CMD: wtm-13vm1,wtm-13vm2.rosso.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
wtm-13vm2: CMD: wtm-13vm2.rosso.whamcloud.com lctl get_param -n at_max
wtm-13vm1: CMD: wtm-13vm1.rosso.whamcloud.com lctl get_param -n at_max
wtm-13vm1: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
wtm-13vm1: Trace dump:
wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:4024:error_noexit()
wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:4047:error()
wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5070:_wait_import_state()
wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5089:wait_import_state()
wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5098:wait_import_state_mount()
wtm-13vm1: = rpc.sh:20:main()
[...]
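The failing check in the trace above polls the client-side import state until it reaches FULL or a timeout expires. A minimal sketch of that kind of polling loop follows; it is not the code from test-framework.sh. The function names `read_import_state` and `poll_import_state` are hypothetical, and the sketch assumes the `ost_server_uuid` parameter prints the target UUID and the state separated by a tab.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct import state(s) for a parameter.
# Assumption: the parameter prints "<uuid><TAB><state>" per import.
read_import_state() {
    lctl get_param -n "$1" 2>/dev/null | cut -f2 | sort -u
}

# Poll until all matching imports report the expected state, or time out.
# Usage: poll_import_state EXPECTED PARAM [TIMEOUT_SECS] [INTERVAL_SECS]
poll_import_state() {
    local expected=$1 param=$2 timeout=${3:-60} interval=${4:-1}
    local waited=0 state
    while :; do
        state=$(read_import_state "$param")
        if [ "$state" = "$expected" ]; then
            echo "$param in $expected state after $waited sec"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "can't put import for $param into $expected state" \
                 "after $waited sec, have ${state:-unknown}" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

On a client, a call such as `poll_import_state FULL 'osc.lustre-OST0000-osc-*.ost_server_uuid' 660` would return non-zero if the import were still in REPLAY_WAIT at the deadline, as in this failure.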
From the OSS console log:
[...]
10:52:20:Lustre: lustre-OST0000: Client e2129d7f-62b9-b16f-bf66-e1b35c1765b7 (at 10.10.16.142@tcp) reconnecting, waiting for 3 clients in recovery for 0:41
10:52:20:Lustre: Skipped 78214 previous similar messages
10:52:20:LustreError: 5789:0:(ldlm_resource.c:1171:ldlm_resource_get()) lvbo_init failed for resource 2820: rc -2
10:52:20:LustreError: 5789:0:(ldlm_resource.c:1171:ldlm_resource_get()) Skipped 544404 previous similar messages
10:52:20:Lustre: lustre-OST0000: Client e2129d7f-62b9-b16f-bf66-e1b35c1765b7 (at 10.10.16.142@tcp) reconnecting, waiting for 3 clients in recovery for 0:42
10:52:20:Lustre: Skipped 160009 previous similar messages
10:52:20:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
[...]
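The "Skipped N previous similar messages" lines in the console excerpt show how heavily the reconnect and lvbo_init messages were being repeated. As a rough illustration, the counters can be totalled from a saved console log; `sum_skipped` is a hypothetical helper name, and the sketch assumes the log is fed on standard input in the format shown above.

```shell
#!/bin/bash
# Sum the "Skipped N previous similar messages" counters read from stdin.
sum_skipped() {
    awk '/Skipped [0-9]+ previous similar messages/ {
             # The counter is the field right after the word "Skipped".
             for (i = 1; i < NF; i++)
                 if ($i == "Skipped") total += $(i + 1)
         }
         END { print total + 0 }'
}
```

Fed the three "Skipped" lines from the excerpt above, this prints 782627.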
Issue Links
- is duplicated by: LU-4230 Test failure on test suite replay-ost-single, subtest test_3 (Resolved)