Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.4.0
None
3
5730
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8106fea4-3a9d-11e2-b2e6-52540035b04c.
The sub-test test_failover_mds failed with the following error:
test_failover_mds returned 7
The test log shows:
== recovery-mds-scale test failover_mds: failover MDS == 13:08:23 (1354136903)
Started client load: dd on client-28vm5
CMD: client-28vm5 PATH=/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: MOUNT=/mnt/lustre ERRORS_OK= BREAK_ON_ERROR= END_RUN_FILE=/home/autotest/.autotest/shared_dir/2012-11-28/112808-70113261691540/end_run_file LOAD_PID_FILE=/tmp/client-load.pid TESTLOG_PREFIX=/logdir/test_logs/2012-11-28/lustre-master-el6-x86_64-fo__1065__-70113261691540-112807/recovery-mds-scale TESTNAME=test_failover_mds DBENCH_LIB=/usr/share/doc/dbench/loadfiles DBENCH_SRC= run_dd.sh
Started client load: tar on client-28vm6
CMD: client-28vm6 PATH=/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: MOUNT=/mnt/lustre ERRORS_OK= BREAK_ON_ERROR= END_RUN_FILE=/home/autotest/.autotest/shared_dir/2012-11-28/112808-70113261691540/end_run_file LOAD_PID_FILE=/tmp/client-load.pid TESTLOG_PREFIX=/logdir/test_logs/2012-11-28/lustre-master-el6-x86_64-fo__1065__-70113261691540-112807/recovery-mds-scale TESTNAME=test_failover_mds DBENCH_LIB=/usr/share/doc/dbench/loadfiles DBENCH_SRC= run_tar.sh
client loads pids:
CMD: client-28vm5,client-28vm6 cat /tmp/client-load.pid
client-28vm6: 4127
client-28vm5: 4080
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=0 DURATION=86400 PERIOD=900
CMD: client-28vm5 rc=\$([ -f /proc/sys/lnet/catastrophe ] && echo \$(< /proc/sys/lnet/catastrophe) || echo 0); if [ \$rc -ne 0 ]; then echo \$(hostname): \$rc; fi exit \$rc;
CMD: client-28vm5 ps auxwww | grep -v grep | grep -q run_dd.sh
CMD: client-28vm6 rc=\$([ -f /proc/sys/lnet/catastrophe ] && echo \$(< /proc/sys/lnet/catastrophe) || echo 0); if [ \$rc -ne 0 ]; then echo \$(hostname): \$rc; fi exit \$rc;
CMD: client-28vm6 ps auxwww | grep -v grep | grep -q run_tar.sh
Wait mds1 recovery complete before doing next failover...
CMD: client-28vm1.lab.whamcloud.com lctl get_param -n at_max
affected facets: mds1
CMD: client-28vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh _wait_recovery_complete *.lustre:MDT0000.recovery_status 662
client-28vm3: error: get_param: /proc/{fs,sys}/{lnet,lustre}/*/lustre:MDT0000/recovery_status: Found no match
mds1 recovery is not completed!
2012-11-28 13:08:32 Terminating clients loads ...
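The step that fails in the log above is the `get_param` lookup on client-28vm3, which reports `Found no match` for the `recovery_status` parameter. When triaging similar runs, it may help to pull those lines out of a long log automatically. The following is a minimal sketch, assuming the error lines keep the `<node>: error: get_param: <pattern>: Found no match` shape seen here; `find_missing_params` is a hypothetical helper written for illustration and is not part of the Lustre test framework.

```python
import re

# Assumed shape, taken from the log in this report:
#   <node>: error: get_param: <pattern>: Found no match
# The regex is not anchored to line boundaries, so it also works when the
# log has been flattened onto a single line.
NO_MATCH_RE = re.compile(
    r"(?P<node>\S+): error: get_param: (?P<pattern>\S+): Found no match"
)


def find_missing_params(log_text):
    """Return (node, pattern) pairs for each get_param lookup that matched nothing."""
    return [
        (m.group("node"), m.group("pattern"))
        for m in NO_MATCH_RE.finditer(log_text)
    ]


# Excerpt copied from the log in this report.
sample = (
    "affected facets: mds1\n"
    "client-28vm3: error: get_param: "
    "/proc/{fs,sys}/{lnet,lustre}/*/lustre:MDT0000/recovery_status: "
    "Found no match\n"
    "mds1 recovery is not completed!\n"
)

for node, pattern in find_missing_params(sample):
    print(f"{node}: no parameter matched {pattern}")
```

On the excerpt above this reports client-28vm3 together with the `lustre:MDT0000/recovery_status` path that could not be resolved.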