Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8805

Failover: recovery-mds-scale test_failover_mds: test_failover_mds returned 4

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • Lustre 2.9.0
    • None
    • Failover: EL7 Server/Client
      master, build# 3468
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/be9e4ae0-a1c0-11e6-8ed2-5254006e85c2.

      The sub-test test_failover_mds failed with the following error:

      test_failover_mds returned 4
      

      test_log:

      == recovery-mds-scale test failover_mds: failover MDS ================================================ 17:03:39 (1478131419)
      Started client load: dd on onyx-40vm5
      CMD: onyx-40vm5 PATH=/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin: MOUNT=/mnt/lustre ERRORS_OK= BREAK_ON_ERROR= END_RUN_FILE=/shared_test/autotest/2016-11-02/153732-70163256913820/end_run_file LOAD_PID_FILE=/tmp/client-load.pid TESTLOG_PREFIX=/logdir/test_logs/2016-11-02/lustre-master-el7-x86_64--failover--1_15_1__3468__-70163256913820-153732/recovery-mds-scale TESTNAME=test_failover_mds DBENCH_LIB=/usr/share/doc/dbench/loadfiles DBENCH_SRC= CLIENT_COUNT=2 LFS=/usr/bin/lfs run_dd.sh
      Started client load: tar on onyx-40vm6
      CMD: onyx-40vm6 PATH=/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin: MOUNT=/mnt/lustre ERRORS_OK= BREAK_ON_ERROR= END_RUN_FILE=/shared_test/autotest/2016-11-02/153732-70163256913820/end_run_file LOAD_PID_FILE=/tmp/client-load.pid TESTLOG_PREFIX=/logdir/test_logs/2016-11-02/lustre-master-el7-x86_64--failover--1_15_1__3468__-70163256913820-153732/recovery-mds-scale TESTNAME=test_failover_mds DBENCH_LIB=/usr/share/doc/dbench/loadfiles DBENCH_SRC= CLIENT_COUNT=2 LFS=/usr/bin/lfs run_tar.sh
      client loads pids:
      CMD: onyx-40vm5,onyx-40vm6 cat /tmp/client-load.pid
      onyx-40vm6: 7449
      onyx-40vm5: 7479
      ==== Checking the clients loads BEFORE failover -- failure NOT OK              ELAPSED=0 DURATION=86400 PERIOD=1200
      Client load failed on node onyx-40vm5, rc=1
      2016-11-02 17:03:46 Terminating clients loads ...
      Duration:               86400
      Server failover period: 1200 seconds
      Exited after:           0 seconds
      Number of failovers before exit:
      mds1: 0 times
      ost1: 0 times
      ost2: 0 times
      ost3: 0 times
      ost4: 0 times
      ost5: 0 times
      ost6: 0 times
      ost7: 0 times
      Status: FAIL: rc=4
      CMD: onyx-40vm5,onyx-40vm6 test -f /tmp/client-load.pid &&
              { kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
      /usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 22083 Killed                  do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS run_${load}.sh"
      /usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 22277 Killed                  do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS run_${load}.sh"
      Dumping lctl log to /logdir/test_logs/2016-11-02/lustre-master-el7-x86_64--failover--1_15_1__3468__-70163256913820-153732/recovery-mds-scale.test_failover_mds.*.1478131427.log
      CMD: onyx-40vm3,onyx-40vm4,onyx-40vm7,onyx-40vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-11-02/lustre-master-el7-x86_64--failover--1_15_1__3468__-70163256913820-153732/recovery-mds-scale.test_failover_mds.debug_log.\$(hostname -s).1478131427.log;
               dmesg > /logdir/test_logs/2016-11-02/lustre-master-el7-x86_64--failover--1_15_1__3468__-70163256913820-153732/recovery-mds-scale.test_failover_mds.dmesg.\$(hostname -s).1478131427.log
      onyx-40vm3: invalid parameter 'dump_kernel'
      onyx-40vm3: open(dump_kernel) failed: No such file or directory
      onyx-40vm4: invalid parameter 'dump_kernel'
      onyx-40vm4: open(dump_kernel) failed: No such file or directory
      

      Could not find another useful information.
      Can be related to LU-5483

      Attachments

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: