  1. Lustre
  2. LU-10540

recovery-small test 104 fails with 'ir status on ost1 should be DISABLED'

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.6, Lustre 2.10.7
    • Labels: None
    • Environment: SLES12 SP2 and SP3 environments
    • Severity: 3

    Description

      recovery-small test_104 fails in full and failover test sessions, so far only on SLES12 SP2 and SLES12 SP3.

      Looking at the client test_log, we see two failures:

      Started lustre-OST0000
      CMD: trevis-7vm7 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0000.recovery_status |
      			awk '/status:/{ print \$2}'
      CMD: trevis-7vm7 lctl get_param -n obdfilter.lustre-OST0000.recovery_status |
                                     awk '/IR:/{ print \$2}'
      /usr/lib64/lustre/tests/recovery-small.sh: line 1630: [: too many arguments
       recovery-small test_104: @@@@@@ FAIL: Error state , must be ENABLED or DISABLED 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5335:error()
        = /usr/lib64/lustre/tests/recovery-small.sh:1631:check_target_ir_state()
        = /usr/lib64/lustre/tests/recovery-small.sh:1873:test_104()
        = /usr/lib64/lustre/tests/test-framework.sh:5611:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5650:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5497:run_test()
        = /usr/lib64/lustre/tests/recovery-small.sh:1877:main()
      CMD: trevis-7vm5,trevis-7vm6,trevis-7vm7,trevis-7vm8 /usr/sbin/lctl dk > /home/autotest/autotest/logs/test_logs/2018-01-19/lustre-master-patchless-sles12sp3-x86_64--full--1_5_1__58___d9f8a5c0-4038-4a31-8ae1-d00da7add1bf/recovery-small.test_104.debug_log.\$(hostname -s).1516426665.log;
               dmesg > /home/autotest/autotest/logs/test_logs/2018-01-19/lustre-master-patchless-sles12sp3-x86_64--full--1_5_1__58___d9f8a5c0-4038-4a31-8ae1-d00da7add1bf/recovery-small.test_104.dmesg.\$(hostname -s).1516426665.log
      CMD: trevis-7vm5,trevis-7vm6,trevis-7vm7,trevis-7vm8 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
      /usr/lib64/lustre/tests/recovery-small.sh: line 1874: [: too many arguments
       recovery-small test_104: @@@@@@ FAIL: ir status on ost1 should be DISABLED 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5335:error()
        = /usr/lib64/lustre/tests/recovery-small.sh:1875:test_104()
        = /usr/lib64/lustre/tests/test-framework.sh:5611:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5650:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5497:run_test()
        = /usr/lib64/lustre/tests/recovery-small.sh:1877:main()
      

      The first error comes from the routine check_target_ir_state():

      1616 check_target_ir_state()
      1617 {
      1618         local target=${1}
      1619         local name=${target}_svc
      1620         local recovery_proc=obdfilter.${!name}.recovery_status
      1621         local st
      1622 
      1623         while : ; do
      1624                 st=$(do_facet $target "$LCTL get_param -n $recovery_proc |
      1625                         awk '/status:/{ print \\\$2}'")
      1626                 [ x$st = xRECOVERING ] || break
      1627         done
      1628         st=$(do_facet $target "lctl get_param -n $recovery_proc |
      1629                                awk '/IR:/{ print \\\$2}'")
      1630         [ $st != ON -o $st != OFF -o $st != ENABLED -o $st != DISABLED ] ||
      1631                 error "Error state $st, must be ENABLED or DISABLED"
      1632         echo -n $st
      1633 }
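
A note on the check at line 1630, with a possible alternative. The log message "Error state , must be ENABLED or DISABLED" shows that $st was empty, so the unquoted test expanded to a malformed expression and '[' itself failed with "too many arguments". Separately, the chain of != comparisons joined by -o is true for any single-word value, so the error branch is only reached when '[' fails. The sketch below is one way the validation could be written; is_valid_ir_state is a hypothetical helper name and is not part of recovery-small.sh.

```shell
#!/bin/bash
# Hypothetical helper (not in recovery-small.sh): succeeds only when its
# argument is exactly one of the four expected IR state strings.
is_valid_ir_state() {
        case "$1" in
        ON|OFF|ENABLED|DISABLED) return 0 ;;
        *) return 1 ;;
        esac
}

# An empty value, as seen in the failing log, is rejected cleanly
# instead of triggering a '[' syntax error.
st=""
is_valid_ir_state "$st" ||
        echo "Error state '$st', must be ON, OFF, ENABLED or DISABLED"
```

Quoting "$st" at the call site keeps an empty or multi-word value as a single argument, which is what the original test expression was missing.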
      

      The second error comes from test_104 itself, in the following test code, and is a consequence of the earlier check_target_ir_state() failure:

      1873         local ir_state=$(check_target_ir_state ost1)
      1874         [ $ir_state = "DISABLED" -o $ir_state = "OFF" ] ||
      1875                 error "ir status on ost1 should be DISABLED"
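
To illustrate why line 1874 reports "[: too many arguments", the following minimal sketch compares the unquoted comparison with a quoted one, assuming ir_state is empty as it appears to be in the failing runs. The variable names unquoted_rc and quoted_rc are illustrative only.

```shell
#!/bin/bash
# Stand-in for the empty value seen on the failing SLES12 runs (assumption).
ir_state=""

# Unquoted, as in the test code: the empty variable disappears during word
# splitting, so '[' receives a malformed expression and fails with a
# syntax error (exit status 2 in bash) rather than a plain "false".
unquoted_rc=0
[ $ir_state = "DISABLED" -o $ir_state = "OFF" ] 2>/dev/null || unquoted_rc=$?

# Quoted: the empty string is preserved as an operand, so each comparison
# is simply false (exit status 1) and no syntax error is raised.
quoted_rc=0
[ "$ir_state" = "DISABLED" ] || [ "$ir_state" = "OFF" ] || quoted_rc=$?

echo "unquoted rc=$unquoted_rc, quoted rc=$quoted_rc"
```

With quoting, the test would still fail when the state is empty, but it would fail through the intended error message alone, without the additional shell syntax error in the log.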
      

      This test started failing on 2018-01-09 for lustre-master-patchless branch build #53 and lustre-master branch build #3693. Logs for these failures are at
      https://testing.hpdd.intel.com/test_sets/ab46a562-f595-11e7-a169-52540065bddc
      https://testing.hpdd.intel.com/test_sets/0f1fc7ca-f68c-11e7-a7cd-52540065bddc
      https://testing.hpdd.intel.com/test_sets/8802771e-fde4-11e7-a7cd-52540065bddc

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)