Lustre / LU-6935

replay-single test_70b FAIL: import is not in FULL state

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.8.0
    • None
    • review-dne-part-2 in autotest
    • 3

    Description

      replay-single test_70b fails; more precisely, it hangs. There are several problems here:
      1. The test fails complaining that "import is not in FULL state":

      07:07:46:shadow-9vm9:  rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0002-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY 
      07:07:46:shadow-9vm6:    1  cleanup 2741 sec
      07:07:46:shadow-9vm9:    1  cleanup 2741 sec
      07:07:47:shadow-9vm9:   Trace dump:
      07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
      07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
      07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5830:_wait_import_state()
      07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5849:wait_import_state()
      07:07:47:shadow-9vm9:   = /usr/lib64/lustre/tests/test-framework.sh:5858:wait_import_state_mount()
      07:07:47:shadow-9vm9:   = rpc.sh:20:main()
      07:07:47:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1435992181 
      07:07:47:shadow-9vm6:  rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0002-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY 
      07:07:47:shadow-9vm6:    1  cleanup 2742 sec
      07:07:47:shadow-9vm9:    1  cleanup 2742 sec
      07:07:47:shadow-9vm6:   Trace dump:
      07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
      07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
      07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5830:_wait_import_state()
      07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5849:wait_import_state()
      07:07:47:shadow-9vm6:   = /usr/lib64/lustre/tests/test-framework.sh:5858:wait_import_state_mount()
      07:07:47:shadow-9vm6:   = rpc.sh:20:main()
      07:07:47:shadow-9vm9: CMD: shadow-9vm4 uname -n
      07:07:47:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1435992181 
      07:07:47:shadow-9vm9: Dumping lctl log to /tmp/test_logs/1435992181/rpc..*.1435993666.log
      07:07:47:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com /usr/sbin/lctl dk > /tmp/test_logs/1435992181/rpc..debug_log.\$(hostname -s).1435993666.log;
      07:07:47:shadow-9vm9:          dmesg > /tmp/test_logs/1435992181/rpc..dmesg.\$(hostname -s).1435993666.log
      07:07:48:shadow-9vm6: CMD: shadow-9vm4 uname -n
      07:07:48:shadow-9vm6: Dumping lctl log to /tmp/test_logs/1435992181/rpc..*.1435993667.log
      07:07:48:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 /usr/sbin/lctl dk > /tmp/test_logs/1435992181/rpc..debug_log.\$(hostname -s).1435993667.log;
      07:07:48:shadow-9vm6:          dmesg > /tmp/test_logs/1435992181/rpc..dmesg.\$(hostname -s).1435993667.log
      07:07:48:shadow-9vm6:    1  cleanup 2743 sec
      07:07:48:shadow-9vm9:    1  cleanup 2743 sec
      07:07:48:shadow-9vm9: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm7,shadow-9vm8,shadow-9vm9.shadow.whamcloud.com rsync -az /tmp/test_logs/1435992181/rpc..*.1435993666.log shadow-9vm9.shadow.whamcloud.com:/tmp/test_logs/1435992181
      07:07:48:shadow-9vm6: CMD: shadow-9vm4,shadow-9vm6,shadow-9vm6.shadow.whamcloud.com,shadow-9vm7,shadow-9vm8 rsync -az /tmp/test_logs/1435992181/rpc..*.1435993667.log shadow-9vm6.shadow.whamcloud.com:/tmp/test_logs/1435992181
      07:07:49:shadow-9vm6:    1  cleanup 2744 sec
      07:07:49:shadow-9vm9:    1  cleanup 2744 sec
      07:07:49: replay-single test_70b: @@@@@@ FAIL: import is not in FULL state 
      07:07:49:  Trace dump:
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4727:error_noexit()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4758:error()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:6004:wait_clients_import_state()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:2574:fail()
      07:07:49:  = /usr/lib64/lustre/tests/replay-single.sh:2091:test_70b()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:5020:run_one()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:5057:run_one_logged()
      07:07:49:  = /usr/lib64/lustre/tests/test-framework.sh:4907:run_test()
      07:07:49:  = /usr/lib64/lustre/tests/replay-single.sh:2102:main()
      

      2. The test does not fail in a way that Maloo can recognize, so autotest times the test out. In the test reports below, it looks as if test 70b never ran and the test suite itself failed. The Maloo report shows 93/93 tests passing, but clearly not all of the replay-single tests were run, and the suite_stdout contains the error message above.

      3. No logs are collected to analyze this failure.
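
      Because the per-test report hides this failure, the only reliable signal is the "@@@@@@ FAIL:" marker in suite_stdout. The following is a minimal sketch of how such output could be scanned for those markers and for the stuck import state. It is not part of Maloo, autotest, or test-framework.sh; the function name scan_failures and both regular expressions are hypothetical, written only against the log lines quoted above.

```python
import re

# Matches framework failure lines such as
#   "07:07:49: replay-single test_70b: @@@@@@ FAIL: import is not in FULL state"
# (pattern is an assumption based on the log excerpt in this ticket)
FAIL_RE = re.compile(r"@@@@@@ FAIL: (?P<msg>.*\S)")

# Matches the import-state message quoted in item 1.
IMPORT_RE = re.compile(
    r"can't put import for (?P<target>\S+) into (?P<want>\w+) state "
    r"after (?P<secs>\d+) sec, have (?P<have>\w+)"
)


def scan_failures(lines):
    """Return one dict per FAIL line; import-state fields are added when present."""
    found = []
    for line in lines:
        m = FAIL_RE.search(line)
        if not m:
            continue
        entry = {"message": m.group("msg")}
        im = IMPORT_RE.search(entry["message"])
        if im:
            entry.update(im.groupdict())
            entry["secs"] = int(entry["secs"])
        found.append(entry)
    return found


# Three lines copied from the log excerpt in item 1.
sample = [
    "07:07:46:shadow-9vm9:  rpc : @@@@@@ FAIL: can't put import for "
    "mdc.lustre-MDT0002-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY ",
    "07:07:46:shadow-9vm6:    1  cleanup 2741 sec",
    "07:07:49: replay-single test_70b: @@@@@@ FAIL: import is not in FULL state ",
]

for failure in scan_failures(sample):
    print(failure)
```

      Run against a full suite_stdout, a scan like this would flag the run as failed even when the per-test summary reports every executed test as passing, and it records which import was stuck and in which state.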

      This test has failed in this way five times this month:
      2015-07-10 08:11:47 - https://testing.hpdd.intel.com/test_sets/a941c760-2725-11e5-bc86-5254006e85c2
      2015-07-11 21:00:38 - https://testing.hpdd.intel.com/test_sets/ee8a4f52-2858-11e5-ba19-5254006e85c2
      2015-07-20 14:23:34 - https://testing.hpdd.intel.com/test_sets/cff07ed6-2f33-11e5-92dd-5254006e85c2
      2015-07-21 16:12:58 - https://testing.hpdd.intel.com/test_sets/3fadf97a-300f-11e5-97d6-5254006e85c2
      2015-07-31 06:19:18 - https://testing.hpdd.intel.com/test_sets/9635fe7a-3797-11e5-9d53-5254006e85c2

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)