LU-4241: Test failure on test recovery-small test_101: import is not in FULL state

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.8.0, Lustre 2.10.0, Lustre 2.10.5
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 11545

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/af3aacb8-48e7-11e3-b8b9-52540035b04c.

      The sub-test test_101 failed with the following error:

      import is not in FULL state

      Info required for matching: recovery-small 101
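
      The failure string comes from the import-state wait in
      lustre/tests/test-framework.sh (wait_import_state() and its callers):
      the test polls the client's osc.*.ost_server_uuid parameter until the
      import reports FULL or the timeout expires. Below is a minimal sketch of
      that style of polling loop, assuming "lctl get_param -n" prints the
      server UUID followed by the import state; the variable names and the
      parsing are illustrative, not the framework's exact code.

      #!/bin/bash
      # Illustrative sketch only -- the real logic lives in
      # wait_import_state() in lustre/tests/test-framework.sh.
      # Assumes "lctl get_param -n" prints "<server_uuid> <import_state>"
      # for each matching OSC device.
      param="osc.lustre-OST0000-osc-*.ost_server_uuid"
      expected="FULL"
      max_wait=${MAX_WAIT:-1475}   # seconds, matching the timeout in the logs

      elapsed=0
      while ((elapsed < max_wait)); do
          state=$(lctl get_param -n "$param" 2>/dev/null |
                  awk '{print $2}' | sort -u)
          if [[ "$state" == "$expected" ]]; then
              echo "$param in $expected state after $elapsed sec"
              exit 0
          fi
          sleep 1
          ((elapsed += 1))
      done
      echo "FAIL: can't put import for $param into $expected state" \
           "after $elapsed sec, have $state"
      exit 1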

          Activity

            Di Wang (Inactive) added a comment (edited):

            https://testing.hpdd.intel.com/test_sets/33fdd074-3b3a-11e5-95fa-5254006e85c2
            On the OST, I saw this:

            05:01:24:Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 3 clients reconnect
            05:01:24:Lustre: lustre-OST0000: Client d57edd96-7283-285b-d594-fd90ad5b6b79 (at 10.1.4.207@tcp) reconnecting, waiting for 3 clients in recovery for 0:46
            05:01:24:Lustre: lustre-OST0000: Client lustre-MDT0000-mdtlov_UUID (at 10.1.4.201@tcp) reconnecting, waiting for 3 clients in recovery for 0:44
            05:01:24:Lustre: lustre-OST0000: Client d57edd96-7283-285b-d594-fd90ad5b6b79 (at 10.1.4.207@tcp) reconnecting, waiting for 3 clients in recovery for 0:03
            05:01:24:Lustre: lustre-OST0000: Client lustre-MDT0000-mdtlov_UUID (at 10.1.4.201@tcp) reconnecting, waiting for 3 clients in recovery for 0:01
            05:01:24:Lustre: lustre-OST0000: recovery is timed out, evict stale exports
            05:01:24:Lustre: lustre-OST0000: disconnecting 1 stale clients
            05:01:24:Lustre: 16475:0:(ldlm_lib.c:1883:target_recovery_overseer()) recovery is aborted by hard timeout
            05:01:24:Lustre: 16475:0:(ldlm_lib.c:1893:target_recovery_overseer()) recovery is aborted, evict exports in recovery
            05:01:24:Lustre: 16475:0:(ldlm_lib.c:1893:target_recovery_overseer()) Skipped 2 previous similar messages
            05:01:24:Lustre: lustre-OST0000: Recovery over after 1:33, of 3 clients 2 recovered and 1 was evicted.
            05:01:24:Lustre: lustre-OST0000: deleting orphan objects from 0x0:11395 to 0x0:11457
            05:01:24:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 99 sec
            05:01:24:Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 99 sec
            05:01:24:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  rpc : @@@@@@ FAIL: can\'t put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 1475 sec, have DISCONN 
            05:01:24:Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 1475 sec, have DISCONN
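
            The log above reads as a recovery hard timeout: only two of the
            three clients completed recovery in time, the third was evicted as
            stale, and that client's OSC import was then left in DISCONN
            rather than FULL. For a run like this, the import can be inspected
            directly on the affected client; the commands below are standard
            lctl usage with device names taken from the log, not part of the
            test itself.

            # Dump the full import record for the stuck OSC device: state,
            # connection status, and the server UUID it targets.
            lctl get_param osc.lustre-OST0000-osc-*.import

            # Just the state line -- DISCONN here corresponds to the evicted
            # client that never re-established its connection.
            lctl get_param osc.lustre-OST0000-osc-*.import | grep 'state:'

            # Recovery status as seen from the OST side.
            lctl get_param obdfilter.lustre-OST0000.recovery_status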
            
            Di Wang (Inactive) added a comment - Hit again:
            https://testing.hpdd.intel.com/test_sets/c6cd44f2-3b53-11e5-95fa-5254006e85c2
            James Nunez (Inactive) added a comment (edited) - Hit this issue again:
            2015-07-10 02:34:57 - https://testing.hpdd.intel.com/test_sets/c778c684-26c1-11e5-8cf5-5254006e85c2
            2015-07-30 16:30:15 - https://testing.hpdd.intel.com/test_sets/b7c5dba2-36f0-11e5-84a8-5254006e85c2
            Jian Yu added a comment -

            While verifying patch http://review.whamcloud.com/#/c/8729/ on the Lustre b2_5 branch, the same failure occurred in a review-zfs test session:
            https://maloo.whamcloud.com/test_sets/91181414-78b7-11e3-809c-52540035b04c


            People

              Assignee: WC Triage
              Reporter: Maloo
              Votes: 0
              Watchers: 7
