Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1890

Test failure on test suite recovery-small, subtest test_101

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.3.0
    • None
    • 3
    • 4143

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ae7cd73a-f760-11e1-8b95-52540035b04c.

      Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/12

      The sub-test test_101 failed with the following error:

      test failed to respond and timed out

      Info required for matching: recovery-small 101

      MGS/MDS Nodes: fat-intel-3vm3(10.10.4.88), fat-intel-3vm7(10.10.4.92)
      OSS Nodes: fat-intel-3vm4(10.10.4.89), fat-intel-3vm8(10.10.4.93)
      Client Nodes: fat-intel-3vm1(10.10.4.86), fat-intel-3vm5(10.10.4.90), fat-intel-3vm6(10.10.4.91)

      Syslog on OSS fat-intel-3vm8(10.10.4.93):

      Sep  4 18:14:34 fat-intel-3vm8 rshd[4135]: root@fat-intel-3vm1.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "e2label /dev/lvm-OSS/P7 2>/dev/null");echo XXRETCODE:$?'
      Sep  4 18:14:35 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: Will be in recovery for at least 1:00, or until 4 clients reconnect
      Sep  4 18:14:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: Recovery over after 0:04, of 4 clients 4 recovered and 0 were evicted.
      Sep  4 18:14:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: received MDS connection from 10.10.4.92@tcp
      Sep  4 18:18:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807862/real 13468078
      62]  req@ffff880037705800 x1412229540872552/t0(0) o250->MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346807887 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Sep  4 18:18:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      Sep  4 18:27:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346808402/real 1346808402]  req@ffff88007c723400 x1412229540872570/t0(0) o250->MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346808427 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Sep  4 18:27:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
      Sep  4 18:37:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346809032/real 1346809032]  req@ffff88007c723c00 x1412229540872591/t0(0) o250->MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346809057 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Sep  4 18:37:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
      Sep  4 18:48:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346809662/real 13468096
      62]  req@ffff8800709db800 x1412229540872612/t0(0) o250->MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346809687 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Sep  4 18:48:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
      Sep  4 18:58:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346810292/real 13468102
      92]  req@ffff880037705c00 x1412229540872633/t0(0) o250->MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346810317 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Sep  4 18:58:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
      Sep  4 19:04:34 fat-intel-3vm8 kernel: Lustre: lustre-OST0004: haven't heard from client 85be9246-99f9-3b19-5f81-ad80152fd420 (at 10.10.4.91@tcp) in 47 seconds. I think it's dead, and I am evicting it. exp ffff88007254fc00, cur 1346810674 expire 1346810644 last 1346810627
      Sep  4 19:04:34 fat-intel-3vm8 kernel: Lustre: lustre-OST0004: haven't heard from client f75916a5-eaec-360e-dafc-5ebb1448c443 (at 10.10.4.90@tcp) in 34 seconds. I think it's dead, and I am evicting it. exp ffff88007598b000, cur 1346810674 expire 1346810644 last 1346810640
      Sep  4 19:04:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0000: haven't heard from client 85be9246-99f9-3b19-5f81-ad80152fd420 (at 10.10.4.91@tcp) in 52 seconds. I think it's dead, and I am evicting it. exp ffff8800732a6000, cur 1346810679 expire 1346810649 last 1346810627
      Sep  4 19:04:39 fat-intel-3vm8 kernel: Lustre: Skipped 10 previous similar messages
      Sep  4 19:04:40 fat-intel-3vm8 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4298439482, last ping 4298444482, now 4298449482
      Sep  4 19:04:40 fat-intel-3vm8 kernel: connection1:0: detected conn error (1011)
      Sep  4 19:04:41 fat-intel-3vm8 iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
      Sep  4 19:05:38 fat-intel-3vm8 kernel: MTRR variable ranges enabled:
      Sep  4 19:05:38 fat-intel-3vm8 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
      

      Syslog on client fat-intel-3vm6(10.10.4.91):

      Sep  4 18:03:35 fat-intel-3vm6 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
      Sep  4 18:04:07 fat-intel-3vm6 kernel: Lustre: lustre-OST0001-osc-ffff8800749ac400: Connection to lustre-OST0001 (at 10.10.4.89@tcp) was lost; in progress operations using this
       service will wait for recovery to complete
      Sep  4 18:04:07 fat-intel-3vm6 kernel: Lustre: Skipped 3 previous similar messages
      Sep  4 18:05:07 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:06:17 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:06:17 fat-intel-3vm6 kernel: LustreError: Skipped 28 previous similar messages
      Sep  4 18:06:24 fat-intel-3vm6 kernel: Lustre: lustre-OST0000-osc-ffff8800749ac400: Connection restored to lustre-OST0000 (at 10.10.4.93@tcp)
      Sep  4 18:06:24 fat-intel-3vm6 kernel: Lustre: Skipped 6 previous similar messages
      Sep  4 18:07:47 fat-intel-3vm6 kernel: Lustre: 2131:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: [sent 1346807267/real 13468
      07267]  req@ffff880074801800 x1412225718819426/t0(0) o8->lustre-OST0002-osc-ffff8800749ac400@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807293 ref 1 fl Rpc:XN/0/ffffffff 
      rc 0/-1
      Sep  4 18:07:47 fat-intel-3vm6 kernel: Lustre: 2131:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 159 previous similar messages
      Sep  4 18:08:27 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -11
      Sep  4 18:08:27 fat-intel-3vm6 kernel: LustreError: Skipped 49 previous similar messages
      Sep  4 18:13:02 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -11
      Sep  4 18:13:02 fat-intel-3vm6 kernel: LustreError: Skipped 51 previous similar messages
      Sep  4 19:04:28 fat-intel-3vm6 kernel: MTRR variable ranges enabled:
      Sep  4 19:04:28 fat-intel-3vm6 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
      

      Syslog on client fat-intel-3vm5(10.10.4.90):

      Sep  4 18:03:35 fat-intel-3vm5 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
      Sep  4 18:04:10 fat-intel-3vm5 kernel: Lustre: lustre-OST0001-osc-ffff880037556c00: Connection to lustre-OST0001 (at 10.10.4.89@tcp) was lost; in progress operations using this
       service will wait for recovery to complete
      Sep  4 18:04:10 fat-intel-3vm5 kernel: Lustre: Skipped 3 previous similar messages
      Sep  4 18:05:20 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:06:24 fat-intel-3vm5 kernel: Lustre: lustre-OST0000-osc-ffff880037556c00: Connection restored to lustre-OST0000 (at 10.10.4.93@tcp)
      Sep  4 18:06:24 fat-intel-3vm5 kernel: Lustre: Skipped 6 previous similar messages
      Sep  4 18:07:30 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:07:30 fat-intel-3vm5 kernel: LustreError: Skipped 37 previous similar messages
      Sep  4 18:10:06 fat-intel-3vm5 kernel: Lustre: 2138:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807380/real 13468073
      80]  req@ffff8800794cd000 x1412225718819590/t0(0) o8->lustre-OST0003-osc-ffff880037556c00@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807406 ref 1 fl Rpc:XN/0/ffffffff rc 
      0/-1
      Sep  4 18:10:06 fat-intel-3vm5 kernel: Lustre: 2138:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 173 previous similar messages
      Sep  4 18:12:20 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:12:20 fat-intel-3vm5 kernel: LustreError: Skipped 70 previous similar messages
      Sep  4 19:04:42 fat-intel-3vm5 kernel: MTRR variable ranges enabled:
      Sep  4 19:04:42 fat-intel-3vm5 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
      

      Syslog on client fat-intel-3vm1(10.10.4.86):

      Sep  4 18:03:35 fat-intel-3vm1 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
      Sep  4 18:05:08 fat-intel-3vm1 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
      Sep  4 18:05:08 fat-intel-3vm1 kernel: LustreError: Skipped 6 previous similar messages
      Sep  4 18:06:54 fat-intel-3vm1 kernel: Lustre: 3295:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807188/real 13468071
      88]  req@ffff88002a4e5000 x1412225453747374/t0(0) o8->lustre-OST0005-osc-ffff880047d0f000@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807213 ref 1 fl Rpc:XN/0/ffffffff rc 
      0/-1
      Sep  4 18:06:54 fat-intel-3vm1 kernel: Lustre: 3295:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 119 previous similar messages
      Sep  4 18:11:35 fat-intel-3vm1 kernel: Lustre: lustre-OST0004-osc-ffff880047d0f000: Connection restored to lustre-OST0004 (at 10.10.4.93@tcp)
      Sep  4 18:11:35 fat-intel-3vm1 kernel: Lustre: Skipped 5 previous similar messages
      Sep  4 19:05:09 fat-intel-3vm1 kernel: MTRR variable ranges enabled:
      Sep  4 19:05:09 fat-intel-3vm1 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
      

      Attachments

        Activity

          People

            laisiyao Lai Siyao
            maloo Maloo
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: