Lustre / LU-1863

Test failure with MDS spontaneous rebooting (test suite sanity, subtest test_32n)

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.3.0
    • 3
    • 4270

    Description

      This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/f3243a50-f9b2-11e1-b8d8-52540035b04c.

      The sub-test test_32n failed with the following error:

      test failed to respond and timed out

      The console on the OSS shows it rebooted for no apparent reason, and ConMan failed to capture the cause:

      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 32n: open d32n\/symlink-\>tmp\/symlink-\>lustre-root ====================================== 16:04:09 (1346972649)
      Lustre: DEBUG MARKER: == sanity test 32n: open d32n/symlink->tmp/symlink->lustre-root ====================================== 16:04:09 (1346972649)
      Lustre: 3770:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346972670/real 1346972670] req@ffff880071728c00 x1412399426961989/t0(0) o400->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 224/224 e 0 to 1 dl 1346972677 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LustreError: 166-1: MGC10.10.4.222@tcp: Connection to MGS (at 10.10.4.222@tcp) was lost; in progress operations using this service will fail
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346972702/real 1346972702] req@ffff880078a46400 x1412399426961991/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346972713 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346972727/real 1346972729] req@ffff880074d4bc00 x1412399426961992/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346972743 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346972777/real 1346972777] req@ffff8800791f2400 x1412399426961994/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346972803 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Lustre: lustre-OST0000: haven't heard from client lustre-MDT0000-mdtlov_UUID (at 10.10.4.222@tcp) in 235 seconds. I think it's dead, and I am evicting it. exp ffff88007178e400, cur 1346972885 expire 1346972735 last 1346972650
      Lustre: lustre-OST0002: haven't heard from client lustre-MDT0000-mdtlov_UUID (at 10.10.4.222@tcp) in 235 seconds. I think it's dead, and I am evicting it. exp ffff880061d38400, cur 1346972885 expire 1346972735 last 1346972650
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346972877/real 1346972880] req@ffff8800791f2c00 x1412399426961996/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346972913 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346973002/real 1346973005] req@ffff880074050800 x1412399426961999/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346973053 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346973302/real 1346973305] req@ffff880072541c00 x1412399426962003/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346973357 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346973827/real 1346973830] req@ffff88007171f800 x1412399426962010/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346973882 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346974502/real 1346974505] req@ffff880078a46800 x1412399426962019/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346974557 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346975177/real 1346975180] req@ffff88005ecfac00 x1412399426962028/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346975232 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346975852/real 1346975855] req@ffff880078a46800 x1412399426962037/t0(0) o250->MGC10.10.4.222@tcp@10.10.4.222@tcp:26/25 lens 400/544 e 0 to 1 dl 1346975907 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3768:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 8 previous similar messages

      <ConMan> Console [client-19vm4] disconnected from <client-19:6003> at 09-06 17:05.

      <ConMan> Console [client-19vm4] connected to <client-19:6003> at 09-06 17:05.
      Press any key to continue.
      Press any key to continue.
      Press any key to continue.
      Press any key to continue.
      Press any key to continue.
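
      The eviction messages in the log above carry enough epoch timestamps to work out when the OSS last heard from the MDT. A minimal Python sketch of that arithmetic follows; the regular expression and the helper name parse_eviction are illustrative assumptions, not part of Lustre or its test framework, and the pattern is matched only against the message format seen in this log.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern, written against the eviction message format
# seen in this console log only.
EVICT_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<client>\S+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<secs>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return the fields of an eviction message, or None if no match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("secs", "cur", "expire", "last"):
        info[key] = int(info[key])
    return info


line = (
    "Lustre: lustre-OST0000: haven't heard from client "
    "lustre-MDT0000-mdtlov_UUID (at 10.10.4.222@tcp) in 235 seconds. "
    "I think it's dead, and I am evicting it. exp ffff88007178e400, "
    "cur 1346972885 expire 1346972735 last 1346972650"
)

info = parse_eviction(line)
# Seconds between the last contact and the eviction decision.
silent_for = info["cur"] - info["last"]
# Last contact as a UTC timestamp.
last_seen = datetime.fromtimestamp(info["last"], tz=timezone.utc)

print(info["target"], silent_for, last_seen.isoformat())
```

      For this log, cur minus last equals the 235 seconds reported in the message, and the last contact (1346972650) falls one second after the test 32n debug marker (1346972649).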

      Info required for matching: sanity 32n

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: maloo Maloo