Lustre / LU-11713

parallel-scale-nfsv3 test_compilebench: timeout

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Severity: 3

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0554ff76-ef60-11e8-bfe1-52540065bddc

      test_compilebench failed with the following error:

      Timeout occurred after 126 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests
      

      The OSS console log shows the following:

      [55355.821476] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test compilebench: compilebench 
      [55356.063925] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench 
      [55356.412040] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.9892 -i 2 -r 2 --makej 
      [55356.649288] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.9892 -i 2 -r 2 --makej 
      [55419.035415] Lustre: 13184:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1542997349/real 1542997349] req@ffff9f6c52602a00 x1617892067390944/t0(0) o400->lustre-MDT0001-lwp-OST0000@10.9.3.110@tcp:12/10 lens 224/224 e 0 to 1 dl 1542997356 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 
      [55419.040134] Lustre: 13184:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 38 previous similar messages 
      [55419.041806] Lustre: lustre-MDT0001-lwp-OST0000: Connection to lustre-MDT0001 (at 10.9.3.110@tcp) was lost; in progress operations using this service will wait for recovery to complete 
      [55419.044455] Lustre: Skipped 39 previous similar messages 
      [55457.070142] Lustre: lustre-OST0007: haven't heard from client lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff9f6c8da2d400, cur 1542997394 expire 1542997364 last 1542997344 
      [55457.073608] Lustre: Skipped 7 previous similar messages 
      [55494.045571] Lustre: 13182:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1542997431/real 1542997431] req@ffff9f6caf1eea00 x1617892067398112/t0(0) o38->lustre-MDT0001-lwp-OST0000@10.9.3.110@tcp:12/10 lens 520/544 e 0 to 1 dl 1542997456 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 
      [55494.050771] Lustre: 13182:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 175 previous similar messages 
      [55494.606938] Lustre: lustre-OST0004: haven't heard from client lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 86 seconds. I think it's dead, and I am evicting it. exp ffff9f6c91cb2400, cur 1542997432 expire 1542997366 last 1542997346 
      [55494.610792] Lustre: Skipped 6 previous similar messages 
      [55501.045420] LustreError: 166-1: MGC10.9.3.109@tcp: Connection to MGS (at 10.9.3.109@tcp) was lost; in progress operations using this service will fail 
      <ConMan> Console [trevis-54vm8] disconnected from <trevis-54:6007> at 11-23 18:24.
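
      For triage, the eviction messages in the log above can be pulled apart with a short script. The following is only a sketch: the regular expression is written against the two eviction lines quoted in this ticket, the names `EVICTION_RE` and `parse_eviction` are hypothetical helpers (not part of Lustre or Maloo), and comparing the reported idle time with `cur - last` is an observation from these two lines, not a documented guarantee.

```python
import re

# Pattern written against the eviction lines quoted in this ticket (assumption:
# other Lustre versions may word the message differently).
EVICTION_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<client>\S+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<idle>\d+) seconds\."
    r".*?cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return a dict of eviction fields, or None if the line does not match."""
    match = EVICTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared arithmetically.
    for key in ("idle", "cur", "expire", "last"):
        fields[key] = int(fields[key])
    return fields


# The two eviction lines copied from the OSS console log in this ticket.
SAMPLE_LINES = [
    "[55457.070142] Lustre: lustre-OST0007: haven't heard from client "
    "lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 50 seconds. "
    "I think it's dead, and I am evicting it. exp ffff9f6c8da2d400, "
    "cur 1542997394 expire 1542997364 last 1542997344",
    "[55494.606938] Lustre: lustre-OST0004: haven't heard from client "
    "lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 86 seconds. "
    "I think it's dead, and I am evicting it. exp ffff9f6c91cb2400, "
    "cur 1542997432 expire 1542997366 last 1542997346",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        ev = parse_eviction(line)
        # Print the target, the evicted client, the reported idle time,
        # and the gap between the "cur" and "last" timestamps.
        print(ev["target"], ev["client"], ev["idle"], ev["cur"] - ev["last"])
```

      For both lines the difference between `cur` and `last` equals the reported idle time (50 and 86 seconds), and both evictions name the same client NID, 10.9.3.110@tcp, which is also the MDS address in the timed-out requests.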
      
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      parallel-scale-nfsv3 test_compilebench - Timeout occurred after 126 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)