[LU-11713] parallel-scale-nfsv3 test_compilebench: timeout Created: 29/Nov/18  Updated: 29/Nov/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0554ff76-ef60-11e8-bfe1-52540065bddc

test_compilebench failed with the following error:

Timeout occurred after 126 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests

The OSS console log shows the following:

[55355.821476] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test compilebench: compilebench 
[55356.063925] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench 
[55356.412040] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.9892 -i 2 -r 2 --makej 
[55356.649288] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.9892 -i 2 -r 2 --makej
[55419.035415] Lustre: 13184:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1542997349/real 1542997349] req@ffff9f6c52602a00 x1617892067390944/t0(0) o400->lustre-MDT0001-lwp-OST0000@10.9.3.110@tcp:12/10 lens 224/224 e 0 to 1 dl 1542997356 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[55419.040134] Lustre: 13184:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 38 previous similar messages 
[55419.041806] Lustre: lustre-MDT0001-lwp-OST0000: Connection to lustre-MDT0001 (at 10.9.3.110@tcp) was lost; in progress operations using this service will wait for recovery to complete 
[55419.044455] Lustre: Skipped 39 previous similar messages 
[55457.070142] Lustre: lustre-OST0007: haven't heard from client lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff9f6c8da2d400, cur 1542997394 expire 1542997364 last 1542997344 
[55457.073608] Lustre: Skipped 7 previous similar messages 
[55494.045571] Lustre: 13182:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1542997431/real 1542997431] req@ffff9f6caf1eea00 x1617892067398112/t0(0) o38->lustre-MDT0001-lwp-OST0000@10.9.3.110@tcp:12/10 lens 520/544 e 0 to 1 dl 1542997456 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 
[55494.050771] Lustre: 13182:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 175 previous similar messages 
[55494.606938] Lustre: lustre-OST0004: haven't heard from client lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 86 seconds. I think it's dead, and I am evicting it. exp ffff9f6c91cb2400, cur 1542997432 expire 1542997366 last 1542997346 
[55494.610792] Lustre: Skipped 6 previous similar messages 
[55501.045420] LustreError: 166-1: MGC10.9.3.109@tcp: Connection to MGS (at 10.9.3.109@tcp) was lost; in progress operations using this service will fail
<ConMan> Console [trevis-54vm8] disconnected from <trevis-54:6007> at 11-23 18:24.
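
For triage, the eviction messages in the log above can be cross-checked against their own timestamps: the "in N seconds" figure should equal cur minus last (1542997394 - 1542997344 = 50 for lustre-OST0007, 1542997432 - 1542997346 = 86 for lustre-OST0004). A minimal Python sketch of that check follows. The regular expression and the parse_eviction helper are hypothetical, written only against the message format seen in this log, and are not part of any Lustre tool.

```python
import re

# Hypothetical triage helper. The pattern assumes the eviction message
# format seen in this console log; it is not part of any Lustre tool.
EVICT_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<client>\S+) "
    r"\(at (?P<nid>[^)]+)\) in (?P<secs>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return eviction details from one console line, or None if no match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    cur, last = int(m["cur"]), int(m["last"])
    return {
        "target": m["target"],
        "client": m["client"],
        "nid": m["nid"],
        "reported_secs": int(m["secs"]),  # the "in N seconds" figure
        "silent_secs": cur - last,        # recomputed from the timestamps
    }


# First eviction message from the log above, copied verbatim.
SAMPLE = (
    "[55457.070142] Lustre: lustre-OST0007: haven't heard from client "
    "lustre-MDT0003-mdtlov_UUID (at 10.9.3.110@tcp) in 50 seconds. "
    "I think it's dead, and I am evicting it. exp ffff9f6c8da2d400, "
    "cur 1542997394 expire 1542997364 last 1542997344"
)

info = parse_eviction(SAMPLE)
print(info["target"], info["client"], info["silent_secs"])
```

Both evictions name the same client NID (10.9.3.110@tcp), which is also the node that the timed-out o400 and o38 requests were addressed to, so running this over the full console log may help confirm whether every eviction traces back to that one node.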

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
parallel-scale-nfsv3 test_compilebench - Timeout occurred after 126 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests

