Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.17.0, Lustre 2.15.7
- None
- 3
- 9223372036854775807
Description
This issue was created by maloo for Marc Vef <mvef@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a1bdbb1f-684a-44dd-9683-4a52e4eaaf4c
test_31 failed with the following error:
Timeout occurred after 445 minutes, last suite running was sanity-sec
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/109661 - 5.15.0-94-generic
servers: https://build.whamcloud.com/job/lustre-reviews/109661 - 4.18.0-553.27.1.el8_lustre.x86_64
Both clients (vm1 and vm2) lost connection after mounting:
[21607.508267] Lustre: Mounted lustre-client
[21607.835897] Lustre: DEBUG MARKER: mount | grep /mnt/lustre' '
[21629.086605] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1734413476/real 0] req@ffff9c605474f740 x1818664259031424/t0(0) o400->lustre-OST0002-osc-ffff9c6049741800@10.240.22.182@tcp:28/4 lens 224/224 e 0 to 1 dl 1734413492 ref 2 fl Rpc:XNr/200/ffffffff rc 0/-1 job:'' uid:0 gid:0
[21629.086637] Lustre: lustre-MDT0000-mdc-ffff9c6049741800: Connection to lustre-MDT0000 (at 10.240.22.189@tcp) was lost; in progress operations using this service will wait for recovery to complete
[21629.091974] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[21629.097039] LustreError: MGC10.240.22.189@tcp: Connection to MGS (at 10.240.22.189@tcp) was lost; in progress operations using this service will fail
[21634.206602] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1734413481/real 0] req@ffff9c605474e700 x1818664259032448/t0(0) o400->lustre-OST0001-osc-ffff9c6049741800@10.240.22.182@tcp:28/4 lens 224/224 e 0 to 1 dl 1734413497 ref 2 fl Rpc:XNr/200/ffffffff rc 0/-1 job:'' uid:0 gid:0
[21634.212144] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
[21639.326639] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1734413486/real 0] req@ffff9c6054447a80 x1818664259034240/t0(0) o400->lustre-OST0006-osc-ffff9c6049741800@10.240.22.182@tcp:28/4 lens 224/224 e 0 to 1 dl 1734413502 ref 2 fl Rpc:XNr/200/ffffffff rc 0/-1 job:'' uid:0 gid:0
[21639.332231] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[21644.446523] Lustre: 79407:0:(client.c:2358:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1734413492/real 0] req@ffff9c6053e87a80 x1818664259035776/t0(0) o400->lustre-OST0000-osc-ffff9c6049741800@10.240.22.182@tcp:28/4 lens 224/224 e 0 to 1 dl 1734413508 ref 2 fl Rpc:XNr/200/ffffffff rc 0/-1 job:'' uid:0 gid:0
[21644.452090] Lustre: 79407:0:(client.c:2358:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
[21797.277618] nfs: server 10.240.16.204 not responding, timed out
[21831.069446] LNet: 2 peer NIs in recovery (showing 2): 10.240.22.189@tcp, 10.240.22.182@tcp
[21831.071116] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1734413486/real 1734413694] req@ffff9c6054445380 x1818664259033216/t0(0) o400->MGC10.240.22.189@tcp@10.240.22.189@tcp:26/25 lens 224/224 e 0 to 1 dl 1734413502 ref 1 fl Rpc:EeXNQU/200/ffffffff rc -5/-1 job:'' uid:0 gid:0
[21831.076441] Lustre: 79408:0:(client.c:2358:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[21831.077739] LNetError: Unexpected error -2 connecting to 10.240.22.189@tcp at host 10.240.22.189:7988
[21832.093479] LNetError: Unexpected error -2 connecting to 10.240.22.189@tcp at host 10.240.22.189:7988
[21832.095303] LNetError: Skipped 2 previous similar messages
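For triage, the console output above can be summarized mechanically. The following is only a minimal sketch, not part of Maloo or the Lustre test framework: the helper names `split_entries` and `expired_requests` are hypothetical, and the regular expressions assume the message layout seen in this particular log (`[sent S/real R] ... oOPC->TARGET@NID:... dl DEADLINE`). It splits a run-together log on the kernel timestamps and extracts, for each expired request, the opcode, target, NID, and the sent/deadline times.

```python
import re

# Kernel timestamp such as "[21629.086605] " that begins each console entry.
# "[sent 1734413476/real 0]" does not match, because it lacks the digits.digits form.
ENTRY_START = re.compile(r"\[(\d+\.\d+)\]\s")

# Fields of a ptlrpc_expire_one_request() message, as laid out in the log above
# (assumed layout; other Lustre versions may format this differently).
EXPIRED_REQ = re.compile(
    r"\[sent (?P<sent>\d+)/real (?P<real>\d+)\].*?"
    r"o(?P<opc>\d+)->(?P<target>\S+?)@(?P<nid>[\d.]+@\w+):\d+/\d+.*?"
    r"dl (?P<deadline>\d+)"
)


def split_entries(blob):
    """Split a run-together console log into (timestamp, message) pairs."""
    starts = list(ENTRY_START.finditer(blob))
    entries = []
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(blob)
        entries.append((float(m.group(1)), blob[m.end():end].strip()))
    return entries


def expired_requests(blob):
    """Return one dict per entry that carries an expired-request record."""
    found = []
    for ts, msg in split_entries(blob):
        m = EXPIRED_REQ.search(msg)
        if m:
            found.append({
                "ts": ts,
                "opcode": int(m["opc"]),
                "target": m["target"],
                "nid": m["nid"],
                "sent": int(m["sent"]),
                "real": int(m["real"]),
                "deadline": int(m["deadline"]),
            })
    return found
```

Run over the log above, this kind of summary makes it easier to see that every expired request has the same opcode (`o400`) and that the affected targets are spread across both server NIDs (10.240.22.182@tcp and 10.240.22.189@tcp), rather than being limited to one OST.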
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-sec test_31 - Timeout occurred after 445 minutes, last suite running was sanity-sec
Issue Links
- duplicates
  - LU-18693 sanity-lnet: nfs: server 10.240.16.204 not responding, timed out (Open)
  - LU-18829 sanity-sec test_27ab: timeout (Resolved)
- is duplicated by
  - LU-18527 sanity-sec: test_64e timed out (client lost connection) (Open)
- is related to
  - LU-19297 sanity-sec test_72b: test_72b returned 1 (Open)