[LU-16066] replay-dual test_30: timeout, cannot cleanup orphans: rc = -107 Created: 02/Aug/22  Updated: 21/Jun/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0, Lustre 2.15.1, Lustre 2.15.2, Lustre 2.15.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: failing_tests

Issue Links:
Related
is related to LU-15809 replay-dual test_29: timeout llog_ver... Open
Severity: 3

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/abd05f73-e3f1-4f4d-8d6b-70923bb58dd5

test_30 failed with the following error:

Timeout occurred after 119 minutes, last suite running was replay-dual

This may be related to LU-15657.

MDS dmesg shows:

[Mon Jul 18 05:59:21 2022] Lustre: 6599:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1658123361/real 1658123361]  req@000000008b9f0b5d x1738664899455104/t0(0) o6->lustre-OST0002-osc-MDT0000@10.240.28.246@tcp:28/4 lens 544/432 e 20 to 1 dl 1658123962 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-syn-2-0.0'
[Mon Jul 18 05:59:21 2022] Lustre: lustre-OST0001-osc-MDT0000: Connection to lustre-OST0001 (at 10.240.28.246@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Mon Jul 18 05:59:21 2022] Lustre: 6599:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[Mon Jul 18 05:59:21 2022] Lustre: lustre-OST0002-osc-MDT0000: Connection restored to  (at 10.240.28.246@tcp)
[Mon Jul 18 05:59:22 2022] Lustre: lustre-OST0005-osc-MDT0000: Connection to lustre-OST0005 (at 10.240.28.246@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Mon Jul 18 05:59:22 2022] Lustre: Skipped 1 previous similar message
[Mon Jul 18 05:59:22 2022] Lustre: lustre-OST0003-osc-MDT0000: Connection restored to  (at 10.240.28.246@tcp)
[Mon Jul 18 05:59:22 2022] Lustre: Skipped 1 previous similar message
[Mon Jul 18 05:59:35 2022] LustreError: 199852:0:(osp_precreate.c:966:osp_precreate_cleanup_orphans()) lustre-OST0005-osc-MDT0000: cannot cleanup orphans: rc = -107
[Mon Jul 18 05:59:35 2022] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.240.28.246@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Mon Jul 18 05:59:35 2022] LustreError: 199852:0:(osp_precreate.c:966:osp_precreate_cleanup_orphans()) Skipped 9 previous similar messages
[Mon Jul 18 05:59:35 2022] Lustre: Skipped 2 previous similar messages
[Mon Jul 18 05:59:35 2022] Lustre: lustre-OST0000-osc-MDT0000: Connection restored to  (at 10.240.28.246@tcp)
[Mon Jul 18 05:59:35 2022] Lustre: Skipped 3 previous similar messages
[Mon Jul 18 06:09:22 2022] Lustre: 6599:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1658123963/real 1658123963]  req@000000008b9f0b5d x1738664899455104/t0(0) o6->lustre-OST0002-osc-MDT0000@10.240.28.246@tcp:28/4 lens 544/432 e 20 to 1 dl 1658124564 ref 1 fl Rpc:XQr/2/ffffffff rc 0/-1 job:'osp-syn-2-0.0'
[Mon Jul 18 06:09:22 2022] Lustre: lustre-OST0001-osc-MDT0000: Connection to lustre-OST0001 (at 10.240.28.246@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Mon Jul 18 06:09:22 2022] Lustre: 6599:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 47 previous similar messages
[Mon Jul 18 06:09:22 2022] Lustre: lustre-OST0001-osc-MDT0000: Connection restored to  (at 10.240.28.246@tcp)
[Mon Jul 18 06:12:22 2022] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.240.28.246@tcp) was lost; in progress operations using this service will wait for recovery to complete
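The log lines above can be decoded mechanically. The following is a minimal Python sketch. It assumes the message formats are exactly as printed in this dmesg excerpt, and the regexes and helper names (`request_window`, `decode_rc`, `utc_hms`) are hypothetical, not Lustre tools. It extracts the `sent`/`dl` timestamps from a `ptlrpc_expire_one_request()` line and maps the `rc = -107` from `osp_precreate_cleanup_orphans()` to its Linux errno name, ENOTCONN ("Transport endpoint is not connected").

```python
import re
from datetime import datetime, timezone

# Matches the "[sent S/real R]" and "dl D" fields that
# ptlrpc_expire_one_request() prints for a timed-out request.
EXPIRE_RE = re.compile(r"\[sent (\d+)/real \d+\].*? dl (\d+) ")
# Matches a negative return code such as "rc = -107".
RC_RE = re.compile(r"rc = (-\d+)")

# Small subset of Linux errno values (assumption: the MDS runs Linux,
# so the numbers in kernel messages follow Linux numbering).
LINUX_ERRNO = {107: "ENOTCONN", 110: "ETIMEDOUT"}


def request_window(line):
    """Return (sent, deadline, seconds waited) for an expire line, else None."""
    m = EXPIRE_RE.search(line)
    if m is None:
        return None
    sent, deadline = int(m.group(1)), int(m.group(2))
    return sent, deadline, deadline - sent


def decode_rc(line):
    """Return the errno name for an "rc = -N" field, else None."""
    m = RC_RE.search(line)
    if m is None:
        return None
    code = -int(m.group(1))
    return LINUX_ERRNO.get(code, f"errno {code}")


def utc_hms(ts):
    """Format a Unix timestamp as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")


if __name__ == "__main__":
    expire = ("Lustre: 6599:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ "
              "Request sent has timed out for slow reply: "
              "[sent 1658123361/real 1658123361]  req@000000008b9f0b5d "
              "x1738664899455104/t0(0) o6->lustre-OST0002-osc-MDT0000@10.240.28.246@tcp:28/4 "
              "lens 544/432 e 20 to 1 dl 1658123962 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1")
    sent, dl, waited = request_window(expire)
    print(utc_hms(sent), utc_hms(dl), waited)

    orphans = ("lustre-OST0005-osc-MDT0000: cannot cleanup orphans: rc = -107")
    print(decode_rc(orphans))
```

For the first `ptlrpc_expire_one_request()` line this gives a send time of 05:49:21 UTC and a deadline of 05:59:22 UTC, a 601-second wait, which lines up with the 05:59 dmesg timestamps if those are in UTC. The later line for the same request (XID x1738664899455104) was sent at 1658123963 with deadline 1658124564, another 601 seconds.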

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_30 - Timeout occurred after 119 minutes, last suite running was replay-dual



 Comments   
Comment by Chris Horn [ 12/May/23 ]

+1 on master - https://testing.whamcloud.com/test_sets/e92cca2e-39ea-459e-9528-c4f2226cf840

Comment by Andreas Dilger [ 13/Jun/23 ]

Failed 17x on master and b2_15 in the past week.

Generated at Sat Feb 10 03:23:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.