[LU-15755] replay-dual test_30: osp_precreate.c:966:osp_precreate_cleanup_orphans() ...cannot cleanup orphans: rc = -107 Created: 18/Apr/22  Updated: 20/Apr/22  Resolved: 20/Apr/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Severity: 3

Description

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/dfd0a347-d14e-4393-84a7-997380b92ba7

The system appears to have hung in recovery after the failover test. The logs show:

[ 9030.199929] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null
[ 9625.185448] Lustre: 11096:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649038880/real 1649038880]  req@00000000a66848b2 x1729133296521408/t0(0) o6->lustre-OST0001-osc-MDT0000@10.240.44.2@tcp:28/4 lens 544/432 e 20 to 1 dl 1649039481 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-syn-1-0.0'
[ 9625.185457] Lustre: lustre-OST0002-osc-MDT0000: Connection to lustre-OST0002 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 9625.190075] Lustre: 11096:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[ 9625.196741] Lustre: lustre-OST0002-osc-MDT0000: Connection restored to  (at 10.240.44.2@tcp)
[ 9630.241420] Lustre: lustre-OST0003-osc-MDT0000: Connection to lustre-OST0003 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 9630.243974] Lustre: Skipped 4 previous similar messages
[ 9643.105373] LustreError: 359222:0:(osp_precreate.c:966:osp_precreate_cleanup_orphans()) lustre-OST0006-osc-MDT0000: cannot cleanup orphans: rc = -107
[ 9643.105498] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 9643.107517] LustreError: 359222:0:(osp_precreate.c:966:osp_precreate_cleanup_orphans()) Skipped 9 previous similar messages
[10226.717210] Lustre: 11095:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649039481/real 1649039481]  req@0000000036f5895c x1729133296521920/t0(0) o6->lustre-OST0002-osc-MDT0000@10.240.44.2@tcp:28/4 lens 544/432 e 20 to 1 dl 1649040082 ref 1 fl Rpc:XQr/2/ffffffff rc 0/-1 job:'osp-syn-2-0.0'
[10226.717220] Lustre: lustre-OST0001-osc-MDT0000: Connection to lustre-OST0001 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[10226.721846] Lustre: 11095:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 47 previous similar messages
[10226.725906] Lustre: Skipped 1 previous similar message
[10226.729739] Lustre: lustre-OST0001-osc-MDT0000: Connection restored to  (at 10.240.44.2@tcp)
[10226.731108] Lustre: Skipped 6 previous similar messages
[10231.773182] Lustre: lustre-OST0006-osc-MDT0000: Connection to lustre-OST0006 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[10231.775766] Lustre: Skipped 1 previous similar message
[10424.923835] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.240.44.2@tcp) was lost; in progress operations using this service will wait for recovery to complete
[10424.923876] LustreError: 359220:0:(osp_precreate.c:966:osp_precreate_cleanup_orphans()) lustre-OST0005-osc-MDT0000: cannot cleanup orphans: rc = -107
[10424.926453] Lustre: Skipped 2 previous similar messages

Test timed out after 200+ seconds
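
For triage, the console messages above can be tallied per target. The following is a minimal sketch in Python and is not part of the original report: the function name summarize and both regular expressions are hypothetical, written only against the line formats quoted in this ticket. On Linux, rc = -107 corresponds to -ENOTCONN (transport endpoint is not connected).

```python
import re
from collections import Counter

# Hypothetical pattern for the orphan-cleanup error lines quoted in this ticket.
# "Skipped N previous similar messages" lines deliberately do not match.
ORPHAN_RE = re.compile(
    r"osp_precreate_cleanup_orphans\(\)\) "
    r"(?P<device>\S+): cannot cleanup orphans: rc = (?P<rc>-?\d+)"
)

# Hypothetical pattern for the "Connection to <target> ... was lost" lines.
LOST_RE = re.compile(
    r"Lustre: (?P<device>\S+): Connection to (?P<target>\S+) "
    r"\(at (?P<nid>\S+)\) was lost"
)

# Linux errno value seen in this ticket (assumption: the node runs Linux).
LINUX_ERRNO_NAMES = {107: "ENOTCONN"}


def summarize(lines):
    """Count orphan-cleanup failures and lost connections per target.

    Returns (orphan_errors, lost_connections), where orphan_errors is a
    Counter keyed by (device, rc) and lost_connections is a Counter keyed
    by target name.
    """
    orphan_errors = Counter()
    lost_connections = Counter()
    for line in lines:
        match = ORPHAN_RE.search(line)
        if match:
            orphan_errors[(match["device"], int(match["rc"]))] += 1
            continue
        match = LOST_RE.search(line)
        if match:
            lost_connections[match["target"]] += 1
    return orphan_errors, lost_connections


def describe_rc(rc):
    """Return a symbolic name for a negative kernel return code, if known."""
    return LINUX_ERRNO_NAMES.get(-rc, "unknown")
```

Calling summarize() on the lines of a saved console log returns the two counters, and describe_rc(-107) gives the symbolic name for the return code. The counts only reflect lines that were actually printed; messages collapsed into "Skipped N previous similar messages" are not expanded.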



Comments
Comment by Andreas Dilger [ 20/Apr/22 ]

Hi Cliff, could you please include the subtest number (test_30 in this case) in the patch summary line? That makes it easier to find related tickets.

Comment by James Nunez (Inactive) [ 20/Apr/22 ]

Closing as a duplicate of LU-15657.
