[LU-16380] conf-sanity test_108b: timeout at read, write and append Created: 10/Dec/22  Updated: 12/Dec/23  Resolved: 21/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: Zip Archive lustre-log-crash 2.log.zip    
Issue Links:
Related
is related to LU-15643 do not loop on OI Scrub on same FID Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9a2819f9-34e0-4cd6-9930-78d2ee19929c

test_108b failed with the following error:

Timeout occurred after 682 minutes, last suite running was conf-sanity

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_108b - Timeout occurred after 682 minutes, last suite running was conf-sanity



 Comments   
Comment by Alex Zhuravlev [ 11/Dec/22 ]

Hitting this locally as well. The last activity in the log is:

[ 7920.470684] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146795584/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806351 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 7971.670605] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146810496/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806403 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 8022.870656] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146826560/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806454 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 8125.270637] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146856064/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806556 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'

Could this be related to LU-15643 (osd-ldiskfs: don't trigger scrub on irreparable FIDs)?
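
For anyone triaging a similar hang, here is a minimal sketch (not part of the Lustre tree) of how the resend cadence could be extracted from console output like the lines above. The regular expression and the helper names `resend_events` and `resend_gaps` are hypothetical; they assume the exact line layout quoted in this comment.

```python
import re

# Matches the console lines quoted in this comment: a kernel timestamp,
# the EINPROGRESS resend message, then the request xid ("x<digits>/t").
# This layout is an assumption taken from the four sample lines only.
RESEND_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre:.*resending request on EINPROGRESS"
    r".*?\bx(?P<xid>\d+)/t"
)

# The four log lines from this ticket, copied verbatim.
SAMPLE_LOG = """\
[ 7920.470684] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146795584/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806351 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 7971.670605] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146810496/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806403 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 8022.870656] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146826560/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806454 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
[ 8125.270637] Lustre: 406340:0:(client.c:1485:after_reply()) @@@ resending request on EINPROGRESS req@000000009d60d042 x1751967146856064/t0(0) o101->lustre-MDT0000-mdc-ffff9f397275d000@0@lo:12/10 lens 576/224 e 0 to 0 dl 1670806556 ref 2 fl Rpc:RQU/2/0 rc 0/-115 job:'sha1sum.0'
"""


def resend_events(lines):
    """Return (timestamp_seconds, xid) for each EINPROGRESS resend line."""
    events = []
    for line in lines:
        match = RESEND_RE.search(line)
        if match:
            events.append((float(match.group("ts")), int(match.group("xid"))))
    return events


def resend_gaps(events):
    """Return seconds elapsed between consecutive resend events."""
    return [round(later[0] - earlier[0], 3)
            for earlier, later in zip(events, events[1:])]


events = resend_events(SAMPLE_LOG.splitlines())
print(resend_gaps(events))  # [51.2, 51.2, 102.4]
```

On the four lines quoted here the gaps are 51.2 s, 51.2 s and 102.4 s, always for the same `req@` pointer and the same `sha1sum.0` job, which is consistent with a single request being resent without making progress.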

Comment by Alex Zhuravlev [ 13/Dec/22 ]

COMMIT      TESTED  PASSED  FAILED  RESULT  COMMIT DESCRIPTION
4c0c01e29c  28      27      1       BAD     LU-10391 lnet: change lnet_find_best_lpni to handle large NIDs
558784caad  10      9       1       BAD     LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs
c74c630ff7  30      30      0       GOOD    LU-16317 build: dkms build requires flex, bison and libmount-devel
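
The run counts above can be turned into observed failure rates with a few lines of Python. This is only an illustrative sketch: the row tuples are copied from the table, while the helper names `failure_rate` and `pooled_failure_rate` are hypothetical.

```python
# Rows copied from the table in this comment:
# (commit, tested, passed, failed, verdict)
BISECT_RUNS = [
    ("4c0c01e29c", 28, 27, 1, "BAD"),
    ("558784caad", 10, 9, 1, "BAD"),
    ("c74c630ff7", 30, 30, 0, "GOOD"),
]


def failure_rate(tested, failed):
    """Observed fraction of failed runs for one commit."""
    return failed / tested


def pooled_failure_rate(rows, verdict):
    """Failure rate over all runs of the commits carrying the given verdict."""
    selected = [row for row in rows if row[4] == verdict]
    return sum(row[3] for row in selected) / sum(row[1] for row in selected)


for commit, tested, passed, failed, verdict in BISECT_RUNS:
    # Sanity check that each row is internally consistent.
    assert passed + failed == tested
    print(f"{commit}  {failed}/{tested} = {failure_rate(tested, failed):.1%}  {verdict}")

print(f"pooled BAD rate: {pooled_failure_rate(BISECT_RUNS, 'BAD'):.1%}")
```

Pooling the two BAD commits gives 2 failures in 38 runs, a little over 5%.
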
Comment by Alex Zhuravlev [ 23/Dec/22 ]

laisiyao, if needed I can try to reproduce this with a specific PTLDEBUG setting or a debugging patch. It's not frequent, but it does happen (~6% of runs).
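
A rough sketch of what a ~6% failure rate implies for reproduction effort. It assumes that runs are independent and that the rate is constant, neither of which is established in this ticket; the function names `prob_all_pass` and `runs_needed` are hypothetical.

```python
import math


def prob_all_pass(fail_rate, runs):
    """Probability that `runs` independent runs all pass."""
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate, confidence):
    """Smallest run count with at least `confidence` chance of one failure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


P_FAIL = 0.06  # the "~6% of runs" estimate from this comment

print(f"{prob_all_pass(P_FAIL, 30):.3f}")  # about 0.156
print(runs_needed(P_FAIL, 0.95))           # 49
```

Under these assumptions, 30 clean runs still leave roughly a 16% chance of having missed the failure, and about 49 runs would be needed to see it at least once with 95% probability.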

Comment by Lai Siyao [ 23/Dec/22 ]

It would be great if you could capture debug logs with "trace lfsck inode info" enabled. I'm testing on my local system too, but haven't reproduced it yet.

Comment by Alex Zhuravlev [ 23/Dec/22 ]

sure, will do

Comment by Alex Zhuravlev [ 23/Dec/22 ]

Attached. If this is not what you need, ping me again.

Comment by Gerrit Updater [ 26/Dec/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49514
Subject: LU-16380 osd-ldiskfs: race in OI mapping
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5d57c11758229071ca481f8ccb9cb6142c2b8993

Comment by Lai Siyao [ 26/Dec/22 ]

Alex, thanks, patch uploaded.

Comment by Gerrit Updater [ 19/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49514/
Subject: LU-16380 osd-ldiskfs: race in OI mapping
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 43fe6e51804f8fb4cca4445be576233595e27b42

Comment by Peter Jones [ 21/Jan/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:26:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.