[LU-11152] sanity test_133g: ost1 find /proc/fs/lustre/ /proc/sys/lnet/ /proc/sys/lustre/ failed Created: 17/Jul/18  Updated: 19/Dec/19  Resolved: 19/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9700 Interop 2.9.0<->master sanity test_13... Open
is related to LU-10038 sanity test 133g fails with “ '$'mds1... Resolved
is related to LU-11552 improper FMR/FastReg pool cleanup Reopened
is related to LU-11735 o2iblnd - bad check for fmr Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/28765bb2-8831-11e8-9e83-52540065bddc

test_133g failed with the following error:

$'ost1 find /proc/fs/lustre/\n/sys/fs/lustre/\n/sys/kernel/debug/lnet/\n/sys/kernel/debug/lustre/ failed'

This seems to be only failing on review-dne-zfs-part-1 and review-dne-part-1.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_133g - $'ost1 find /proc/fs/lustre/\n/sys/fs/lustre/\n/sys/kernel/debug/lnet/\n/sys/kernel/debug/lustre/ failed'



 Comments   
Comment by Andreas Dilger [ 17/Jul/18 ]

This is similar to LU-9700, but no interop is involved. LU-10038 was reported fixed for 2.11, but this problem is still being seen (about once a day in the past two weeks).

Comment by Andreas Dilger [ 17/Jul/18 ]

Also seeing some cases like:

trevis-57vm8: find: ‘/proc/fs/lustre/obdfilter/lustre-OST0000/exports/10.9.1.249@tcp’: No such file or directory
trevis-57vm8: find: ‘/proc/fs/lustre/obdfilter/lustre-OST0005/exports/10.9.1.249@tcp’: No such file or directory
trevis-57vm8: find: ‘/proc/fs/lustre/obdfilter/lustre-OST0006/exports/10.9.1.249@tcp’: No such file or directory
trevis-57vm8: find: ‘/proc/fs/lustre/obdfilter/lustre-OST0007/exports/10.9.1.249@tcp’: No such file or directory

Comment by John Hammond [ 03/Aug/18 ]

This is a normal race, which we handle by passing the -ignore_readdir_race flag to find. See LU-10038. However, in find 4.5.11, which is included in CentOS 7.3-7.5, the -ignore_readdir_race flag is ineffective. See https://bugs.centos.org/view.php?id=13685.
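
One way to tolerate the race when -ignore_readdir_race cannot be relied on is sketched below. This is an illustration only, not the change made in https://review.whamcloud.com/32934: the function name find_tolerating_races is hypothetical, GNU find is assumed, and matching on the error text is an assumption about how find reports a vanished entry.

```shell
#!/bin/bash
# Hypothetical helper, not taken from sanity.sh: walk the given directories
# and report failure only for errors other than a vanished entry.
find_tolerating_races() {
	local errs
	# LC_ALL=C keeps the error text in English so the filter below matches.
	# "2>&1 >/dev/null" sends stderr into the pipe and discards stdout.
	# "|| true" covers the case where grep filters out every line.
	errs=$(LC_ALL=C find "$@" -ignore_readdir_race -type f 2>&1 >/dev/null |
		grep -v 'No such file or directory') || true
	if [ -n "$errs" ]; then
		# Something other than a vanished entry went wrong: surface it.
		echo "$errs" >&2
		return 1
	fi
	return 0
}
```

A caller could run `find_tolerating_races /proc/fs/lustre/ /sys/fs/lustre/ || echo "find failed"`. The trade-off is that a start directory which is missing outright is also treated as benign, so a stricter check would verify the start directories separately.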

Comment by Gerrit Updater [ 03/Aug/18 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32934
Subject: LU-11152 test: work around find bug in sanity 133[fg]
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aa345cb2e7eeba0699157a1a16e242e9a1a54a4e

Comment by Gerrit Updater [ 15/Aug/18 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/32934/
Subject: LU-11152 test: work around find bug in sanity 133[fg]
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: edd1b078941b9cc97e39e0b54b9daabe0e6f2792

Comment by Peter Jones [ 15/Aug/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 20/Oct/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch:
https://review.whamcloud.com/33408
Subject: LU-11152 lnd: correctly cleanup the FMR/FastReg pools
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5ea8405a7be061bd485c2f94c09d14ccfbac48b0

Comment by Gerrit Updater [ 02/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch
https://review.whamcloud.com/33408/
Subject: LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9b790ba0f5606c0a91563828fa43f5e4ae210425

Comment by Gerrit Updater [ 06/Dec/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch:
https://review.whamcloud.com/33802
Subject: Revert "LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b169980b805b78df9b05e10079d27c4a6d3dbbc1

Comment by Gerrit Updater [ 08/Dec/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch
https://review.whamcloud.com/33802/
Subject: Revert "LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a65d072fa45fc90c2fc74b61d214de79c0bf33e5

Comment by Peter Jones [ 08/Dec/18 ]

Reverting because this has introduced LU-11735

Comment by Cory Spitz [ 18/Dec/19 ]

It seems that we have a bit of a mix-up here.

9b790ba0f5606c0a91563828fa43f5e4ae210425 (LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool) was landed for "LU-11152", but it was really intended for LU-11552. The FMR pool stuff shouldn't have anything to do with this ticket.

Then it was reverted with:
commit a65d072fa45fc90c2fc74b61d214de79c0bf33e5
Author: Amir Shehata <ashehata@whamcloud.com>
Date: Thu Dec 6 20:52:22 2018 +0000

Revert "LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool"

This reverts commit 9b790ba0f5606c0a91563828fa43f5e4ae210425.

And then this ticket was implicated.

I don't know how to clean up this mess, but I'm going to post a duplicate comment in LU-11552. Maybe we don't need to do anything special other than to add a helpful comment to the commit message if/when LU-11735 or LU-11552 are addressed.

Comment by Andreas Dilger [ 19/Dec/19 ]

Cory, thanks for pointing out the discrepancy here. Too bad this wasn't caught before these patches were landed, but it isn't possible to change the commit comments in Git.
