[LU-14938] fail_abort() in t-f should take care of MDTs Created: 16/Aug/21  Updated: 21/Jan/23  Resolved: 27/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Currently, fail_abort() in t-f (test-framework.sh) uses lfs df to ensure that all the clients are back. The first df makes the clients notice their eviction and initiate a reconnect. The second df verifies that the clients are really back.
This doesn't work for MDTs. As a result, the evicted state is detected by other MDT(s) at some random later point. That produces EIO and breaks tests that use constructs like mkdir $DIR/$tdir || error ...
IMO, fail_abort() should do something similar for MDTs.
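As a rough illustration of the pattern described above, here is a minimal shell sketch that generalizes "probe once to trigger the reconnect, probe again to verify" from clients to every target. It assumes a stub: probe_target, two_pass_check and all_targets_up are hypothetical names, not test-framework.sh functions. The stub only simulates eviction and stands in for a real probe such as lfs df. This is not the code that landed in change 44671.

```shell
#!/bin/sh
# Hypothetical sketch of the two-pass "trigger, then verify" pattern that
# fail_abort() gets from `lfs df` on clients, generalized to MDTs.
# probe_target() is a stub, NOT a Lustre test-framework function: it stands in
# for a real probe such as `lfs df` on a client or an equivalent request
# issued from one MDT to the others.

probed=" "   # space-separated list of targets that have been probed once

# Stub probe: the first call for a target fails, like the first RPC after an
# eviction; later calls succeed, as if the reconnect had completed.
probe_target() {
    case "$probed" in
        *" $1 "*) return 0 ;;
        *) probed="$probed$1 "; return 1 ;;
    esac
}

# First pass: ignore the result; it only makes the peer notice the eviction
# and start reconnecting. Second pass: must succeed.
two_pass_check() {
    probe_target "$1" || true
    probe_target "$1"
}

# Run the check for every target given, e.g. the clients plus each MDT.
all_targets_up() {
    for target in "$@"; do
        two_pass_check "$target" || {
            echo "$target did not come back" >&2
            return 1
        }
    done
    return 0
}
```

For example, all_targets_up client mds1 mds2 returns 0 once each illustrative target has passed its second probe. The point of the sketch is that the MDTs get the same explicit trigger-and-verify step as the clients, rather than discovering the eviction at a random moment during a later test.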



 Comments   
Comment by Gerrit Updater [ 16/Aug/21 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44671
Subject: LU-14938 tests: fail_abort() in t-f to take care of MDTs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5e8b1989dc40319c3793e0e7f9f7261041cb00b4

Comment by Andreas Dilger [ 23/Aug/21 ]

Shouldn't this be handled automatically for the MDS connections, rather than relying on the test framework to fix them? Otherwise, won't real systems run into the same issues during recovery?

Comment by Alex Zhuravlev [ 23/Aug/21 ]

I'm not sure: this is an eviction case, which means an I/O error is possible, isn't it? Isn't that why we have clients_up() in fail_abort()?

Comment by Gerrit Updater [ 27/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44671/
Subject: LU-14938 tests: fail_abort() in t-f to take care of MDTs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 436cd4fd21ffee5830c9b4e75055db80c47547d5

Comment by Peter Jones [ 27/Oct/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:14:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.