[LU-14738] recovery-small: FAIL: remove sub-test dirs failed Created: 06/Jun/21  Updated: 18/Aug/21  Resolved: 18/Aug/21

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9602 recovery-random-scale test_fail_clien... In Progress
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
== recovery-small test complete, duration 238 sec ==================================================== 20:17:10 (1623010630)
Lustre: DEBUG MARKER: == recovery-small test complete, duration 238 sec ==================================================== 20:17:10 (1623010630)
LustreError: 167-0: lustre-MDT0001-osp-MDT0000: This client was evicted by lustre-MDT0001; in progress operations using this service will fail.
rm: cannot remove '/mnt/lustre/d110h.recovery-small/target_dir/tgt_file': Input/output error
 recovery-small test_148: @@@@@@ FAIL: remove sub-test dirs failed 
Lustre: DEBUG MARKER: recovery-small test_148: @@@@@@ FAIL: remove sub-test dirs failed
  Trace dump:
  = ./../tests/test-framework.sh:6166:error()
  = ./../tests/test-framework.sh:5650:check_and_cleanup_lustre()
  = recovery-small.sh:3158:main()

The root cause is that one MDS was evicted by another, and a subsequent RPC (issued as part of MDS_REINT(rmdir) processing) observes the eviction and returns -EIO.



 Comments   
Comment by Gerrit Updater [ 07/Jun/21 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43935
Subject: LU-14738 tests: recovery-small/110k should wait for MDS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b5080c27fb4e4f1805a3e4331f15ee95872e8a5f

Comment by Alex Zhuravlev [ 18/Aug/21 ]

A duplicate of LU-14938: the same problem, in a different place.

Generated at Sat Feb 10 03:12:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.