[LU-15907] sanityn test_41i: fix the OBD_FAIL_MDS_REINT_OPEN2 race Created: 01/Jun/22  Updated: 01/Jun/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Etienne Aujames Assignee: Etienne Aujames
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With patch https://review.whamcloud.com/47487 ("LU-15546 mdt: mdt_reint_open lookup before locking"), the OBD_FAIL_MDS_REINT_OPEN2 race times out in sanityn test_41i:

LustreError: 3945:0:(libcfs_fail.h:178:cfs_race()) cfs_fail_race id 16a awake: rc=0

Now the first thread takes a PW parent lock directly (because it checks for the child's existence before locking), so the second thread waits for the lock (PR locks are compatible with each other, but PW locks are not).
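The compatibility rule behind this wait can be sketched as a small table. This is a toy model limited to the two modes named above, not Lustre code; the names COMPATIBLE and can_grant are hypothetical and exist only for illustration.

```python
# Toy model of lock-mode compatibility for the two modes named in this ticket.
# PR = protected read, PW = protected write. Hypothetical names, not Lustre code.
COMPATIBLE = {
    ("PR", "PR"): True,   # two readers can share the parent
    ("PR", "PW"): False,  # a reader must wait for a writer
    ("PW", "PR"): False,  # a writer must wait for a reader
    ("PW", "PW"): False,  # two writers exclude each other
}


def can_grant(requested, held):
    """Return True if `requested` is compatible with every mode in `held`."""
    return all(COMPATIBLE[(requested, h)] for h in held)
```

With the first thread holding PW, can_grant("PR", ["PW"]) and can_grant("PW", ["PW"]) are both False, which matches the second thread blocking until the race timeout.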

We have to force the first thread to take a PR parent lock to keep testing the full lock cycle:

  • take PR parent lock
  • lookup child (does not exist)
  • take PW parent lock
  • re-lookup
  • create child
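The difference between the full cycle listed above and the shortened path can be illustrated with a single-threaded toy model. It is a simplified reading of this ticket, not the mdt_reint_open implementation: the function open_create, its parameters, and the step strings are all hypothetical, and the handling of an already existing child is an assumption.

```python
def open_create(children, name, lookup_before_lock):
    """Return the ordered steps to open-create `name` in a parent directory.

    `children` is a set of existing entry names (hypothetical model).
    `lookup_before_lock` mimics the behaviour described for change 47487.
    """
    steps = []
    if lookup_before_lock:
        # Child existence is checked before any parent lock is taken.
        steps.append("lookup child (no lock)")
        need_pr_pass = name in children
    else:
        # Forced path: always start with the PR parent lock.
        need_pr_pass = True

    if need_pr_pass:
        steps.append("take PR parent lock")
        steps.append("lookup child")
        if name in children:
            # Assumption: an existing child is opened under the PR lock.
            steps.append("open existing child")
            return steps

    steps.append("take PW parent lock")
    steps.append("re-lookup")
    if name not in children:
        children.add(name)
        steps.append("create child")
    return steps
```

Calling open_create(set(), "f", lookup_before_lock=False) walks all five steps of the list, while lookup_before_lock=True goes straight from the unlocked lookup to the PW lock. In this model, forcing the first thread through the PR lock is what keeps the whole cycle under test.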


 Comments   
Comment by Gerrit Updater [ 01/Jun/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47506
Subject: LU-15907 mdt: fix the OBD_FAIL_MDS_REINT_OPEN2 race
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d514137a5bf5bc53c2c35fb2c81f840813c45212

Generated at Sat Feb 10 03:22:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.