[LU-7400] top_trans_create() followed by top_trans_stop() gets stuck Created: 06/Nov/15  Updated: 02/Dec/15  Resolved: 02/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Because sub_thandle_register_stop_cb() is called in top_trans_start(), skipping top_trans_start() (which is a valid case) causes top_trans_stop() to wait indefinitely for stop callbacks that were never registered.
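The hang can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre code: every name below (toy_sub, toy_top, toy_create, toy_start, toy_stop) is hypothetical, and the real top_trans_stop() blocks rather than returning a flag. The sketch assumes that stop completes only once every sub-handle's stop callback has fired, and that a callback can fire only if it was registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBS 4

/* Toy stand-in for one sub-transaction handle (hypothetical, not Lustre's struct). */
struct toy_sub {
    bool cb_registered; /* a stop callback has been registered */
    bool stop_seen;     /* the stop callback has fired */
};

/* Toy stand-in for the top-level transaction handle (hypothetical). */
struct toy_top {
    struct toy_sub subs[MAX_SUBS];
    size_t nr_subs;
    bool register_at_create; /* false: register in start; true: register in create */
};

/* Register a stop callback on every sub-handle. */
void toy_register_stop_cbs(struct toy_top *t)
{
    for (size_t i = 0; i < t->nr_subs; i++)
        t->subs[i].cb_registered = true;
}

void toy_create(struct toy_top *t, size_t nr_subs, bool register_at_create)
{
    assert(nr_subs <= MAX_SUBS);
    t->nr_subs = nr_subs;
    t->register_at_create = register_at_create;
    for (size_t i = 0; i < nr_subs; i++) {
        t->subs[i].cb_registered = false;
        t->subs[i].stop_seen = false;
    }
    if (register_at_create)
        toy_register_stop_cbs(t);
}

void toy_start(struct toy_top *t)
{
    /* Registration happens here only in the "register at start" variant. */
    if (!t->register_at_create)
        toy_register_stop_cbs(t);
}

/* Returns true if stop completes, false if it would wait forever. */
bool toy_stop(struct toy_top *t)
{
    size_t pending = 0;

    for (size_t i = 0; i < t->nr_subs; i++) {
        /* Stopping a sub-handle fires its callback only if one was registered. */
        if (t->subs[i].cb_registered)
            t->subs[i].stop_seen = true;
        if (!t->subs[i].stop_seen)
            pending++;
    }
    return pending == 0;
}
```

In this model, toy_create(&t, 2, false) followed directly by toy_stop(&t) returns false, which corresponds to the indefinite wait described above. Calling toy_start() in between, or creating the handle with register_at_create set to true, lets toy_stop() return true.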



 Comments   
Comment by Gerrit Updater [ 06/Nov/15 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/17059
Subject: LU-7400 lod: register commit callbacks at create
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 58a978451fd7504ecc9ab3bbf841a3499401f0b2

Comment by Andreas Dilger [ 06/Nov/15 ]

Alex, under what kind of workload is this bug hit, and how easily does that happen? Is this a patch that needs to be landed for 2.8.0?

Comment by Alex Zhuravlev [ 06/Nov/15 ]

For example, if a target fails during recovery and some preparation RPC (e.g. fetching an EA) returns an error, then the original migration process gets stuck. Even after the failed target comes back, it stays stuck. I think it makes sense to consider landing.

Comment by Di Wang [ 12/Nov/15 ]

I think we also have this issue with the commit callback, which I found in the failover soak test. I will update the patch to resolve both together. IMHO, this should get into 2.8, since it will cause endless recovery.

Comment by Di Wang [ 12/Nov/15 ]

Since we saw this in the DNE soak test, which we have to pass before release, let's make it a blocker for now.

Comment by Gerrit Updater [ 02/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17059/
Subject: LU-7400 lod: register stop callbacks at create
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 082eabdeaa0c2a0f536accf7028e5ab5061c2c46

Comment by Joseph Gmitter (Inactive) [ 02/Dec/15 ]

Landed for 2.8
