[LU-4509] clio can be stuck in osc_extent_wait Created: 20/Jan/14  Updated: 15/Dec/15  Resolved: 03/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Blocker
Reporter: Liang Zhen (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: mn4

Issue Links:
Duplicate
is duplicated by LU-4790 CPU soft lockups (60+ seconds) in ptl... Resolved
Related
is related to LU-4576 Seeing kernel panics in osc_extent_wa... Resolved
Severity: 3
Rank (Obsolete): 12342

 Description   

Assume a scenario like this:
1) thread-1 holds an active extent.
2) thread-2 calls flush cache and marks this extent as "urgent" and "sync_wait".
3) thread-3 wants to write to the same extent; osc_extent_find() reports a "conflict" because the extent is "sync_wait", so thread-3 starts to wait.
4) cl_writeback_work has been scheduled by thread-4 to write some other extents; it has sent its RPCs but has not returned yet.
5) thread-1 finishes its work and calls osc_extent_release() -> osc_io_unplug_async() -> ptlrpcd_queue_work(), but cl_writeback_work is still running, so the request is ignored (-EBUSY).
6) thread-3 is stuck because nobody will wake it up.
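
The sequence above is a lost-wakeup race. It can be replayed deterministically with a small model. The following is only a sketch in Python, not Lustre code: Extent, WritebackWork, writeback_pass, finish and simulate are hypothetical names, the interleaving is fixed by hand instead of using real threads, and the remember_requests switch is one possible remedy added for contrast. It is not taken from the actual fix.

```python
from dataclasses import dataclass, field

EBUSY = 16  # errno value; the ticket reports -EBUSY


@dataclass
class Extent:
    active: bool = False   # held by a writer (thread-1 in the ticket)
    urgent: bool = False
    sync_wait: bool = False
    waiters: list = field(default_factory=list)


@dataclass
class WritebackWork:
    remember_requests: bool  # hypothetical remedy switch, not from the ticket
    running: bool = False
    pending: bool = False

    def queue(self):
        """Model of queueing the work item: refused while it is running."""
        if self.running:
            if self.remember_requests:
                self.pending = True  # re-run once the current pass returns
                return 0
            return -EBUSY            # request is dropped (step 5)
        self.running = True
        return 0


def writeback_pass(extents):
    """Write every urgent extent that is not active and wake its waiters."""
    for ext in extents:
        if ext.urgent and not ext.active:
            ext.urgent = ext.sync_wait = False
            ext.waiters.clear()


def finish(work, extents):
    """The running pass returns; run again only if a request was remembered."""
    work.running = False
    if work.pending:
        work.pending = False
        work.running = True
        writeback_pass(extents)
        work.running = False


def simulate(remember_requests):
    ext = Extent(active=True)                  # 1) thread-1 holds the extent
    other = Extent(urgent=True)                # some other extent to write
    work = WritebackWork(remember_requests)
    ext.urgent = ext.sync_wait = True          # 2) thread-2 flushes the cache
    ext.waiters.append("thread-3")             # 3) thread-3 conflicts and waits
    work.queue()                               # 4) thread-4 schedules the work
    writeback_pass([ext, other])               #    ext is skipped: still active
    ext.active = False                         # 5) thread-1 releases the extent
    rc = work.queue()                          #    work is still running
    finish(work, [ext, other])
    return rc, list(ext.waiters)               # 6) who is still waiting?
```

In this model, simulate(False) returns (-16, ["thread-3"]), matching step 6: the only request that could have written the released extent was dropped. simulate(True) returns (0, []) because the dropped request is remembered and replayed after the running pass returns.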

This is an issue I hit while testing DAOS, but it should be addressed in common code.



 Comments   
Comment by Liang Zhen (Inactive) [ 20/Jan/14 ]

patch is here: http://review.whamcloud.com/8922

Comment by Jodi Levi (Inactive) [ 03/Feb/14 ]

Patch landed to Master. Please reopen ticket if more work is needed.

Comment by James Nunez (Inactive) [ 01/May/14 ]

Patch for b2_5 at http://review.whamcloud.com/#/c/9705/

Comment by Andreas Dilger [ 28/May/14 ]

Patch landed for 2.5.2.
