Details
Bug
Resolution: Fixed
Blocker
None
3
12342
Description
Assume the following scenario:
1) thread-1 holds an active extent.
2) thread-2 calls flush cache and marks this extent as "urgent" and "sync_wait".
3) thread-3 wants to write to the same extent. osc_extent_find() reports a conflict because the extent is marked "sync_wait", so thread-3 starts to wait.
4) cl_writeback_work, scheduled by thread-4 to write some other extents, has sent its RPCs, but they have not completed yet.
5) thread-1 finishes its work and calls osc_extent_release() -> osc_io_unplug_async() -> ptlrpcd_queue_work(), which finds cl_writeback_work still running and ignores the request (-EBUSY).
6) thread-3 is stuck because nobody will wake it up.
I hit this issue while testing DAOS, but it should be fixed in common code.