Details
-
Improvement
-
Resolution: Unresolved
-
Minor
Description
ptlrpc_free_committed can be extremely time-consuming when many async requests are outstanding, for example with the small async DIO potentially created by LU-13805, or in other unusual circumstances.
The biggest problem is that all of the work is done by a single thread while holding the imp_lock. A recent patch ( https://review.whamcloud.com/c/fs/lustre-release/+/48629 ) changed this so the thread releases the lock and hands off to a waiter if it has been running too long (and there is a waiter). The process is still serial, though, so the patch avoids stalls without improving overall efficiency.
Most of the work (in terms of time consumed) in ptlrpc_free_committed can be deferred and moved out from under the imp_lock. Once that is done, we can also use our ability to know there is a waiter to 'batch' sets of requests: a thread finds a certain number of committed requests, then hands off to the waiter. This parallelizes the work and should speed it up significantly.
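To illustrate the batching idea, here is a minimal userspace sketch, not the actual Lustre code. Everything in it is an assumption made for illustration: `struct req`, `struct import`, `BATCH_MAX`, and the helper names are hypothetical, a pthread mutex stands in for the imp_lock, and a plain `free()` stands in for the expensive per-request cleanup. Waiter detection is omitted; a thread simply drops the lock after each batch, so any thread blocked on the lock can detach the next batch while the first one is still freeing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real Lustre structures. */
struct req {
	struct req *next;
	unsigned long long transno;
};

struct import {
	pthread_mutex_t lock;              /* plays the role of imp_lock */
	struct req *replay_head;           /* sorted by ascending transno */
	unsigned long long last_committed; /* highest committed transno */
};

#define BATCH_MAX 64 /* arbitrary batch size chosen for this sketch */

/* Under the lock: only unlink up to BATCH_MAX committed requests. */
static struct req *detach_committed_batch(struct import *imp, int *more)
{
	struct req *head = NULL, **tail = &head;
	int n = 0;

	pthread_mutex_lock(&imp->lock);
	while (imp->replay_head && n < BATCH_MAX &&
	       imp->replay_head->transno <= imp->last_committed) {
		struct req *r = imp->replay_head;

		imp->replay_head = r->next;
		r->next = NULL;
		*tail = r;
		tail = &r->next;
		n++;
	}
	*more = imp->replay_head &&
		imp->replay_head->transno <= imp->last_committed;
	pthread_mutex_unlock(&imp->lock);
	return head;
}

/* Outside the lock: the deferred (expensive) part of the work. */
static int free_batch(struct req *head)
{
	int freed = 0;

	while (head) {
		struct req *next = head->next;

		free(head);
		head = next;
		freed++;
	}
	return freed;
}

/* May be called from several threads at once; each works on its own batch. */
static int free_committed(struct import *imp)
{
	int more = 1, total = 0;

	while (more)
		total += free_batch(detach_committed_batch(imp, &more));
	return total;
}

struct worker_arg { struct import *imp; int freed; };

static void *worker(void *p)
{
	struct worker_arg *a = p;

	a->freed = free_committed(a->imp);
	return NULL;
}

/* Builds nreq requests (transno 1..nreq), frees the committed ones from
 * nthreads threads, and reports how many uncommitted requests remained. */
int demo(int nreq, unsigned long long committed, int nthreads, int *remaining)
{
	struct import imp = { .replay_head = NULL, .last_committed = committed };
	pthread_t tid[8];
	struct worker_arg arg[8];
	int i, total = 0;

	assert(nthreads >= 1 && nthreads <= 8);
	pthread_mutex_init(&imp.lock, NULL);
	for (i = nreq; i >= 1; i--) { /* push descending so list ascends */
		struct req *r = malloc(sizeof(*r));

		assert(r);
		r->transno = (unsigned long long)i;
		r->next = imp.replay_head;
		imp.replay_head = r;
	}
	for (i = 0; i < nthreads; i++) {
		arg[i].imp = &imp;
		arg[i].freed = 0;
		pthread_create(&tid[i], NULL, worker, &arg[i]);
	}
	for (i = 0; i < nthreads; i++) {
		pthread_join(tid[i], NULL);
		total += arg[i].freed;
	}
	*remaining = free_batch(imp.replay_head); /* count and free the rest */
	pthread_mutex_destroy(&imp.lock);
	return total;
}
```

The point of the split is that the lock hold time is bounded by the cost of unlinking one batch, not by the cost of freeing it. How the real code would pick the batch size, detect a waiter, and handle requests that cannot yet be freed is not covered by this sketch.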