Affects Version/s: None
Fix Version/s: Lustre 2.15.0
The AIO implementation created in LU-4198 is able to perform at extremely high speeds because it submits multiple i/os via the direct i/o path, in a manner similar to the buffered i/o path.
Consider the case where we do 1 MiB AIO requests with a queue depth of 64. In this case, we submit 64 1 MiB DIO requests, and then we wait for them to complete. (Assume we do only 64 MiB of i/o total, just to keep the example simple.)
Critically, we submit all the i/o requests and then wait for completion. We do not wait for completion of individual 1 MiB writes.
Compare this now to the case where we do a single 64 MiB DIO write (or some smaller size, but larger than the stripe size). As LU-4198 originally noted, the performance of DIO does not scale when adding stripes.
Consider a file with a stripe size of 1 MiB.
This 64 MiB DIO generates 64 1 MiB writes, exactly the same as AIO with a queue depth of 64.
Except that while the AIO request performs at ~4-5 GiB/s, the DIO request performs at ~300 MiB/s.
This is because the DIO system submits each 1 MiB request and then waits for it:
(Submit 1 stripe (1 MiB)) --> wait for sync, (Submit 1 stripe (1 MiB)) --> wait for sync, ... etc., 64 times.
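As a rough worked example of what these figures imply, the sketch below assumes the ~300 MiB/s path is fully serialized (exactly one 1 MiB request in flight at a time) and takes 4.5 GiB/s as an illustrative midpoint of the ~4-5 GiB/s AIO range. Both are assumptions made for the arithmetic, not measurements, and the function names are hypothetical.

```python
REQUEST_MIB = 1                 # one stripe-sized request
NUM_REQUESTS = 64               # 64 MiB of i/o in total
DIO_MIB_PER_S = 300.0           # ~300 MiB/s, from the description
AIO_MIB_PER_S = 4.5 * 1024      # assumed midpoint of ~4-5 GiB/s


def implied_request_latency_s(request_mib, throughput_mib_per_s):
    """Round-trip time per request if requests are strictly serialized."""
    return request_mib / throughput_mib_per_s


def total_time_s(total_mib, throughput_mib_per_s):
    """Time to move total_mib at a given sustained throughput."""
    return total_mib / throughput_mib_per_s


total_mib = REQUEST_MIB * NUM_REQUESTS
serial_latency_s = implied_request_latency_s(REQUEST_MIB, DIO_MIB_PER_S)
dio_total_s = total_time_s(total_mib, DIO_MIB_PER_S)
aio_total_s = total_time_s(total_mib, AIO_MIB_PER_S)
speedup = dio_total_s / aio_total_s

if __name__ == "__main__":
    print(f"implied per-request latency: {serial_latency_s * 1000:.2f} ms")
    print(f"64 MiB serialized:           {dio_total_s * 1000:.1f} ms")
    print(f"64 MiB at AIO rate:          {aio_total_s * 1000:.1f} ms")
    print(f"ratio:                       {speedup:.2f}x")
```

Under these assumptions each serialized 1 MiB request costs about 3.3 ms, so nearly all of the time for the 64 MiB write is spent waiting on individual requests rather than moving data.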
AIO submits all of the requests and then waits, so:
(Submit 1 stripe (1 MiB)) --> (Submit 1 stripe (1 MiB)) --> (Submit 1 stripe (1 MiB)) --> (Submit 1 stripe (1 MiB)) --> (Submit 1 stripe (1 MiB)) --> (Submit 1 stripe (1 MiB)) --> ... (Wait for all writes to complete)
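The two submission patterns above can be compared with a small userspace simulation. This is only a sketch: it does not touch Lustre or real direct i/o, each "stripe write" is a hypothetical stand-in that sleeps for a fixed latency, and the request count and latency are arbitrary illustrative values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_stripe_write(latency_s):
    """Stand-in for one 1 MiB stripe write; it only sleeps for latency_s."""
    time.sleep(latency_s)
    return 1  # MiB "written"


def submit_and_wait_each(pool, count, latency_s):
    """Serialized pattern: submit one request, wait for it, repeat."""
    start = time.perf_counter()
    done = 0
    for _ in range(count):
        done += pool.submit(fake_stripe_write, latency_s).result()
    return done, time.perf_counter() - start


def submit_all_then_wait(pool, count, latency_s):
    """AIO-style pattern: submit every request, then wait for all of them."""
    start = time.perf_counter()
    futures = [pool.submit(fake_stripe_write, latency_s) for _ in range(count)]
    done = sum(f.result() for f in futures)
    return done, time.perf_counter() - start


def compare(count=8, latency_s=0.02):
    """Run both patterns with one worker per request so nothing queues."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        serial = submit_and_wait_each(pool, count, latency_s)
        batched = submit_all_then_wait(pool, count, latency_s)
    return serial, batched


if __name__ == "__main__":
    (s_done, s_time), (b_done, b_time) = compare()
    print(f"wait after each: {s_done} MiB in {s_time:.3f} s")
    print(f"wait at the end: {b_done} MiB in {b_time:.3f} s")
```

In the serialized pattern the total time grows with the number of requests, while in the batched pattern the latencies overlap, which is the effect described for AIO.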
There is no reason DIO cannot work the same way, and when we make this change, large DIO writes & reads jump in performance to the same levels as AIO with an equivalent queue depth.
The change essentially consists of moving the wait from the ll_direct_rw_* code up to the ll_file_io_generic layer, so that we wait for the completion of all submitted i/os rather than waiting for them one at a time. It is a relatively simple change.
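One way to picture moving the wait up a layer is a shared completion tracker: the lower layer only submits and reports completions, and the upper layer waits once for everything. The sketch below is a userspace illustration under that assumption, not the actual patch. `IoBatch`, `direct_rw_submit` and `file_io_generic` are hypothetical names that only loosely echo the kernel functions, and the "i/o" is a sleeping thread.

```python
import threading
import time

MIB = 1024 * 1024


class IoBatch:
    """Tracks in-flight requests so the caller can wait for all of them once."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0
        self.bytes_done = 0
        self.first_error = None

    def begin(self):
        """Record one more submitted request."""
        with self._cond:
            self._pending += 1

    def complete(self, nbytes, error=None):
        """Called from the completion path of each request."""
        with self._cond:
            self._pending -= 1
            if error is None:
                self.bytes_done += nbytes
            elif self.first_error is None:
                self.first_error = error
            if self._pending == 0:
                self._cond.notify_all()

    def wait_all(self):
        """Block until every submitted request has completed."""
        with self._cond:
            while self._pending:
                self._cond.wait()
            return self.bytes_done, self.first_error


def direct_rw_submit(batch, nbytes, latency_s=0.01):
    """Lower layer: start one request and return without waiting for it."""
    def run():
        time.sleep(latency_s)      # simulated transfer
        batch.complete(nbytes)
    batch.begin()                  # count the request before it can complete
    threading.Thread(target=run).start()


def file_io_generic(total_bytes, stripe_bytes):
    """Upper layer: submit every stripe-sized chunk, then wait a single time."""
    batch = IoBatch()
    for offset in range(0, total_bytes, stripe_bytes):
        direct_rw_submit(batch, min(stripe_bytes, total_bytes - offset))
    return batch.wait_all()
```

Counting the request in `begin()` before the worker starts keeps the pending count from reaching zero early, and recording only the first error is a simplification chosen for this sketch.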
The improvement is dramatic, from a few hundred MiB/s to roughly 5 GiB/s.
The basic patch is relatively simple, but there are a number of additional subtleties to work out, such as when to use this approach and what sizes to submit. A basic patch will be forthcoming shortly.