Details
- Type: Improvement
- Resolution: Fixed
- Priority: Major
Description
The AIO implementation created in LU-4198 achieves extremely high speeds because it submits multiple i/os via the direct i/o path, in a manner similar to the buffered i/o path.
Consider the case where we do 1 MiB AIO requests with a queue depth of 64. In this case, we submit 64 1 MiB DIO requests and then wait for them to complete. (Assume we do only 64 MiB of i/o total, just for ease of discussion.)
Critically, we submit all the i/o requests and then wait for completion. We do not wait for completion of individual 1 MiB writes.
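For illustration only, here is a minimal userspace sketch of the same submit-everything-then-wait pattern using libaio. The file name, queue depth, and buffer alignment are illustrative assumptions and all error handling is omitted; this is not the LU-4198 code itself. Link with -laio.

#define _GNU_SOURCE          /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QD      64           /* queue depth */
#define IOSIZE  (1 << 20)    /* 1 MiB per request */

int main(void)
{
        io_context_t ctx = 0;
        struct iocb iocbs[QD], *iocbps[QD];
        struct io_event events[QD];
        void *buf;
        int fd, i;

        fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        posix_memalign(&buf, 4096, IOSIZE);  /* O_DIRECT needs aligned buffers */
        memset(buf, 0, IOSIZE);
        io_setup(QD, &ctx);

        /* Submit all 64 writes without waiting on any individual one... */
        for (i = 0; i < QD; i++) {
                io_prep_pwrite(&iocbs[i], fd, buf, IOSIZE,
                               (long long)i * IOSIZE);
                iocbps[i] = &iocbs[i];
        }
        io_submit(ctx, QD, iocbps);

        /* ...then wait once for all 64 completions. */
        io_getevents(ctx, QD, QD, events, NULL);

        io_destroy(ctx);
        close(fd);
        return 0;
}

The point is simply that io_submit() queues all 64 iocbs before the single io_getevents() call waits on them, so the 1 MiB writes are in flight concurrently.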
Compare this now to the case where we do a 64 MiB DIO write (or some smaller size, but > stripe size). As LU-4198 originally noted, the performance of DIO does not scale when adding stripes.
Consider a file with a stripe size of 1 MiB.
This 64 MiB DIO generates 64 1 MiB writes, exactly the same as AIO with a queue depth of 64.
Yet while the AIO request performs at ~4-5 GiB/s, the DIO request performs at only ~300 MiB/s.
This is because the DIO system submits each 1 MiB request and then waits for it:
(Submit 1 stripe(1 MiB)) --> wait for sync, (Submit 1 stripe (1 MiB)) --> wait for sync ... etc, 64 times.
AIO submits all of the requests and then waits, so:
(Submit 1 stripe(1 MiB)) -> (Submit 1 stripe(1 MiB)) -> (Submit 1 stripe(1 MiB)) -> (Submit 1 stripe(1 MiB)) -> (Submit 1 stripe(1 MiB)) -> (Submit 1 stripe(1 MiB)) -> ... (Wait for all writes to complete)
There is no reason DIO cannot work the same way, and when we make this change, large DIO writes & reads jump in performance to the same levels as AIO with an equivalent queue depth.
The change consists essentially of moving the waiting from the ll_direct_rw_* code up to the ll_file_io_generic layer and waiting for the completion of all submitted i/os rather than one at a time; it is a relatively simple change.
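Schematically, the control-flow change looks like the sketch below. The helpers dio_submit_chunk() and dio_wait_all() are hypothetical placeholders for Lustre's actual submission and completion machinery; the sketch shows only where the wait moves, not the real code.

/* Before: the loop in the ll_direct_rw_* path waits on each chunk,
 * serializing the stripes. */
while (bytes_left > 0) {
        dio_submit_chunk(io, offset, chunk_size);
        dio_wait_all(io);               /* one sync wait per chunk */
        offset     += chunk_size;
        bytes_left -= chunk_size;
}

/* After: the loop only submits; the single wait moves up into the
 * ll_file_io_generic layer, so every chunk is in flight at once. */
while (bytes_left > 0) {
        dio_submit_chunk(io, offset, chunk_size);
        offset     += chunk_size;
        bytes_left -= chunk_size;
}
dio_wait_all(io);                       /* wait once for everything */

The submission loop itself is unchanged; only the position of the wait differs, which is why the basic patch is small.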
The improvement is dramatic, from a few hundred MiB/s to roughly 5 GiB/s.
Quick benchmark:
mpirun -np 1 $IOR -w -r -t 256M -b 64G -o ./iorfile --posix.odirect

Before:
Max Write: 583.03 MiB/sec (611.35 MB/sec)
Max Read: 641.03 MiB/sec (672.17 MB/sec)

After (w/patch):
Max Write: 5185.96 MiB/sec (5437.87 MB/sec)
Max Read: 5093.06 MiB/sec (5340.46 MB/sec)
The basic patch is relatively simple, but there are a number of additional subtleties to work out around when to do this, what sizes to submit, and so on. The basic patch will be forthcoming shortly.
Attachments
Issue Links
- is related to:
  - LU-14828 Remove extra debug from 398m (Resolved)
  - LU-4198 Improve IO performance when using DIRECT IO using libaio (Resolved)
  - LU-13802 New i/o path: Buffered i/o as DIO (Open)
  - LU-13799 DIO/AIO efficiency improvements (Resolved)
  - LU-13805 i/o path: Unaligned direct i/o (Open)
  - LU-13814 DIO performance: cl_page struct removal for DIO path (Open)