Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
Description
For files that fit comfortably within the client RAM, it would be possible to tune the readahead algorithm to fetch the entire file into client RAM even with a totally random read workload (see e.g. LU-5561).
This behaviour can be simulated to some effect by setting the following parameters:
lctl set_param llite.*.max_read_ahead_mb=16384 \
    llite.${Lustre_NID}\*.max_read_ahead_per_file_mb=1024 \
    llite.${Lustre_NID}\*.max_read_ahead_whole_mb=512
or similar. This causes the Lustre client to read all file data into cache if the file is <= 512MiB in size and is accessed more than once. Performance testing shows that this can improve random 4KB reads dramatically compared to the default settings (64MiB, 64MiB, and 2MiB, respectively): 7574016 IOPS vs. 175616 IOPS.
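The exact benchmark used for those numbers is not recorded here; a random 4KB buffered-read test along these lines (the fio job parameters and the /mnt/lustre mount point are assumptions, not the test actually run) would exercise the tuned settings:

# run the random-read workload twice so the second pass can hit the populated client cache
fio --name=randread --directory=/mnt/lustre --size=512m \
    --rw=randread --bs=4k --ioengine=psync --direct=0 \
    --loops=2 --group_reporting

Since whole-file readahead only triggers once a <= 512MiB file has been accessed more than once, the second pass is the one expected to be served from client cache.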
On the one hand, this might be considered "cheating" to read the whole file into RAM before doing random IOPS, but on the other hand, if this behaviour can be generalized and made automatic for such workloads, it is no more of a "cheat" than readahead or write-behind optimizations for a sequential workload.
The main drawback of simply increasing max_read_ahead_whole_mb significantly is that this can have negative effects on workloads that read only a small amount of data from many large files, such as reading an index/header from output files to find parameters, or tools like file/finder/etc. that read only the head/tail of a file to determine its content.
A reasonable compromise is to implement the "read ahead whole file" heuristic for files that see a significant number of random accesses but can easily fit within the client RAM. As a default starting point, we could add max_read_ahead_random_whole_mb and max_read_ahead_random_ratio parameters that encode this behaviour specifically. The default file size limit could be something reasonable like totalram_pages / num_online_cpus / 2 (e.g. 1820MB for a 36-core 128GB client) and a ratio of 1/4096 of the file's pages read before triggering whole-file readahead.
That would mean a 1GB file would be fully prefetched after 64 random pages were read, and the maximum-sized 1820MB file would be fully prefetched after 113 pages were read.
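For illustration only, a shell sketch of how the proposed defaults would work out on a client, assuming 4KB pages and the hypothetical max_read_ahead_random_whole_mb / max_read_ahead_random_ratio tunables described above:

# proposed default size limit: totalram_pages / num_online_cpus / 2, expressed in MB
mem_mb=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
cpus=$(nproc)
echo "max_read_ahead_random_whole_mb default: $(( mem_mb / cpus / 2 )) MB"

# trigger point for a given file: 1/4096 of its pages read at random
file_mb=1024                                  # 1GB example from above
pages=$(( file_mb * 1024 / 4 ))               # file size in 4KB pages
echo "whole-file readahead after $(( pages / 4096 )) random pages"

With 128GB of RAM and 36 cores this evaluates to the 1820MB limit and the 64-page trigger for a 1GB file mentioned above.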
Issue Links
- is related to:
  - LU-11657 Prefetch whole ZFS block into client cache on random read (Open)
  - LU-15100 Add ability to tune definition of loose sequential read (Open)
- is related to:
  - LU-2032 small random read i/o performance regression (Open)
  - LU-5561 Lustre random reads: 80% performance loss from 1.8 to 2.6 (Resolved)
  - LU-18430 add random fadvise to disable read-ahead on a file (Open)
  - LU-8709 parallel asynchronous readahead (Resolved)