Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
Description
For files that fit comfortably within the client RAM, it would be possible to tune the readahead algorithm to fetch the entire file into client RAM even with a totally random read workload (see e.g. LU-5561).
This behaviour can be simulated to some effect by setting the following parameters:
lctl set_param llite.*.max_read_ahead_mb=16384 \
    llite.${Lustre_NID}\*.max_read_ahead_per_file_mb=1024 \
    llite.${Lustre_NID}\*.max_read_ahead_whole_mb=512
or similar. This causes the Lustre client to read all file data into cache if the file is <= 512MiB in size and is accessed more than once. Performance testing shows that this can improve random 4KB reads dramatically compared to the default settings (64MiB, 64MiB, and 2MiB, respectively): 7574016 IOPS vs. 175616 IOPS.
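The exact benchmark used for those numbers is not recorded here; a random 4KB buffered-read test along these lines (the fio job parameters and the /mnt/lustre mount point are assumptions, not the test actually run) would exercise the tuned settings:

# run the random-read workload twice so the second pass can hit the populated client cache
fio --name=randread --directory=/mnt/lustre --size=512m \
    --rw=randread --bs=4k --ioengine=psync --direct=0 \
    --loops=2 --group_reporting

Since whole-file readahead only triggers once a <= 512MiB file has been accessed more than once, the second pass is the one expected to be served from client cache.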
On the one hand, this might be considered "cheating" to read the whole file into RAM before doing random IOPS, but on the other hand, if this behaviour can be generalized and made automatic for such workloads, it is no more of a "cheat" than readahead or write-behind optimizations for a sequential workload.
The main drawback of simply increasing max_read_ahead_whole_mb significantly is that this can have negative effects on workloads that read only a small amount of data from many large files, such as reading an index/header from output files to find parameters, or tools like file/finder/etc. that read only the head/tail of a file to determine its content.
A reasonable compromise is to implement the "read ahead whole file" heuristic for files that see a significant number of random accesses but can easily fit within the client RAM. As a default starting point, we could add max_read_ahead_random_whole_mb and max_read_ahead_random_ratio parameters that encode this behaviour specifically. The default file size limit could be something reasonable like totalram_pages / num_online_cpus / 2 (e.g. 1820MB for a 36-core 128GB client) and a ratio of 1/4096 of the file's pages read before triggering whole-file readahead.
That would mean a 1GB file would be fully prefetched after 64 random pages were read, and the maximum-sized 1820MB file would be fully prefetched after 113 pages were read.
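For illustration only, a shell sketch of how the proposed defaults would work out on a client, assuming 4KB pages and the hypothetical max_read_ahead_random_whole_mb / max_read_ahead_random_ratio tunables described above:

# proposed default size limit: totalram_pages / num_online_cpus / 2, expressed in MB
mem_mb=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
cpus=$(nproc)
echo "max_read_ahead_random_whole_mb default: $(( mem_mb / cpus / 2 )) MB"

# trigger point for a given file: 1/4096 of its pages read at random
file_mb=1024                                  # 1GB example from above
pages=$(( file_mb * 1024 / 4 ))               # file size in 4KB pages
echo "whole-file readahead after $(( pages / 4096 )) random pages"

With 128GB of RAM and 36 cores this evaluates to the 1820MB limit and the 64-page trigger for a 1GB file mentioned above.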
Issue Links
- is related to:
  - LU-11657 Prefetch whole ZFS block into client cache on random read (Open)
  - LU-15100 Add ability to tune definition of loose sequential read (Open)
- is related to:
  - LU-2032 small random read i/o performance regression (Open)
  - LU-5561 Lustre random reads: 80% performance loss from 1.8 to 2.6 (Resolved)
  - LU-18430 add random fadvise to disable read-ahead on a file (Open)
  - LU-8709 parallel asynchronous readahead (Resolved)