Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Description
Loose sequential read is the Lustre term for reads that do not touch every page and are not strided, but still progress through the file in a semi-random fashion, forward or backward. In practice, each read jumps forward or backward by a small random amount from the previous one. This pattern is fairly common in database queries, for example: a query with a certain hit rate on a large table pulls in a certain percentage of pages, mostly at random.
The definition of this in Lustre has been limited to "within 8 pages of previous access" for a very long time. That range is tiny; it should be larger, and it should be tunable. With the current limit, an application which reads page 5, then page 18, then page 27, then page 50, etc., is considered entirely random, which is very bad for performance. Making the limit on 'loose forward read' larger will allow readahead to recognize these cases and perform readahead. (The cost of reading 1 MiB is only slightly higher than the cost of reading 1 page, so even a few hits per MiB make it worth reading in the data. It therefore makes sense to pull in all the data for these "loose sequential" reads.)
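As a rough illustration of the classification described above, here is a minimal sketch in Python. It is not taken from the Lustre source: the function name, the constant name, and the exact comparison (distance less than or equal to the limit) are assumptions made for the example. It only shows how the 5, 18, 27, 50 pattern is treated under the 8-page limit versus a larger, tunable one.

```python
from typing import List

# Hypothetical name for illustration; the value 8 is the long-standing
# limit quoted in the description ("within 8 pages of previous access").
DEFAULT_LOOSE_RANGE_PAGES = 8


def classify_reads(pages: List[int],
                   loose_range: int = DEFAULT_LOOSE_RANGE_PAGES) -> List[str]:
    """Label each read after the first as 'loose-sequential' or 'random'.

    Assumption: a read counts as loose sequential when it lands within
    `loose_range` pages of the previous access, forward or backward.
    """
    labels = []
    for prev, cur in zip(pages, pages[1:]):
        distance = abs(cur - prev)
        if distance <= loose_range:
            labels.append("loose-sequential")
        else:
            labels.append("random")
    return labels


if __name__ == "__main__":
    pattern = [5, 18, 27, 50]  # the access pattern from the description
    # Gaps are 13, 9 and 23 pages, so every read exceeds the 8-page limit.
    print(classify_reads(pattern))
    # With a larger (tunable) limit of 32 pages, every read qualifies.
    print(classify_reads(pattern, loose_range=32))
```

With the default limit all three transitions come out as random, so readahead never engages; with a 32-page limit (an arbitrary value chosen for the example) the same pattern is recognized as loose sequential.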
Patch forthcoming.
"The requirement that max_readahead_per_file >= max_readahead_whole makes sense to me. If the user specifies "don't do more than 256MB of readahead for a single file" it doesn't really make sense to speculatively readahead all of a 1GB file on the first or second access. That can cause a lot of data to be read for some workloads that are only accessing a tiny amount of data in each file."
Sure, but these are separate tunables. If the user doesn't want whole files above a certain size read in, that's what max_readahead_whole is for. The existing requirement means that if they want to turn up max_readahead_whole, they also have to increase the window size for ongoing reads. And with the patch to link the tunables, the window size will go up automatically when they turn up max_readahead_whole.
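To make the difference concrete, here is a small sketch of the two behaviours being discussed. It is only a model: the class, the method names, and the starting values are hypothetical, and rejecting the value with an error is an assumption about how the existing requirement is enforced. The field names follow the ones used in this thread, with values in MiB.

```python
from dataclasses import dataclass


@dataclass
class ReadaheadTunables:
    # Hypothetical container; the defaults are arbitrary illustration
    # values, not Lustre defaults.
    max_readahead_per_file: int = 256
    max_readahead_whole: int = 2

    def set_whole_strict(self, value_mb: int) -> None:
        """Model of the existing requirement: per_file >= whole must hold.

        Assumption: a value above the per-file limit is simply rejected,
        so the user has to raise max_readahead_per_file first.
        """
        if value_mb > self.max_readahead_per_file:
            raise ValueError("max_readahead_whole exceeds max_readahead_per_file")
        self.max_readahead_whole = value_mb

    def set_whole_linked(self, value_mb: int) -> None:
        """Model of linked tunables: raising whole also raises per_file."""
        self.max_readahead_whole = value_mb
        if self.max_readahead_per_file < value_mb:
            # The ongoing-read window grows as a side effect.
            self.max_readahead_per_file = value_mb


if __name__ == "__main__":
    linked = ReadaheadTunables()
    linked.set_whole_linked(1024)
    print(linked)  # both limits are now 1024
```

In both variants a user who only wants larger whole-file reads ends up with a larger window for ongoing reads as well, which is the coupling questioned above.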
I think LU-11416 is a good idea. There's a lot of detail to be sorted out about when to do that vs. other things; that often seems to be the hardest part of readahead: deciding when to do one thing vs. another.