Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: Lustre 2.9.0
Environment: ZFS based OSTs
Severity: 3
Rank: 9223372036854775807
Description
When max_read_ahead_per_file_mb is set smaller than the RPC size (specified through max_pages_per_rpc), prefetch stops working: the read-ahead window stays at zero and no prefetch is issued. What gets passed on to the OSTs are tiny RPCs of exactly the size the user requested (32kB in our case). In the past the read-ahead was at least as large as max_read_ahead_per_file_mb; now it is zero.
I consider this a regression; it broke the performance of a system that was tuned for many streams per client. I believe it came with the switch to large (16MB) RPCs.
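For reference, the two tunables involved can be queried with lctl on the client. The values below are only an illustrative sketch of the problematic combination (a 16MB RPC with a per-file read-ahead cap below one RPC), not the settings of the affected system:

# per-file read-ahead cap on the client (in MB)
lctl get_param llite.*.max_read_ahead_per_file_mb
# RPC size towards the OSTs (in pages; 4096 x 4KiB pages = 16MB)
lctl get_param osc.*.max_pages_per_rpc
# illustrative combination where the cap is smaller than one RPC
lctl set_param osc.*.max_pages_per_rpc=4096
lctl set_param llite.*.max_read_ahead_per_file_mb=8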
ZFS setups are particularly sensitive to the prefetch pattern, and we discovered this by looking at read_ahead_stats:
[nec@z0073 miifs01-ffff88085beff000]$ cat read_ahead_stats
snapshot_time             1487584498.777917 secs.usecs
hits                      5365098708 samples [pages]
misses                    65570130 samples [pages]
readpage not consecutive  1195 samples [pages]
miss inside window        2469645 samples [pages]
failed grab_cache_page    5894 samples [pages]
read but discarded        1605074 samples [pages]
zero length file          9 samples [pages]
zero size window          2762049936 samples [pages]
read-ahead to EOF         28918 samples [pages]
hit max r-a issue         4900588 samples [pages]
failed to reach end       2396865 samples [pages]
The zero size window samples are of the same order of magnitude as the hits, in fact about half of the hits. But we have huge files (many GB in size)! The read-ahead code was changed with the patch http://review.whamcloud.com/19368.
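The ratio above can be checked directly from the counters; the commands below are only a sketch (paths and field names taken from the output quoted above):

# dump the read-ahead statistics for all mounted clients
lctl get_param llite.*.read_ahead_stats
# pick out the two counters being compared
awk '/^hits|^zero size window/' /proc/fs/lustre/llite/*/read_ahead_stats
# 2762049936 zero-size-window samples vs. 5365098708 hits, i.e. roughly half,
# consistent with the description above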