[LU-17026] fix issues with async readahead disrupting readahead state Created: 11/Aug/23 Updated: 11/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Patrick Farrell | Assignee: | Patrick Farrell |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | readahead | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Async readahead poses challenges for regular readahead because its asynchronous nature can generate unexpected misses. Normal synchronous readahead guarantees that the reading thread completes the readahead before asking for more pages, but async readahead can be issued and still be incomplete when the reading thread tries to do another read. This generates a miss, which causes the readahead code to reset its state. At runtime, this is mostly an annoyance with minor performance implications, but it makes testing difficult. It's not 100% clear to me that this is worth the difficulty of solving, so for now this ticket is to note the problem and sketch possible solutions. |
| Comments |
| Comment by Patrick Farrell [ 11/Aug/23 ] |
|
This would be fixed by adjusting readahead so it can wait for async readahead, but this has to be done carefully. The easy way to do this would be to wait until async readahead has issued the page reads, so the page cache can catch the problem for us, but this would take away most of the benefit of async readahead. Instead, we'll need to somehow annotate which reads have been handed to async readahead and wait for them to complete. My thoughts are around something like an extent tree documenting the read range(s) issued by readahead, so that when we go to start a new read, we can see that someone else is already doing it. The problem is doing this wait safely: what if the async readahead fails? And how do we know exactly which pages it's reading?
The core of this problem in practice is that it makes readahead and async readahead very difficult to test. Current readahead tests basically have to turn off async readahead to properly test readahead's behavior with specific IO patterns. So it may be that we keep doing that, add a few dedicated tests of async readahead written with care, and just allow this problem to stay a problem, because the effort-to-reward ratio isn't ideal.
It's also worth noting that async readahead doesn't work with strided readahead, because strided readahead requires perfect tracking of read and readahead state, and async readahead (as currently done) makes that impossible. It's only the extremely forgiving nature of readahead for straight-line or fuzzy reads that allows async readahead to work reliably. |
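To make the range-tracking idea in the comment above a bit more concrete, here is a minimal sketch in C. It is not Lustre code: every name in it (ra_tracker, ra_extent, ra_tracker_issue and so on) is hypothetical, and a small flat table stands in for the extent tree. It assumes the caller knows the exact page range handed to async readahead, and it leaves out locking and the actual wait. It only shows the classification step: a page cache miss that overlaps an in-flight range is treated as "wait" rather than as a miss that resets readahead state. A failed async readahead is assumed to simply drop its range, so a retry sees a genuine miss and falls back to the normal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RA_MAX_INFLIGHT 16

/* One page range [start, end) handed to async readahead (hypothetical). */
struct ra_extent {
	unsigned long start;
	unsigned long end;
	bool in_use;
};

/* Flat table standing in for an extent tree; real code would need locking. */
struct ra_tracker {
	struct ra_extent slots[RA_MAX_INFLIGHT];
};

enum ra_miss_action {
	RA_MISS_RESET,	/* genuine miss: reset readahead state as today */
	RA_MISS_WAIT,	/* pages are in flight: wait instead of resetting */
};

void ra_tracker_init(struct ra_tracker *t)
{
	for (size_t i = 0; i < RA_MAX_INFLIGHT; i++)
		t->slots[i].in_use = false;
}

/* Record a range before queueing it; returns false if the table is full. */
bool ra_tracker_issue(struct ra_tracker *t, unsigned long start,
		      unsigned long end)
{
	assert(start < end);
	for (size_t i = 0; i < RA_MAX_INFLIGHT; i++) {
		struct ra_extent *e = &t->slots[i];

		if (!e->in_use) {
			e->start = start;
			e->end = end;
			e->in_use = true;
			return true;
		}
	}
	return false;
}

/*
 * Drop a range once the async work finishes, whether it succeeded or
 * failed. Returns false if no such range was being tracked.
 */
bool ra_tracker_complete(struct ra_tracker *t, unsigned long start,
			 unsigned long end)
{
	for (size_t i = 0; i < RA_MAX_INFLIGHT; i++) {
		struct ra_extent *e = &t->slots[i];

		if (e->in_use && e->start == start && e->end == end) {
			e->in_use = false;
			return true;
		}
	}
	return false;
}

/* Classify a page cache miss on [start, end). */
enum ra_miss_action ra_tracker_on_miss(const struct ra_tracker *t,
				       unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < RA_MAX_INFLIGHT; i++) {
		const struct ra_extent *e = &t->slots[i];

		/* Half-open ranges overlap when each starts before the other ends. */
		if (e->in_use && start < e->end && e->start < end)
			return RA_MISS_WAIT;
	}
	return RA_MISS_RESET;
}
```

In this sketch the reader would call ra_tracker_on_miss() when it misses in the page cache and, on RA_MISS_WAIT, block until the matching range completes instead of resetting readahead. How to do that wait safely is exactly the open question raised above, so it is deliberately not modeled here.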