[LU-12718] Readahead can trigger an assertion failire in cl_io_read_ahead Created: 30/Aug/19 Updated: 14/Dec/19 Resolved: 14/Dec/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Neil Brown | Assignee: | Neil Brown |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
The follow crash message was observed (on the upstream-linux client). If the pre-read needed for a sub-page write triggers read-ahead, this will happen. We should probably just disable read-ahead for writes. [ 1666.576466] LustreError: 3089:0:(cl_io.c:549:cl_io_read_ahead()) ASSERTION( io->ci_type == CIT_READ || io->ci_type |
| Comments |
| Comment by Gerrit Updater [ 30/Aug/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36000 |
| Comment by Andreas Dilger [ 30/Aug/19 ] |
|
Actually, the readahead for sub-page write was explicitly implemented not so long ago, because this improved performance noticeably. Otherwise, the client may be waiting for two separate reads (first partial page and last partial page) for each write. See patch https://review.whamcloud.com/27544 |
| Comment by Neil Brown [ 30/Aug/19 ] |
|
Ahh... maybe it's just the LINVRNT() in cl_io_read_ahead() that is bad then ... but if it is, I would have expected it to show up when this feature was tested. But then, there is a lot that I don't understand yet, so maybe there is a good explanation. I did have the "Use readahead for partial page write" patch in the kernel that crashed.
|
| Comment by Patrick Farrell (Inactive) [ 30/Aug/19 ] |
|
Generally: So, this didn't show up during original testing because the LINVRNT checks are only enabled by a build flag (I think it's config-debug-check-expensive or similar) that isn't part of the normal builds, but I think it's come up before that you use it, Neil. (This is good! We should use it in some of our builds as well...) So it didn't hit in the testing for this feature because the check wasn't there. Specifically: Definitely the problem is the LINVRNT - It's just a belt and suspenders, "check what you know to be true" sort of thing, it's not actually enforcing any constraints that I can see. Certainly I'll also drop this in Gerrit... |
| Comment by James A Simmons [ 30/Aug/19 ] |
|
Let's talk to Oleg about him enabling LINVRNT in this test rig. |
| Comment by James A Simmons [ 30/Aug/19 ] |
|
I discussed with Oleg about enabling CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK in his test rig. We also have this issue with lu_ref as well. Its never tested and Neil has purposed just removing it since its never used and no one seems to be interesting in it. |
| Comment by Gerrit Updater [ 14/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36000/ |
| Comment by Peter Jones [ 14/Dec/19 ] |
|
Landed for 2.14 |