[LU-12718] Readahead can trigger an assertion failire in cl_io_read_ahead Created: 30/Aug/19  Updated: 14/Dec/19  Resolved: 14/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Neil Brown Assignee: Neil Brown
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9618 Connect readahead to prep_partial_pag... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The follow crash message was observed (on the upstream-linux client).

If the pre-read needed for a sub-page write triggers read-ahead, this will happen.

We should probably just disable read-ahead for writes.

[ 1666.576466] LustreError: 3089:0:(cl_io.c:549:cl_io_read_ahead()) ASSERTION( io->ci_type == CIT_READ || io->ci_type
[ 1666.578761] LustreError: 3089:0:(cl_io.c:549:cl_io_read_ahead()) LBUG
[ 1666.579807] Pid: 3089, comm: loop3 5.2.0-rc2-00877-gca23fc4b3908 #278 SMP PREEMPT Mon Jul 22 10:32:24 AEST 2019
[ 1666.581132] Call Trace:
[ 1666.581519] libcfs_debug_dumpstack+0x67/0x84
[ 1666.582072] lbug_with_loc+0x33/0x75
[ 1666.582520] cl_io_read_ahead+0x86/0xd6
[ 1666.582979] ll_io_read_page+0x887/0x1089
[ 1666.583457] ll_write_begin+0x4fa/0x69b
[ 1666.583917] generic_perform_write+0xb4/0x193
[ 1666.584448] __generic_file_write_iter+0x12c/0x16c
[ 1666.585011] vvp_io_write_start+0x492/0x60c
[ 1666.585555] cl_io_start+0xef/0x105
[ 1666.586049] cl_io_loop+0xaa/0x1ec
[ 1666.586527] ll_file_io_generic+0x475/0x7ca
[ 1666.587131] ll_file_write_iter+0x224/0x274
[ 1666.587718] do_iter_readv_writev+0x13a/0x176
[ 1666.588308] do_iter_write+0x8b/0xff
[ 1666.588866] lo_write_bvec+0x74/0xf8
[ 1666.589374] loop_queue_work+0x376/0x8bc
[ 1666.589908] kthread_worker_fn+0xc3/0x154
[ 1666.590452] kthread+0x111/0x119
[ 1666.590892] ret_from_fork+0x24/0x30



 Comments   
Comment by Gerrit Updater [ 30/Aug/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36000
Subject: LU-12718 llite: don't trigger read-ahead for a write.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ff8d8420ebdcd819c10824ca5c95f1a3163c9a80

Comment by Andreas Dilger [ 30/Aug/19 ]

Actually, the readahead for sub-page write was explicitly implemented not so long ago, because this improved performance noticeably. Otherwise, the client may be waiting for two separate reads (first partial page and last partial page) for each write. See patch https://review.whamcloud.com/27544
"LU-9618 clio: Use readahead for partial page write" for details.

Comment by Neil Brown [ 30/Aug/19 ]

Ahh... maybe it's just the LINVRNT() in cl_io_read_ahead() that is bad then ... but if it is, I would have expected it to show up when this feature was tested.

But then, there is a lot that I don't understand yet, so maybe there is a good explanation.

I did have the "Use readahead for partial page write" patch in the kernel that crashed.

 

Comment by Patrick Farrell (Inactive) [ 30/Aug/19 ]

Generally:

So, this didn't show up during original testing because the LINVRNT checks are only enabled by a build flag (I think it's config-debug-check-expensive or similar) that isn't part of the normal builds, but I think it's come up before that you use it, Neil.  (This is good!  We should use it in some of our builds as well...)

So it didn't hit in the testing for this feature because the check wasn't there.

Specifically:

Definitely the problem is the LINVRNT - It's just a belt and suspenders, "check what you know to be true" sort of thing, it's not actually enforcing any constraints that I can see.  Certainly LU-9618 has been in for a while and doesn't seem to cause any trouble

I'll also drop this in Gerrit...

Comment by James A Simmons [ 30/Aug/19 ]

Let's talk to Oleg about him enabling LINVRNT in this test rig.

Comment by James A Simmons [ 30/Aug/19 ]

I discussed with Oleg about enabling CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK in his test rig. We also have this issue with lu_ref as well. Its never tested and Neil has purposed just removing it since its never used and no one seems to be interesting in it.

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36000/
Subject: LU-12718 obdclass: Allow read-ahead for write requests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 514bd936d06141319fb5e8032e4b7fa7bc0d1194

Comment by Peter Jones [ 14/Dec/19 ]

Landed for 2.14

Generated at Sat Feb 10 02:55:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.