[LU-16019] sanity test_101j: FAIL: expected 4096 got 8192 Created: 15/Jul/22  Updated: 30/Jan/24  Resolved: 08/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.1, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL 9.0 client


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3a8eb542-e2e6-4223-8a43-3f37b87a4636

test_101j failed with the following error:

== sanity test 101j: A complete read block should be submitted when no RA ========================================================== 19:37:12 (1657913832)
Disable read-ahead
16+0 records in
16+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0565881 s, 296 MB/s
Reset readahead stats
4096+0 records in
4096+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 1.08708 s, 15.4 MB/s
snapshot_time             3868.949651534 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              3868.949651534 secs.nsecs
failed_to_fast_read       8192 samples [pages]
 sanity test_101j: @@@@@@ FAIL: expected 4096 got 8192

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_101j - expected 4096 got 8192
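
For reference, the expected value can be re-derived from the dd output above. This is a minimal sketch in C: the helper names are hypothetical, and the 4096-byte page size is an assumption (the log does not state the client page size).

```c
#include <stdint.h>

/* Hypothetical helper: size of each dd record, from total bytes and record count. */
static uint64_t record_size(uint64_t bytes, uint64_t records)
{
	return bytes / records;
}

/* Hypothetical helper: number of pages needed to cover `bytes`, rounding up. */
static uint64_t pages_for_bytes(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

With the values from the log, record_size(16777216, 4096) is 4096 bytes and pages_for_bytes(16777216, 4096) is 4096 pages, which matches the expected count. The reported failed_to_fast_read value of 8192 is exactly twice that.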



 Comments   
Comment by Jian Yu [ 15/Jul/22 ]

Hi Yingjin,
You fixed a similar issue in LU-15244. Could you please take a look at this one?

Comment by Qian Yingjin [ 18/Jul/22 ]

In the new 5.14 kernel (RHEL 9), readahead may be outside of Lustre's control:

 

void page_cache_sync_ra(struct readahead_control *ractl,
		unsigned long req_count)
{
	bool do_forced_ra = ractl->file && (ractl->file->f_mode & FMODE_RANDOM);

	/*
	 * Even if read-ahead is disabled, issue this request as read-ahead
	 * as we'll need it to satisfy the requested range. The forced
	 * read-ahead will do the right thing and limit the read to just the
	 * requested range, which we'll set to 1 page for this case.
	 */
	if (!ractl->ra->ra_pages || blk_cgroup_congested()) {
		if (!ractl->file)
			return;
		req_count = 1;
		do_forced_ra = true;
	}

	/* be dumb */
	if (do_forced_ra) {
		force_page_cache_ra(ractl, req_count);
		return;
	}

	/* do read-ahead */
	ondemand_readahead(ractl, false, req_count);
}

generic_file_read_iter()
  ->filemap_read()
    ->filemap_get_pages()
      ->page_cache_sync_readahead()
        ->page_cache_sync_ra()

Setting @ra_pages to 0 alone cannot completely avoid readahead in the kernel I/O path.

I will try setting @bdi->io_pages to see whether that fully avoids readahead in the kernel.
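
The branch logic of the function quoted above can be modeled in user space as a rough sketch. Everything here (model_sync_ra, struct ra_decision, enum ra_path, and the boolean parameters) is a hypothetical stand-in for the kernel state, not kernel API. It only mirrors the decision flow of page_cache_sync_ra() as quoted in this comment.

```c
#include <stdbool.h>

/* Hypothetical model types; these are not kernel definitions. */
enum ra_path { RA_NONE, RA_FORCED, RA_ONDEMAND };

struct ra_decision {
	enum ra_path path;       /* which branch the quoted function would take */
	unsigned long req_count; /* request size handed to that branch, in pages */
};

/*
 * User-space model of the decision flow in the quoted page_cache_sync_ra().
 * has_file     stands in for ractl->file != NULL
 * fmode_random stands in for (file->f_mode & FMODE_RANDOM)
 * ra_pages     stands in for ractl->ra->ra_pages
 * congested    stands in for blk_cgroup_congested()
 */
static struct ra_decision model_sync_ra(bool has_file, bool fmode_random,
					unsigned long ra_pages, bool congested,
					unsigned long req_count)
{
	struct ra_decision d = { RA_NONE, req_count };
	bool do_forced_ra = has_file && fmode_random;

	/* Readahead "disabled": the request is still issued, clamped to 1 page. */
	if (ra_pages == 0 || congested) {
		if (!has_file)
			return d;
		d.req_count = 1;
		do_forced_ra = true;
	}

	d.path = do_forced_ra ? RA_FORCED : RA_ONDEMAND;
	return d;
}
```

In this model, model_sync_ra(true, false, 0, false, 16) takes the forced path with req_count set to 1, so a request is still issued through the readahead code even though ra_pages is 0.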

Comment by Gerrit Updater [ 20/Jul/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47993
Subject: LU-16019 llite: fully disable readahead in kernel I/O path
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b3cafb06ad0dd347e487df3e4c7b0086f8c23378

Comment by Gerrit Updater [ 08/Aug/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47993/
Subject: LU-16019 llite: fully disable readahead in kernel I/O path
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f0cf7fd3cccb2313fa94a307cf862afba256b8d8

Comment by Peter Jones [ 08/Aug/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 15/Aug/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48219
Subject: LU-16019 llite: fully disable readahead in kernel I/O path
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: a0309239b7bf3d2ea1edd2b3c6899e62310f2121

Comment by Xing Huang [ 29/Dec/22 ]

+1 on b2_15
https://testing.whamcloud.com/test_sets/a24fd753-7784-4fee-a205-80c2f69c3b9c

Comment by Etienne Aujames [ 22/Jan/24 ]

This test seems to fail systematically on b2_15 with SLES 15.5:

Maloo request:
https://testing.whamcloud.com/search?end_date=2024-01-22&source=sub_tests&start_date=2024-01-16&status=FAIL&sub_test_script_id=4e0877da-9d97-11e9-9e3d-52540065bddc&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae#redirect

List of failed tests:

Comment by Xing Huang [ 30/Jan/24 ]

+1 on b2_15 again
https://testing.whamcloud.com/test_sessions/66a00fbf-6bc8-4f72-a0c3-b1a5479c0ca9

Generated at Sat Feb 10 03:23:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.