[LU-16338] read-ahead more than file size for a small file Created: 23/Nov/22  Updated: 25/Apr/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16579 llite: Fix the wrong ending offset ca... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

The following scripts will read pages more than file size:

test_907()
{
        local file=$DIR/$tfile

        $LCTL set_param -n llite.*.read_ahead_stats=0
        fallocate -l 16K $file || error "failed to fallocate $file"
        cancel_lru_locks osc
        #dd if=$file of=/dev/null bs=1M
        $MULTIOP $file or1048576c
        $LCTL get_param llite.*.read_ahead_stats
}
run_test 907 "read ahaed for small file"

The readahead page stats are as follows:

== sanity test 907: read ahaed for small file ============ 04:38:01 (1669196281)
short read: 16384/1048576
short read: 0/1032192
llite.lustre-ffff938b7849d000.read_ahead_stats=
snapshot_time             4011.320890492 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              4011.320890492 secs.nsecs
hits                      4 samples [pages]
misses                    1 samples [pages]
zero_size_window          4 samples [pages]
failed_to_reach_end       1 samples [pages]
failed_to_fast_read       1 samples [pages]
readahead_pages           1 samples [pages] 255 255 255

For small files, the read-ahead pages reach 255, it is not reasonable...



 Comments   
Comment by Gerrit Updater [ 23/Nov/22 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49224
Subject: LU-16338 readahead: add stats for read-ahead page count
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34779e7c2f51d93e190c863682c79ecd0060b06c

Comment by Gerrit Updater [ 23/Nov/22 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49226
Subject: LU-16338 readahead: clip readahead with kms
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9404bfb6f825c28d692d9c833ccd2137c46a84d3

Comment by Qian Yingjin [ 25/Nov/22 ]

our read-ahead mechanism is much radical.
And there is some logic unreasonable in the read-ahed code path.
In @ll_readahead(), it already clips the readahead end index with the kms, but in the following @ll_read_ahead_pages(), it increases the end index to align with optimal RPC size and @ria->ria_end_idx_min.

One solution for this problem is not to increase the end index larger than kms in llite layer.
Another solution is the patch above: clip the returned readahead range with Kim in OSC layer.

Comment by Patrick Farrell [ 18/Jan/23 ]

I think the logic at the top level is a bit weird - I have mixed feelings about it - but I think you have the right idea about clipping at the OSC layer.  Even if we change the higher level logic, clipping at the OSC layer seems like a good idea to me.

Comment by Gerrit Updater [ 27/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49224/
Subject: LU-16338 readahead: add stats for read-ahead page count
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cdcf97e17e73dfdd65c4e46bb30c4a07f5e710cf

Comment by Qian Yingjin [ 17/Feb/23 ]

For a single-stripe file, after applied https://review.whamcloud.com/c/fs/lustre-release/+/49226
Subject: LU-16338 readahead: clip readahead with kms,
It cause a dead loop:
https://testing.whamcloud.com/test_logs/8d73c4e7-0e6e-482c-b93c-bf7159706890/show_text

00000080:00200000:1.0:1676614736.277040:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
00000080:00000001:1.0:1676614736.277041:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
00000008:00000001:1.0:1676614736.277042:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
00000008:00000001:1.0:1676614736.277043:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
00000008:00000001:1.0:1676614736.277044:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
00000008:00000001:1.0:1676614736.277045:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000080:00000001:1.0:1676614736.277045:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)
00000080:00000001:1.0:1676614736.277046:0:21215:0:(rw.c:1873:ll_readpage()) Process entered
00000080:00200000:1.0:1676614736.277046:0:21215:0:(rw.c:1981:ll_readpage()) pgno:260, cnt:1032192, pos:0
00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1654:vvp_io_read_ahead()) Process entered
00000080:00000001:1.0:1676614736.277047:0:21215:0:(vvp_io.c:1666:vvp_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_io.c:83:osc_io_read_ahead()) Process entered
00000008:00000001:1.0:1676614736.277048:0:21215:0:(osc_lock.c:1281:osc_obj_dlmlock_at_pgoff()) Process entered
00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3137:osc_match_base()) Process entered
00000008:00000001:1.0:1676614736.277049:0:21215:0:(osc_request.c:3172:osc_match_base()) Process leaving (rc=4 : 4 : 4)
00000008:00000001:1.0:1676614736.277050:0:21215:0:(osc_lock.c:1315:osc_obj_dlmlock_at_pgoff()) Process leaving (rc=18446620395345229440 : -123678364322176 : ffff8f83e3073680)
00000008:00000001:1.0:1676614736.277051:0:21215:0:(osc_io.c:112:osc_io_read_ahead()) Process leaving (rc=0 : 0 : 0)
00000080:00000001:1.0:1676614736.277051:0:21215:0:(rw.c:2012:ll_readpage()) Process leaving (rc=524289 : 524289 : 80001)

The dead loop code lines:

if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) {
                        result = cl_io_read_ahead(env, io, vmpage->index, &ra);
                        if (result < 0 || vmpage->index > ra.cra_end_idx) {
                                cl_read_ahead_release(env, &ra);
                                unlock_page(vmpage);
                                RETURN(AOP_TRUNCATED_PAGE); ===>AOP_TRUNCATED_PAGE = 80001
                        }
                }

I will investigate whether this bug existed without the patch.

Comment by Gerrit Updater [ 28/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49226/
Subject: LU-16338 readahead: clip readahead with kms
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b33808d3aebb06cf0660baee1186b2f944d9d06d

Generated at Sat Feb 10 03:26:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.