[LU-13293] Readahead doesn't work well for non-stride SSF Created: 25/Feb/20  Updated: 17/Mar/20  Resolved: 01/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Shuichi Ihara Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

master


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The workload is SSF (Single Shared File, non-stride, xfer=4KB) from 8 clients with 128 processes. From a client's view, the I/O can be aggregated into large RPCs, but readahead does not seem to be working well: there are a lot of readahead (RA) misses and small RPCs to the servers.

[root@ec01 ~]# salloc --nodes=8 --ntasks-per-node=16 mpirun --allow-run-as-root /work/tools/bin/ior -b 1G -o /scratch/dir/file -a POSIX -w -r -t 4k -e -C -Q 17 -vv


Max Write: 13082.37 MiB/sec (13717.85 MB/sec)
Max Read:  854.17 MiB/sec (895.67 MB/sec)
[root@ec01 ~]# lctl get_param llite.*.read_ahead_stats
llite.scratch-ffff96ef4843c800.read_ahead_stats=
snapshot_time             1582596230.643029314 secs.nsecs
hits                      460552 samples [pages]
misses                    3733752 samples [pages]
readpage not consecutive  16 samples [pages]
miss inside window        69 samples [pages]
read but discarded        9991 samples [pages]
zero size window          371 samples [pages]
failed to reach end       3733526 samples [pages]
async readahead           65 samples [pages]
failed to fast read       3733782 samples [pages]
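
To put a number on the readahead misses, the counters above can be reduced to a hit ratio. The following is a minimal sketch, not part of the original report: the helper names `parse_read_ahead_stats` and `hit_ratio` are hypothetical, and the parser assumes every counter line has the form `<name> <count> samples [pages]` as in the output above.

```python
import re

# Counters copied from the read_ahead_stats output in this ticket.
SAMPLE = """\
snapshot_time             1582596230.643029314 secs.nsecs
hits                      460552 samples [pages]
misses                    3733752 samples [pages]
readpage not consecutive  16 samples [pages]
miss inside window        69 samples [pages]
read but discarded        9991 samples [pages]
zero size window          371 samples [pages]
failed to reach end       3733526 samples [pages]
async readahead           65 samples [pages]
failed to fast read       3733782 samples [pages]
"""

# Assumed line format: "<counter name>  <count> samples [pages]".
LINE_RE = re.compile(r"^(.+?)\s+(\d+) samples \[pages\]$")


def parse_read_ahead_stats(text):
    """Return {counter name: count}; lines that do not match are skipped."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def hit_ratio(stats):
    """Fraction of pages counted as readahead hits out of hits + misses."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


stats = parse_read_ahead_stats(SAMPLE)
print(f"pages seen: {stats['hits'] + stats['misses']}")  # pages seen: 4194304
print(f"hit ratio:  {hit_ratio(stats):.1%}")             # hit ratio:  11.0%
```

Hits plus misses add up to 4194304 pages, which would be 16 GiB if the client page size is 4 KiB (an assumption), consistent with 16 tasks per node each reading a 1 GiB block. Only about 11% of those pages were served by readahead.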
[root@ec01 ~]# lctl get_param osc.*.rpc_stats
osc.scratch-OST0000-osc-ffff96ef4843c800.rpc_stats=
snapshot_time:         1582596234.368149548 (secs.nsecs)
read RPCs in flight:  0
write RPCs in flight: 0
pending write pages:  0
pending read pages:   0

                        read                 |        write
pages per rpc       rpcs   % cum %           |       rpcs   % cum %
1:                257335  74  74             |          0   0   0
2:                 71811  20  94             |          0   0   0
4:                 16834   4  99             |          0   0   0
8:                   653   0  99             |          0   0   0
16:                    5   0  99             |          0   0   0
32:                    0   0  99             |          0   0   0
64:                    0   0  99             |          0   0   0
128:                   0   0  99             |          0   0   0
256:                   0   0  99             |          0   0   0
512:                   0   0  99             |          0   0   0
1024:                  0   0  99             |          1   0   0
2048:                  1   0  99             |          1   0   1
4096:                 17   0 100             |        134  98 100
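
The read side of the histogram above can be summarized the same way. This is a minimal sketch under stated assumptions: the helper names `parse_rpc_histogram` and `small_read_share` are hypothetical, and the parser assumes each histogram row is `<pages>: <read rpcs> <%> <cum %> | <write rpcs> <%> <cum %>`.

```python
# Histogram rows copied from the osc rpc_stats output in this ticket.
SAMPLE = """\
1:      257335  74  74   |          0   0   0
2:       71811  20  94   |          0   0   0
4:       16834   4  99   |          0   0   0
8:         653   0  99   |          0   0   0
16:          5   0  99   |          0   0   0
32:          0   0  99   |          0   0   0
64:          0   0  99   |          0   0   0
128:         0   0  99   |          0   0   0
256:         0   0  99   |          0   0   0
512:         0   0  99   |          0   0   0
1024:        0   0  99   |          1   0   0
2048:        1   0  99   |          1   0   1
4096:       17   0 100   |        134  98 100
"""


def parse_rpc_histogram(text):
    """Return a list of (pages_per_rpc, read_rpcs, write_rpcs) tuples.

    Lines without a numeric bucket label or a '|' separator are skipped.
    """
    rows = []
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit() or "|" not in rest:
            continue
        read_part, write_part = rest.split("|", 1)
        rows.append(
            (int(label), int(read_part.split()[0]), int(write_part.split()[0]))
        )
    return rows


def small_read_share(rows, max_pages=4):
    """Fraction of read RPCs in buckets labelled max_pages or fewer."""
    total = sum(read for _, read, _ in rows)
    small = sum(read for pages, read, _ in rows if pages <= max_pages)
    return small / total if total else 0.0


rows = parse_rpc_histogram(SAMPLE)
print(f"read RPCs: {sum(read for _, read, _ in rows)}")          # read RPCs: 346656
print(f"read RPCs of <= 4 pages: {small_read_share(rows):.1%}")  # read RPCs of <= 4 pages: 99.8%
```

On this OSC, about 99.8% of read RPCs fall in the 1, 2, and 4 page buckets, while 134 of the 136 write RPCs are in the 4096 page bucket.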


 Comments   
Comment by Gerrit Updater [ 25/Feb/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37697
Subject: LU-13293 llite: don't abort readahead too aggressively
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3ba00bd642ddba6f856f123a2b6c1f80f12126a2

Comment by Gerrit Updater [ 01/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37697/
Subject: LU-13293 llite: don't abort readahead too aggressively
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 347ec7b87c766613d43aa340c3a1e6fa3cdcf044

Comment by Peter Jones [ 01/Mar/20 ]

Landed for 2.14
