Lustre / LU-13312

Optimized RA for stride read under memory pressure

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • Labels: None
    • Branch: master

    Description

      LU-12518 introduced a new readahead (RA) implementation that supports page-unaligned strided I/O and significantly improves performance (e.g. IO500 IOR_hard_read). However, it can still be optimized: the current code sometimes performs poorly under memory pressure, yet performance recovers if the page cache is dropped before reading. Here is a reproducer with results.

      4 x clients (1 x Xeon Gold 5218, 96GB RAM each)
      segment=400000 (~300GB per node)

      # mpirun -np 64 ior -w -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
      
      # mpirun -np 64 ior -r -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
       
      Max Read:  5087.32 MiB/sec (5334.44 MB/sec)
      
      RA stats from one of the clients:
      # lctl get_param llite.*.read_ahead_stats
      llite.fast-ffff99878133d000.read_ahead_stats=
      snapshot_time             1582946538.113259755 secs.nsecs
      hits                      72125088 samples [pages]
      misses                    1686810 samples [pages]
      readpage not consecutive  6400000 samples [pages]
      miss inside window        3011 samples [pages]
      failed grab_cache_page    2945424 samples [pages]
      read but discarded        35565 samples [pages]
      zero size window          100245 samples [pages]
      failed to reach end       73663094 samples [pages]
      failed to fast read       6396933 samples [pages]
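
      To make this snapshot and the one taken after dropping caches directly comparable, the counters can be reset between runs. This assumes read_ahead_stats follows the usual clear-on-write convention of Lustre stats files:

      # clush -a "lctl set_param llite.*.read_ahead_stats=clear"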
      

      After dropping the page cache on all clients before the read:

      # clush -a "echo 3 > /proc/sys/vm/drop_caches "
      # mpirun -np 64 ior -r -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
      
      Max Read:  16244.62 MiB/sec (17033.72 MB/sec)
      
      The same client's RA stats:
      # lctl get_param llite.*.read_ahead_stats
      llite.fast-ffff99878133d000.read_ahead_stats=
      snapshot_time             1582947544.040550353 secs.nsecs
      hits                      73799940 samples [pages]
      misses                    63 samples [pages]
      readpage not consecutive  6400000 samples [pages]
      failed grab_cache_page    2654231 samples [pages]
      read but discarded        1 samples [pages]
      zero size window          500 samples [pages]
      failed to reach end       402367 samples [pages]
      failed to fast read       35075 samples [pages]
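
      A quick sanity check on the two snapshots: the hit ratio hits/(hits+misses) goes from roughly 97.7% before dropping caches to effectively 100% after. Using the numbers above:

      # awk 'BEGIN { printf "before: %.2f%%  after: %.5f%%\n", 100*72125088/(72125088+1686810), 100*73799940/(73799940+63) }'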
      

       

      Attachments

        Activity

          [LU-13312] Optimized RA for stride read under memory pressure

          wshilong Wang Shilong (Inactive) added a comment -

          spitzcor, I'll abandon that patch.
          spitzcor Cory Spitz added a comment -

          wshilong, you closed this, but https://review.whamcloud.com/#/c/37761/ is still pending for this LU. Do you intend to abandon or re-target that patch? Or, shall we re-open this ticket?


          wshilong Wang Shilong (Inactive) added a comment -

          This is not actually a memory problem.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37762/
          Subject: LU-13312 ldlm: fix to stop iterating tree early in ldlm_kms_shift_cb()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: b28b3bd9094ee7be8e3c11a531383246a71d5dec

          sihara Shuichi Ihara added a comment -

          I've also confirmed that canceling all locks before the read always helps a lot, regardless of whether there is memory pressure.

          # mpirun -np 64 ior -w -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300

          # clush -w ec[01-04] lctl set_param ldlm.namespaces.*.lru_size=clear > /dev/null

          # mpirun -np 64 ior -r -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300

          Max Read:  22606.54 MiB/sec (23704.67 MB/sec)

          Without canceling locks before the read:
          Max Read:  4241.10 MiB/sec (4447.12 MB/sec)
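
          For anyone reproducing this, a quick way to see how many DLM locks each client is actually caching before and after the clear; lock_count is a standard per-namespace parameter, though the exact output format varies by release:

          # clush -w ec[01-04] lctl get_param ldlm.namespaces.*.lock_count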
          sihara Shuichi Ihara added a comment -

          > -61 is ENODATA, which is returned by osc_io_read_ahead(); it means readahead could not grab locks ahead. This might be related to your "lru_max_age=100", sihara?

          Nope, I didn't change lru_max_age when I got this log.

          wshilong Wang Shilong (Inactive) added a comment -

          Regarding the lock cancel problem: I think we discussed it somewhere but never got a chance to file a known issue for it, so let's track it in this ticket.

          gerrit Gerrit Updater added a comment -

          Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37762
          Subject: LU-13312 ldlm: fix to stop iterating tree early in ldlm_kms_shift_cb()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c3b211134e5021f52db80d00613de8699f805f7c
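
          To try the change locally, patch set 1 should be fetchable via Gerrit's usual refs/changes layout; the exact ref below is an assumption based on that convention (refs/changes/<last-two-digits>/<change>/<patchset>):

          # git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/62/37762/1 && git checkout FETCH_HEAD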

          wshilong Wang Shilong (Inactive) added a comment -

          I guess you set lru_max_age=100 because, after writing, lock cancellation can take quite a while when too many PW locks are cached in memory.
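
          For reference, the tunable mentioned here can be read and set per LDLM namespace with lctl, as sketched below; note that the unit of lru_max_age differs between Lustre releases, so check your version's documentation:

          # show the current LDLM LRU maximum lock age on a client
          # lctl get_param ldlm.namespaces.*.lru_max_age
          # shorten it, as in the setting discussed above
          # lctl set_param ldlm.namespaces.*.lru_max_age=100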
          wshilong Wang Shilong (Inactive) added a comment - edited

          After checking the debug logs, there are many error messages like:

          00020000:00400000:5.0:1582946316.433652:0:18406:0:(lov_io.c:1049:lov_io_read_ahead()) [0x200000404:0x8:0x0] cra_end = 0, stripes = 240, rc = -61

          -61 is ENODATA, which is returned by osc_io_read_ahead(); it means readahead could not grab locks ahead. This might be related to your "lru_max_age=100", sihara?

          So that explains why running ldlm.namespaces.*.lru_size=clear before the read test starts helps: it guarantees there are no PW locks from other clients, so PR locks can be grabbed very aggressively, which makes our readahead work very well.
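
          The trace above carries the readahead debug mask, so on a standard client something like the following should capture these messages; this is a sketch, and the mask name may vary by release:

          # enable readahead debug messages and clear the debug buffer
          # lctl set_param debug=+reada
          # lctl clear
          # (rerun the read test, then dump the kernel debug log for inspection)
          # lctl dk /tmp/lustre-debug.log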

          People

            Assignee: wshilong Wang Shilong (Inactive)
            Reporter: sihara Shuichi Ihara
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: