[LU-13312] Optimized RA for stride read under memory pressure Created: 29/Feb/20 Updated: 17/Feb/21 Resolved: 17/Feb/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Shuichi Ihara | Assignee: | Wang Shilong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | master |
| Attachments: | |
| Issue Links: | |
| Description |
|
4 x clients (1 x Gold 5218, 96GB RAM each)

# mpirun -np 64 ior -w -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
# mpirun -np 64 ior -r -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
Max Read: 5087.32 MiB/sec (5334.44 MB/sec)

One client's readahead stats:
# lctl get_param llite.*.read_ahead_stats
llite.fast-ffff99878133d000.read_ahead_stats=
snapshot_time             1582946538.113259755 secs.nsecs
hits                      72125088 samples [pages]
misses                    1686810 samples [pages]
readpage not consecutive  6400000 samples [pages]
miss inside window        3011 samples [pages]
failed grab_cache_page    2945424 samples [pages]
read but discarded        35565 samples [pages]
zero size window          100245 samples [pages]
failed to reach end       73663094 samples [pages]
failed to fast read       6396933 samples [pages]

After dropping the page cache on the clients before the read:
# clush -a "echo 3 > /proc/sys/vm/drop_caches"
# mpirun -np 64 ior -r -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
Max Read: 16244.62 MiB/sec (17033.72 MB/sec)

Client's readahead stats:
# lctl get_param llite.*.read_ahead_stats
llite.fast-ffff99878133d000.read_ahead_stats=
snapshot_time             1582947544.040550353 secs.nsecs
hits                      73799940 samples [pages]
misses                    63 samples [pages]
readpage not consecutive  6400000 samples [pages]
failed grab_cache_page    2654231 samples [pages]
read but discarded        1 samples [pages]
zero size window          500 samples [pages]
failed to reach end       402367 samples [pages]
failed to fast read       35075 samples [pages]
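For reference (not part of the original report), the hit ratio implied by these counters is hits / (hits + misses): about 72125088 / 73811898 ≈ 97.7% in the slow run versus effectively 100% (only 63 misses) after dropping the page cache, while "failed to reach end" drops from 73663094 to 402367. A quick sketch for pulling that ratio out of the stats, assuming the stock "hits ... / misses ..." line format of llite.*.read_ahead_stats (it aggregates across mounts if more than one matches):

# lctl get_param -n llite.*.read_ahead_stats | awk '$1=="hits"{h+=$2} $1=="misses"{m+=$2} END{ if (h+m) printf "RA hit ratio: %.2f%%\n", 100*h/(h+m) }'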
|
| Comments |
| Comment by Shuichi Ihara [ 29/Feb/20 ] |
|
Attached is a debug=reada log from a bad-performance case. |
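For anyone reproducing this, a readahead debug log like the attached one can be captured roughly as follows (a sketch of the standard procedure, not necessarily the exact commands used here; /tmp/reada.log is just an illustrative path):

# lctl set_param debug=+reada
# lctl clear
  ... run the slow read workload ...
# lctl dk > /tmp/reada.log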
| Comment by Gerrit Updater [ 29/Feb/20 ] |
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37761 |
| Comment by Wang Shilong (Inactive) [ 29/Feb/20 ] |
|
To be clear for the ticket, there might be several problems here:

1) This behavior is not actually a regression from
2) There might be two main reasons that make RA stop currently:

We should isolate the problems and at least focus on problem 2.1 in this ticket. |
| Comment by Wang Shilong (Inactive) [ 29/Feb/20 ] |
|
After checking the debug logs, there are many error messages like:

00020000:00400000:5.0:1582946316.433652:0:18406:0:(lov_io.c:1049:lov_io_read_ahead()) [0x200000404:0x8:0x0] cra_end = 0, stripes = 240, rc = -61

-61 is ENODATA, which is returned by osc_io_read_ahead(); it means readahead could not grab DLM locks ahead of the read, and this might be related to your lru_max_age=100 setting.

That also explains why setting ldlm.namespaces.*.lru_size=clear before the read test starts helps: it guarantees there are no PW locks left over from other clients, so PR locks can be grabbed very aggressively, which makes readahead work very well. |
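A quick way to check whether cached locks are the culprit before a read run (a sketch; ec[01-04] are the client hostnames used elsewhere in this ticket): the first command shows how many DLM locks each client still caches after the write phase, and the second drops the unused ones so the read can take PR locks without waiting on PW lock cancellation.

# clush -w ec[01-04] lctl get_param ldlm.namespaces.*.lock_count
# clush -w ec[01-04] lctl set_param ldlm.namespaces.*.lru_size=clear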
| Comment by Wang Shilong (Inactive) [ 29/Feb/20 ] |
|
I guess the reason you set lru_max_age=100 is that, after writing, lock cancellation can take some time when there are too many PW locks cached in memory. |
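For context, lru_max_age bounds how long an unused DLM lock may sit in the client's LRU before it is cancelled, so lowering it is one way to shed leftover PW locks after a write phase. A sketch of checking and adjusting it (the 100 below just mirrors the value discussed above; check the current value and the unit your Lustre version expects before tuning):

# lctl get_param ldlm.namespaces.*.lru_max_age
# lctl set_param ldlm.namespaces.*.lru_max_age=100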
| Comment by Gerrit Updater [ 29/Feb/20 ] |
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37762 |
| Comment by Wang Shilong (Inactive) [ 29/Feb/20 ] |
|
Regarding the lock cancel problem, I think we discussed it somewhere but never got around to pushing a fix for that known issue; let's push it in this ticket. |
| Comment by Shuichi Ihara [ 01/Mar/20 ] |
|
> -61 is ENODATA which returned by osc_io_read_ahead(), it means readahead could not grab locks ahead, this might be related to your "lru_max_age=100" Shuichi Ihara?

Nope, I didn't change lru_max_age when I got this log. |
| Comment by Shuichi Ihara [ 01/Mar/20 ] |
|
I've also confirmed that canceling all locks before the read always helps a lot, regardless of whether the client is under memory pressure or not.

# mpirun -np 64 ior -w -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
# clush -w ec[01-04] lctl set_param ldlm.namespaces.*.lru_size=clear > /dev/null
# mpirun -np 64 ior -w -s 400000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /fast/dir/file -O stoneWallingStatusFile=/fast/dir/stonewall -O stoneWallingWearOut=1 -D 300
Max Read: 22606.54 MiB/sec (23704.67 MB/sec)

Without canceling locks before the read:
Max Read: 4241.10 MiB/sec (4447.12 MB/sec) |
| Comment by Gerrit Updater [ 24/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37762/ |
| Comment by Wang Shilong (Inactive) [ 24/Mar/20 ] |
|
This is not actually a memory problem. |
| Comment by Cory Spitz [ 24/Mar/20 ] |
|
wshilong, you closed this, but https://review.whamcloud.com/#/c/37761/ is still pending for this LU. Do you intend to abandon or re-target that patch? Or, shall we re-open this ticket? |
| Comment by Wang Shilong (Inactive) [ 25/Mar/20 ] |
|
spitzcor, I'll abandon that patch. |