While investigating some customer issues, I noticed that our read path currently looks roughly like this:
1. obtain the pages and lock them
2. prepare and then execute our bulk request
3. unlock the pages.
essentially holding the pages locked for the duration of the network IO. This is suboptimal, since parallel reads of the same pages cannot proceed and must wait for each other to complete. Unlocking earlier would also help in the case of client death or network problems, since a hung bulk RPC would have much less impact on parallel operations.
We already have DLM locks to protect us, so we should probably be fine to unlock the pages already at step 2, before the bulk transfer, and possibly even for writes as well (the eviction implications need to be investigated). We were already operating in this mode in 1.8 and prior, before the read-only cache on the server was implemented, when each RPC had a private pool of pages not connected to any inode mappings.
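To make the ordering concrete, here is a minimal user-space sketch of the two variants; a pthread mutex stands in for the page lock, and all names here are hypothetical illustrations, not the actual ofd/osd entry points:

#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the bulk RPC send. */
static void send_over_network(const char *data, size_t len)
{
    (void)data; (void)len;
}

struct page_model {
    pthread_mutex_t lock;   /* stands in for the page lock */
    char data[4096];
};

/* Current path: the page lock is held across the network IO, so a
 * parallel read of the same page blocks until the RPC completes. */
static void read_current(struct page_model *pg)
{
    pthread_mutex_lock(&pg->lock);              /* 1. lock the page */
    /* 2. prepare and execute the bulk request */
    send_over_network(pg->data, sizeof(pg->data));
    pthread_mutex_unlock(&pg->lock);            /* 3. unlock */
}

/* Proposed path: unlock once the bulk descriptor references the page;
 * the DLM lock (not modeled here) keeps the data stable meanwhile. */
static void read_proposed(struct page_model *pg)
{
    pthread_mutex_lock(&pg->lock);
    /* prepare the bulk request */
    pthread_mutex_unlock(&pg->lock);            /* parallel reads proceed */
    send_over_network(pg->data, sizeof(pg->data)); /* hung RPC holds no page lock */
}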
Issue Links
is duplicated by LU-9232 "Reduce osd-ldiskfs page lock hold time for read" (Resolved)
Activity
[LU-11221] Do not hold pages locked for network IO on the server.
Alexey Lyashkov added a comment:
Oleg, Alex - why isn't PG_writeback used for this case?
On the other hand, having the pages unlocked opens a race with truncate from the same host.
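For context, a minimal user-space model of what PG_writeback buys (the flag and helpers below are stand-ins; in the kernel the analogous calls are set_page_writeback(), end_page_writeback(), and wait_on_page_writeback() in the truncate path):

#include <pthread.h>
#include <stdbool.h>

/* IO in flight is flagged without holding the page lock, and
 * truncate waits for the flag to clear instead of racing. */
struct page_model {
    pthread_mutex_t lock;       /* stands in for PG_locked */
    pthread_cond_t  wb_done;
    bool            writeback;  /* stands in for PG_writeback */
};

static void start_io(struct page_model *pg)
{
    pthread_mutex_lock(&pg->lock);
    pg->writeback = true;            /* like set_page_writeback() */
    pthread_mutex_unlock(&pg->lock); /* lock dropped for the IO   */
}

static void end_io(struct page_model *pg)
{
    pthread_mutex_lock(&pg->lock);
    pg->writeback = false;           /* like end_page_writeback() */
    pthread_cond_broadcast(&pg->wb_done);
    pthread_mutex_unlock(&pg->lock);
}

/* Truncate from the same host: waits out in-flight IO, the way
 * wait_on_page_writeback() does, rather than racing with it. */
static void truncate_page(struct page_model *pg)
{
    pthread_mutex_lock(&pg->lock);
    while (pg->writeback)
        pthread_cond_wait(&pg->wb_done, &pg->lock);
    /* ... now safe to drop the page ... */
    pthread_mutex_unlock(&pg->lock);
}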
Alex Zhuravlev added a comment:
Thanks for the report. I need some time to analyze this. My first guess was that pagecache overhead (the need to allocate pages and to scan for old pages to release) introduces additional gaps between the I/Os, but that is not the case: each thread allocates all the pages it needs and only then submits its I/Os. So it must be something else.
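A sketch of the per-thread pattern just described (malloc() and submit_io() are illustrative stand-ins for page allocation and IO submission): since allocation is fully batched before submission, it cannot introduce gaps between the individual I/Os of one request.

#include <stdlib.h>

#define NPAGES 256

/* Hypothetical stand-in for IO submission. */
static void submit_io(void *page) { (void)page; }

static void handle_request(void)
{
    void *pages[NPAGES];
    int i;

    for (i = 0; i < NPAGES; i++)
        pages[i] = malloc(4096);    /* all allocation up front      */
    for (i = 0; i < NPAGES; i++)
        submit_io(pages[i]);        /* then I/Os back to back       */
    for (i = 0; i < NPAGES; i++)
        free(pages[i]);
}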
I'm not sure how patch https://review.whamcloud.com/33521 behaves when the page cache is used for file sizes below readcache_max_filesize but bypassed beyond readcache_max_filesize.
At least, I have been seeing very well merged I/Os with read_cache_enable=0 and writethrough_cache_enable=0, whereas read_cache_enable=1, writethrough_cache_enable=1, and readcache_max_filesize=1048576 seems to produce fewer merged I/Os and lower performance.
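If I understand the tunables correctly, the intended decision is roughly the following (an illustrative sketch only, not the actual osd-ldiskfs code):

#include <stdbool.h>

/* Reads of files up to readcache_max_filesize go through the page
 * cache; larger files bypass it. Names here are hypothetical. */
static bool read_through_pagecache(unsigned long long file_size,
                                   bool read_cache_enable,
                                   unsigned long long readcache_max_filesize)
{
    return read_cache_enable && file_size <= readcache_max_filesize;
}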
Please see the following test results.
Pagecache disabled mode:
[root@es18k-vm11 ~]# lctl set_param osd-ldiskfs.*.read_cache_enable=0 obdfilter.*.writethrough_cache_enable=0 obdfilter.*.brw_size=16
Run IOR from 32 clients, 256 processes against a single OST:
[root@c01 ~]# salloc -N 32 --ntasks-per-node=8 mpirun -np 256 --allow-run-as-root /work/tools/bin/ior -w -t 1m -b 1g -e -F -v -C -Q 24 -o /scratch0/ost0/file
Max Write: 8461.94 MiB/sec (8872.99 MB/sec)
I saw well merged I/Os (16 MB I/Os on the storage side) in this case.
[root@es18k-vm11 ~]# blktrace /dev/sda -a issue -a complete -o - | blkiomon -I 120 -h -
time: Sun Apr 7 14:19:45 2019
device: 8,0
sizes read (bytes): num 0, min -1, max 0, sum 0, squ 0, avg 0.0, var 0.0
sizes write (bytes): num 17619, min 4096, max 16777216, sum 274927271936, squ 4533348865750859776, avg 15604022.5, var 257270866642200.3
d2c read (usec): num 0, min -1, max 0, sum 0, squ 0, avg 0.0, var 0.0
d2c write (usec): num 17619, min 48, max 6425912, sum 8433387602, squ 5102891088905970, avg 478653.0, var 238785697942.2
throughput read (bytes/msec): num 0, min -1, max 0, sum 0, squ 0, avg 0.0, var 0.0
throughput write (bytes/msec): num 17619, min 0, max 4273907, sum 464644931, squ 427646612524675, avg 26371.8, var 23576427970.2
sizes histogram (bytes):
0: 0 1024: 0 2048: 0 4096: 153
8192: 11 16384: 8 32768: 12 65536: 18
131072: 27 262144: 31 524288: 79 1048576: 77
2097152: 21 4194304: 458 8388608: 742 > 8388608: 15982 <--- good 16M IOs here
d2c histogram (usec):
0: 0 8: 0 16: 0 32: 0
64: 43 128: 36 256: 57 512: 90
1024: 32 2048: 18 4096: 2 8192: 12
16384: 31 32768: 169 65536: 339 131072: 1528
262144: 1009 524288: 5433 1048576: 8751 2097152: 53
4194304: 13 8388608: 3 16777216: 0 33554432: 0
>33554432: 0
bidirectional requests: 0
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36570/
Subject: LU-11221 osd: allow concurrent bulks from pagecache
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: efcdfe9e075fdfa334d16bcb53399f2978c16d42