I see what you're getting at, but then I would expect the percentage of small RPCs to drop as the file size goes up, whereas I in fact see the opposite. Something goes badly wrong for me at around 4 GiB, which is 50% of the memory on this VM node (unlikely to be a coincidence).
Examples - 1000 MiB:
[root@cent7c01 cent7s02]# ls -la ost0; cat ost0 > /dev/null; cat /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats
-rw-r--r-- 1 root root 1048576000 Dec  7 16:52 ost0
snapshot_time: 1512683598.957290341 (secs.nsecs)
read RPCs in flight: 0
write RPCs in flight: 0
pending write pages: 0
pending read pages: 0
                        read                     write
pages per rpc    rpcs    %  cum %  |    rpcs    %  cum %
1:                 22    8      8  |       0    0      0
2:                  0    0      8  |       0    0      0
4:                  0    0      8  |       0    0      0
8:                  0    0      8  |       0    0      0
16:                 0    0      8  |       0    0      0
32:                 0    0      8  |       0    0      0
64:                 0    0      8  |       0    0      0
128:                0    0      8  |       0    0      0
256:                0    0      8  |       0    0      0
512:                0    0      8  |       0    0      0
1024:             250   91    100  |       0    0      0
2000 MiB:
[root@cent7c01 cent7s02]# dd if=/dev/zero bs=1M count=2000 of=./ost0; echo 3 > /proc/sys/vm/drop_caches ; echo clear > /sys/fs/lustre/ldlm/namespaces/cent7s02-OST0000-osc-ffff88022437c800/lru_size; echo c > /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats; cat ost0 > /dev/null; cat /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 7.99026 s, 262 MB/s
snapshot_time: 1512683716.923342172 (secs.nsecs)
read RPCs in flight: 0
write RPCs in flight: 0
pending write pages: 0
pending read pages: 0
                        read                     write
pages per rpc    rpcs    %  cum %  |    rpcs    %  cum %
1:                 37    6      6  |       0    0      0
2:                  0    0      6  |       0    0      0
4:                  0    0      6  |       0    0      0
8:                  0    0      6  |       0    0      0
16:                 0    0      6  |       0    0      0
32:                 0    0      6  |       0    0      0
64:                 0    0      6  |       0    0      0
128:                0    0      6  |       0    0      0
256:                0    0      6  |       0    0      0
512:                0    0      6  |       0    0      0
1024:             500   93    100  |       0    0      0
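The runs at 2000, 3000 and 4000 MiB all use exactly this sequence, so for repeatability it can be wrapped in a small loop; a rough sketch (the OSC device name cent7s02-OST0000-osc-ffff88022437c800 is specific to this client mount, so adjust the paths for another node):

#!/bin/bash
# Rough repro loop for the runs in this comment; OSC/namespace paths are node-specific.
OSC=/proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800
NS=/sys/fs/lustre/ldlm/namespaces/cent7s02-OST0000-osc-ffff88022437c800
for mb in 1000 2000 3000 4000; do
    dd if=/dev/zero bs=1M count=$mb of=./ost0
    echo 3 > /proc/sys/vm/drop_caches     # drop the clean page cache
    echo clear > $NS/lru_size             # drop cached DLM locks
    echo c > $OSC/rpc_stats               # reset the RPC histogram
    cat ost0 > /dev/null                  # sequential read back
    echo "=== $mb MiB ==="
    cat $OSC/rpc_stats
done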
3000 MiB:
[root@cent7c01 cent7s02]# dd if=/dev/zero bs=1M count=3000 of=./ost0; echo 3 > /proc/sys/vm/drop_caches ; echo clear > /sys/fs/lustre/ldlm/namespaces/cent7s02-OST0000-osc-ffff88022437c800/lru_size; echo c > /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats; cat ost0 > /dev/null; cat /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 10.6623 s, 295 MB/s
snapshot_time: 1512683893.046695412 (secs.nsecs)
read RPCs in flight: 0
write RPCs in flight: 0
pending write pages: 0
pending read pages: 0
                        read                     write
pages per rpc    rpcs    %  cum %  |    rpcs    %  cum %
1:                 53    6      6  |       0    0      0
2:                  0    0      6  |       0    0      0
4:                  0    0      6  |       0    0      0
8:                  0    0      6  |       0    0      0
16:                 0    0      6  |       0    0      0
32:                 0    0      6  |       0    0      0
64:                 0    0      6  |       0    0      0
128:                0    0      6  |       0    0      0
256:                0    0      6  |       0    0      0
512:                0    0      6  |       0    0      0
1024:             750   93    100  |       0    0      0
4000 MiB:
[root@cent7c01 cent7s02]# dd if=/dev/zero bs=1M count=4000 of=./ost0; echo 3 > /proc/sys/vm/drop_caches ; echo clear > /sys/fs/lustre/ldlm/namespaces/cent7s02-OST0000-osc-ffff88022437c800/lru_size; echo c > /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats; cat ost0 > /dev/null; cat /proc/fs/lustre/osc/cent7s02-OST0000-osc-ffff88022437c800/rpc_stats
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 13.1352 s, 319 MB/s
snapshot_time: 1512683761.337612432 (secs.nsecs)
read RPCs in flight: 0
write RPCs in flight: 0
pending write pages: 0
pending read pages: 0
                        read                     write
pages per rpc    rpcs    %  cum %  |    rpcs    %  cum %
1:               8074   89     89  |       0    0      0
2:                  0    0     89  |       0    0      0
4:                  0    0     89  |       0    0      0
8:                  0    0     89  |       0    0      0
16:                 0    0     89  |       0    0      0
32:                 0    0     89  |       0    0      0
64:                 0    0     89  |       0    0      0
128:                0    0     89  |       0    0      0
256:                1    0     89  |       0    0      0
512:                0    0     89  |       0    0      0
1024:             992   10    100  |       0    0      0
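To put a number on how far off this is: assuming 4 KiB pages, 1024 pages per RPC is 4 MiB, so a 4000 MiB sequential read should need roughly 1000 full-size read RPCs:

# Expected full-size (1024-page) read RPCs for a 4000 MiB file, assuming 4 KiB pages:
echo $((4000 * 1024 * 1024 / (1024 * 4096)))    # -> 1000

The histogram instead shows 992 full-size RPCs plus 8074 single-page RPCs, i.e. almost 90% of the read RPCs issued are 4 KiB once the file no longer fits in cache.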
—
Larger sizes remain problematic. So once I hit the cached-MB limit on the node, something goes totally off the rails, I think. Perhaps we're getting that behavior from the kernel, but it's still a major degradation.
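If the client cache limit really is the trigger, it should be easy to confirm by comparing the cache tunable against node memory and seeing whether the breakpoint moves with it; a minimal check, assuming the limit in play is the standard llite max_cached_mb parameter:

# Compare the client-side data cache limit with total node memory.
lctl get_param llite.*.max_cached_mb
grep MemTotal /proc/meminfo
# Lower or raise the cap and rerun the 4000 MiB test to see if the breakpoint shifts:
# lctl set_param llite.*.max_cached_mb=2048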
LU-12403 will do this work correctly.