Description
Our test:
- Test pool contains 106 HDDs in two draid2:11d:1s:53c vdevs, with two NVMe special devices.
- Run obdfilter-survey on the OST with rszlo=128k and rszhi=4M (a sketch of the invocation follows these steps).
- During the run, monitor ZFS I/Os with `zpool iostat -r 10`.
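For reference, a minimal sketch of this kind of run, assuming the standard obdfilter-survey environment variables; the OST target name and script path are placeholders, while the rsz bounds and the object/thread counts are taken from the description and results in this ticket.

```
# Sketch only: target name and script path are placeholders.
# obdfilter-survey is driven by environment variables; rszlo/rszhi bound
# the record sizes swept (128K through 4M here).
targets="testfs-OST0000" case=disk \
  nobjhi=128 thrhi=1024 rszlo=128k rszhi=4M \
  sh ./obdfilter-survey

# In a second shell on the OSS, watch the per-vdev request-size histograms
# to see what actually reaches the disks:
zpool iostat -r 10
```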
We observed that the overwhelming majority of I/Os to disk were 4K in size, for all record sizes obdfilter-survey tested. Write rates were much worse than we had previously observed on the same system.
For rsz 128K - 4M, we saw (updated results):
ost 1 sz 268435456K rsz  128K obj 128 thr 1024 write  1245.37 [  246.13,  6871.38] read  3627.31 [ 2415.46, 22124.38]
ost 1 sz 268435456K rsz  256K obj 128 thr 1024 write   419.23 [  104.22,  9325.36] read  3000.64 [  491.82, 11144.38]
ost 1 sz 268435456K rsz  512K obj 128 thr 1024 write   166.58 [    0.00,  4366.21] read  1829.00 [  237.22,  5730.30]
ost 1 sz 268435456K rsz 1024K obj 128 thr 1024 write   299.09 [    0.00,  8173.13] read  3357.30 [ 1417.14,  7969.38]
ost 1 sz 268435456K rsz 2048K obj 128 thr 1024 write   161.40 [    0.00, 12826.61] read  2221.06 [  375.63,  6200.53]
ost 1 sz 268435456K rsz 4096K obj 128 thr 1024 write   123.78 [    0.00,  4330.61] read  1784.79 [  347.93,  3969.97]
versus earlier performance on the same system:
ost 1 sz 268435456K rsz  128K obj 128 thr 1024 write  2528.98 [ 1197.09, 10672.67] read 11173.41 [ 3197.39, 17061.16]
ost 1 sz 268435456K rsz  256K obj 128 thr 1024 write  3714.67 [ 2467.70,  3841.77] read 10762.76 [ 3691.20, 17293.81]
ost 1 sz 268435456K rsz  512K obj 128 thr 1024 write  5686.06 [ 3543.69,  6544.91] read 10420.38 [ 2314.15, 17217.78]
ost 1 sz 268435456K rsz 1024K obj 128 thr 1024 write  8539.14 [ 5295.25, 13158.89] read 12566.79 [ 5110.33, 17110.34]
ost 1 sz 268435456K rsz 2048K obj 128 thr 1024 write 13092.97 [ 7478.98, 13707.40] read 15171.22 [10413.99, 17550.10]
ost 1 sz 268435456K rsz 4096K obj 128 thr 1024 write 16943.89 [ 9380.09, 20107.31] read 14525.17 [ 5994.41, 16746.13]
We also observed that performance measured via IOR was much worse. The files created via IOR had a 4K data block size, per zdb, like this:
The above statement was incorrect: the example file below, with a 4K block size, wasn't created via IOR; it was created by the standard RHEL 8 utility "cp".
Dataset kern4/ost1 [ZPL], ID 644, cr_txg 28, 13.1T, 6380645 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
     99201    3   128K     4K   247M     512  23.1M  100.00  ZFS plain file
We have also observed the issue with files created by the mpifileutils utility "dsync" and by "dd".
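For anyone wanting to repeat the block-size check, this is roughly how it can be done on the OSS; the dataset and object number below are simply the ones from the zdb excerpt above, and the recordsize comparison reflects our assumption about what a correctly grown file should show.

```
# Dump the dnode summary (including dblk) for one OST object.
# kern4/ost1 and object 99201 are the values from the excerpt above;
# substitute your own dataset and object number.
zdb -dd kern4/ost1 99201

# Compare against the dataset recordsize; a large file is expected to have
# grown its dblk up to this value rather than staying at 4K.
zfs get recordsize kern4/ost1
```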
We tested with older ZFS and Lustre versions, and identified this patch as the culprit:
https://review.whamcloud.com/c/fs/lustre-release/+/47768 "LU-15963 osd-zfs: use contiguous chunk to grow blocksize"
Testing on the same system with 2.15.7_2.llnl, which had patch 47768 reverted, restored performance and showed the expected data block sizes.
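For completeness, a rough sketch of one way to confirm the behaviour with dd (one of the utilities mentioned above); the client mount point is a placeholder and the pool name is taken from the zdb excerpt earlier in this ticket.

```
# On a Lustre client (mount point is a placeholder):
dd if=/dev/zero of=/mnt/lustre/blocksize_test bs=1M count=4096

# On the OSS, while the file is being written, watch the request sizes:
zpool iostat -r kern4 10
# Per the observations above: with patch 47768 applied the writes land
# almost entirely as 4K requests, while with the patch reverted
# (2.15.7_2.llnl) the expected larger block sizes are restored.
```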
Issue Links
- is related to LU-15963 sanityn test_56b: OSS OOM with ZFS (Reopened)