Lustre / LU-19193

osd-zfs: data block size of ZFS objects inappropriately set to 4k


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.7
    • Environment: zfs-osd OSTs with HDD pools
      lustre-2.15.7_1.llnl-1.t4.x86_64
      zfs-2.2.8_1llnl-1.t4.x86_64
    • Severity: 3

    Description

      Our test:
      The test pool contains 106 HDDs in two draid2:11d:1s:53c vdevs with two NVMe special devices.
      Run obdfilter-survey on the OST with rszlo=128k and rszhi=4M (a sketch of the invocation follows below).
      During the run, monitor ZFS I/O sizes with `zpool iostat -r 10`.
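
      For reference, the invocation looked roughly like this. This is a sketch, not the exact command: the targets value is a placeholder, and size/nobjhi/thrhi are inferred from the sz/obj/thr columns in the results below.

        # Survey one OST with record sizes from 128K to 4M
        # (targets is a placeholder; size is in MB, rszlo/rszhi in KB)
        targets="fsname-OST0000" size=262144 \
        rszlo=128 rszhi=4096 nobjhi=128 thrhi=1024 \
        case=disk obdfilter-survey

        # In a second shell, watch the per-request-size histograms
        zpool iostat -r 10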

      We observed that the overwhelming majority of I/Os reaching disk were 4K, for every record size obdfilter-survey tested. Write rates were much worse than we had previously observed on the same system.

      For rsz 128K through 4M, we saw (updated results):

      ost  1 sz 268435456K rsz  128K obj  128 thr 1024 write 1245.37 [ 246.13, 6871.38] read 3627.31 [2415.46, 22124.38] 
      ost  1 sz 268435456K rsz  256K obj  128 thr 1024 write  419.23 [ 104.22, 9325.36] read 3000.64 [ 491.82, 11144.38] 
      ost  1 sz 268435456K rsz  512K obj  128 thr 1024 write  166.58 [   0.00, 4366.21] read 1829.00 [ 237.22, 5730.30] 
      ost  1 sz 268435456K rsz 1024K obj  128 thr 1024 write  299.09 [   0.00, 8173.13] read 3357.30 [1417.14, 7969.38] 
      ost  1 sz 268435456K rsz 2048K obj  128 thr 1024 write  161.40 [   0.00, 12826.61] read 2221.06 [ 375.63, 6200.53] 
      ost  1 sz 268435456K rsz 4096K obj  128 thr 1024 write  123.78 [   0.00, 4330.61] read 1784.79 [ 347.93, 3969.97]  

      versus earlier performance:

      ost  1 sz 268435456K rsz  128K obj  128 thr 1024 write 2528.98 [1197.09, 10672.67] read 11173.41 [3197.39, 17061.16] 
      ost  1 sz 268435456K rsz  256K obj  128 thr 1024 write 3714.67 [2467.70, 3841.77] read 10762.76 [3691.20, 17293.81] 
      ost  1 sz 268435456K rsz  512K obj  128 thr 1024 write 5686.06 [3543.69, 6544.91] read 10420.38 [2314.15, 17217.78] 
      ost  1 sz 268435456K rsz 1024K obj  128 thr 1024 write 8539.14 [5295.25, 13158.89] read 12566.79 [5110.33, 17110.34] 
      ost  1 sz 268435456K rsz 2048K obj  128 thr 1024 write 13092.97 [7478.98, 13707.40] read 15171.22 [10413.99, 17550.10] 
      ost  1 sz 268435456K rsz 4096K obj  128 thr 1024 write 16943.89 [9380.09, 20107.31] read 14525.17 [5994.41, 16746.13] 
      

      We also observed that performance measured via IOR was much worse. The files created via IOR had a 4K data block size, per ZDB, like this:

      Correction: the statement above was not quite right - the example file below with the 4K data block size wasn't created via IOR; it was created by the standard RHEL 8 utility "cp".

      Dataset kern4/ost1 [ZPL], ID 644, cr_txg 28, 13.1T, 6380645 objects
      
          Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           99201    3   128K     4K   247M     512  23.1M  100.00  ZFS plain file 
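
      For reference, per-object block sizes can be checked with zdb; the dataset and object number here are taken from the listing above.

        # Dump dnode details (lvl/iblk/dblk) for a single object
        zdb -dddd kern4/ost1 99201

        # Or list every object in the dataset and scan the dblk column
        # (slow on a dataset with millions of objects)
        zdb -dd kern4/ost1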

      We have also observed the issue with files created by the mpifileutils utility "dsync" and by "dd".

      We tested with older ZFS and Lustre versions, and identified this patch as the culprit:
      https://review.whamcloud.com/c/fs/lustre-release/+/47768 "LU-15963 osd-zfs: use contiguous chunk to grow blocksize"
      Our understanding of the mechanism: ZFS can only change an object's data block size while the object still holds at most one block, so if osd-zfs fails to grow the block size before the first 4K block is committed, the object is stuck at a 4K dblk for its lifetime.

      Testing on the same system with 2.15.7_2.llnl, which had patch 47768 reverted, restored performance and showed the expected data block sizes.
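
      A quick way to re-check this on a given OST (a sketch; the mount point, file name, pool, and dataset names are placeholders):

        # Write a large file through Lustre
        dd if=/dev/zero of=/mnt/lustre/blocktest bs=4M count=256
        sync

        # On the OSS, confirm writes reach disk in large chunks, not 4K
        zpool iostat -r kern4 10

        # And confirm new objects' dblk grew past 4K
        zdb -dd kern4/ost1 | grep 'ZFS plain file'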


People

    Assignee: bzzz Alex Zhuravlev
    Reporter: ofaaland Olaf Faaland
