Details
-
Bug
-
Resolution: Fixed
-
Minor
-
master
Description
The stripe width and stride are calculated from the optimal_io_size that the kernel gets from the backend storage devices. If the storage controller reports a large optimal I/O size (e.g. 16MB) to the kernel, stripe_width and stride are set to large values (optimal_io_size / block size) at mkfs time.
    [root@es18k-vm1 ~]# dumpe2fs -h /dev/sda | grep RAID
    dumpe2fs 1.44.3.wc1 (23-July-2018)
    RAID stride: 4096
    RAID stripe width: 4096
    [root@es18k-vm1 ~]# dumpe2fs -h /dev/sdi | grep RAID
    dumpe2fs 1.44.3.wc1 (23-July-2018)
    RAID stride: 512
    RAID stripe width: 512
    [root@es18k-vm1 ~]# cat /sys/block/sda/queue/optimal_io_size
    16777216
    [root@es18k-vm1 ~]# cat /sys/block/sdi/queue/optimal_io_size
    2097152
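The figures in the output above are consistent with the stride being the optimal I/O size divided by the filesystem block size. A minimal sketch of that arithmetic follows, assuming a 4096-byte block size (the report does not state it, but it matches both devices); the helper name is hypothetical.

```python
def stride_in_blocks(optimal_io_size: int, block_size: int = 4096) -> int:
    """Return optimal_io_size expressed in filesystem blocks.

    block_size=4096 is an assumption; the report does not state it.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return optimal_io_size // block_size


# Values reported by /sys/block/<dev>/queue/optimal_io_size in the report.
for dev, opt_io in (("sda", 16777216), ("sdi", 2097152)):
    print(dev, stride_in_blocks(opt_io))
```

Under that assumption the helper gives 4096 blocks for sda and 512 blocks for sdi, which matches the dumpe2fs values.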
However, such large stripe_width and stride values cause a performance regression, because ext4 spends more effort finding free chunks of stripe_width size.
Configuration | 2MB Chunk size, Write (MB/s) | 2MB Chunk size, Read (MB/s) | 256K Chunk size, Write (MB/s) | 256K Chunk size, Read (MB/s) |
---|---|---|---|---|
stripe_width=512,stride=512 | 10,810 | 10,124 | 10,492 | 6,923 |
stripe_width=1024,stride=1024 | 10,793 | 10,064 | 10,431 | 6,921 |
stripe_width=2048,stride=2048 | 8,047 | 10,080 | 6,629 | 7,381 |
stripe_width=4096,stride=4096 | 7,350 | 10,089 | 6,505 | 7,282 |
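To make the size of the write regression easier to see, the sketch below expresses each row's write throughput as a percentage drop relative to the stripe_width=512 row. The numbers are copied from the table; the dictionary layout and function name are illustrative only.

```python
# Write throughput (MB/s) from the table: stripe_width -> (2MB chunk, 256K chunk)
WRITE_MBPS = {
    512: (10810, 10492),
    1024: (10793, 10431),
    2048: (8047, 6629),
    4096: (7350, 6505),
}


def write_drop_percent(stripe_width, baseline=512):
    """Percentage write-throughput loss versus the baseline row, per chunk size."""
    base = WRITE_MBPS[baseline]
    cur = WRITE_MBPS[stripe_width]
    return tuple(round(100.0 * (b - c) / b, 1) for b, c in zip(base, cur))


for width in sorted(WRITE_MBPS):
    print(width, write_drop_percent(width))
```

Read throughput is left out because it stays roughly flat, or improves slightly, across the rows of the table.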
Also, stripe_width and stride are tunable through an mkfs option (e.g. -E stripe_width=4096,stride=4096). If an administrator formats OSTs with the wrong stripe_width and stride, it may cause unexpected performance regressions.
mkfs.lustre should avoid such large stripe_width/stride values and print a warning message if it is about to configure them.
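The requested check could look roughly like the sketch below. It is not the mkfs.lustre implementation and does not reflect the actual fix; it only illustrates the logic. The 1024-block limit is an assumption taken from the table, where 1024 showed no write regression and 2048 did, and the function name is hypothetical.

```python
# Hypothetical limit in filesystem blocks, chosen from the table in this
# report; the real fix may use a different threshold or approach.
MAX_BLOCKS = 1024


def raid_geometry_warnings(stride, stripe_width, max_blocks=MAX_BLOCKS):
    """Return one warning string for each value that exceeds max_blocks."""
    warnings = []
    for name, value in (("stride", stride), ("stripe_width", stripe_width)):
        if value > max_blocks:
            warnings.append(
                f"warning: {name}={value} blocks exceeds {max_blocks}; "
                "large values may slow ext4 block allocation"
            )
    return warnings


# Example: the values seen on /dev/sda in this report.
for line in raid_geometry_warnings(4096, 4096):
    print(line)
```

Returning the warnings as a list, rather than printing inside the check, keeps the decision about whether to warn, clamp, or abort with the caller.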
Looks like this is fixed and just needs a new e2fsprogs release so that it is available.