Details
- Improvement
- Resolution: Fixed
- Minor
Description
An upstream patch series adds improved mballoc handling for efficiently finding suitable allocation groups in a filesystem. The important part of the series is the patch "ext4: improve cr 0 / cr 1 group scanning":
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210209202857.4185846-5-harshadshirwadkar@gmail.com/
Issue Links
- is related to
  - LU-8365 Fix mballoc stream allocator to better use free space at start of drive (Open)
  - LU-15319 Weird mballoc behaviour (Resolved)
  - LU-17153 Random block allocation policy in ldiskfs (Open)
  - LU-16162 ldiskfs: use low disk tracks for block allocation on empty or moderately full filesystems. (Open)
  - LU-16750 optimize ldiskfs internal metadata allocation for hybrid storage LUNs (Open)
  - LU-12970 improve mballoc for huge filesystems (Open)
  - LU-16155 allow importing inode/block allocation maps to new ldisks filesystem (Open)
  - LU-17980 improve ldiskfs "-o discard" performance (Open)
  - LU-14305 add persistent tuning for mb_c3_threshold (Resolved)
There are cases where we may want to accept worse performance on an empty filesystem in exchange for better performance when the filesystem is 90% full. We could use the new mballoc array lists to spread allocations more evenly across the disk.
I had previously considered splitting the groups into two arrays (as we are doing with IOPS groups in LU-16750), with 80% at the start of the disk and 20% at the end (or 90%/10%), so that groups at the end of the filesystem are only used when the first groups are mostly full. However, this would mean that performance would drop suddenly once the filesystem hit 80% full.
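To make the drawback of the two-array split concrete, here is a minimal sketch in C. It is not ldiskfs or ext4 code: the names `FRONT_PCT`, `FRONT_FULL_PCT`, `front_group_count()`, `group_is_front()` and `use_tail_groups()` are hypothetical, and the 95% "mostly full" threshold is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables for illustration; not taken from ldiskfs/ext4. */
#define FRONT_PCT      80u  /* share of groups placed in the "front" array */
#define FRONT_FULL_PCT 95u  /* assumed meaning of "mostly full" */

/* Number of groups in the front array (the first FRONT_PCT% by offset). */
static uint32_t front_group_count(uint32_t ngroups)
{
    return (uint32_t)(((uint64_t)ngroups * FRONT_PCT) / 100u);
}

/* True if the group belongs to the front array, false for the tail array. */
static bool group_is_front(uint32_t group, uint32_t ngroups)
{
    assert(group < ngroups);
    return group < front_group_count(ngroups);
}

/*
 * Decide whether allocations should move on to the tail array.
 * The switch is a hard threshold, which is where the sudden
 * performance drop described above would come from.
 */
static bool use_tail_groups(uint64_t front_used_blocks,
                            uint64_t front_total_blocks)
{
    assert(front_total_blocks > 0);
    assert(front_used_blocks <= front_total_blocks);
    return front_used_blocks * 100u >= front_total_blocks * FRONT_FULL_PCT;
}
```

With 1000 groups, groups 0..799 would land in the front array and 800..999 in the tail array, and every allocation would move to the tail array at the same moment once the front array crossed the threshold.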
We could instead split the groups into, e.g., 16 separate arrays by offset, and have a clock that rotates allocations around the regions, e.g. every second, so that groups are not used start-to-end during allocation. We would still want some locality in allocations, so that we are not seeking wildly around the disk for files being written concurrently, but we would always be using the end of the disk some fraction of the time. This would hopefully even out performance over the filesystem lifetime for uses that demand consistent performance rather than "best possible".
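The rotating-region idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not the mballoc implementation: `NR_REGIONS`, `ROTATE_SECS` and all function names are hypothetical, regions are equal slices of the group range by offset, and the filesystem is assumed to have at least `NR_REGIONS` groups.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from ldiskfs/ext4. */
#define NR_REGIONS  16u  /* number of per-offset group arrays */
#define ROTATE_SECS 1u   /* how long the clock stays on one region */

/* Region that a group falls into, by its offset within the filesystem. */
static uint32_t region_of_group(uint32_t group, uint32_t ngroups)
{
    assert(ngroups >= NR_REGIONS && group < ngroups);
    return (uint32_t)(((uint64_t)group * NR_REGIONS) / ngroups);
}

/* First group of a region (rounded up so it maps back to that region). */
static uint32_t region_first_group(uint32_t region, uint32_t ngroups)
{
    assert(ngroups >= NR_REGIONS && region < NR_REGIONS);
    return (uint32_t)(((uint64_t)region * ngroups + NR_REGIONS - 1u) /
                      NR_REGIONS);
}

/* Region the clock currently points at, for a timestamp in seconds. */
static uint32_t current_region(uint64_t now_secs)
{
    return (uint32_t)((now_secs / ROTATE_SECS) % NR_REGIONS);
}

/*
 * Group at which a new allocation would start scanning. All allocations
 * made within the same ROTATE_SECS window share a region, which keeps
 * some locality for files that are being written concurrently.
 */
static uint32_t rotating_goal_group(uint64_t now_secs, uint32_t ngroups)
{
    return region_first_group(current_region(now_secs), ngroups);
}
```

For example, with 1600 groups each region spans 100 groups, so a timestamp of 17 seconds selects region 1 and a goal group of 100. Over any 16-second span every region, including the one at the end of the disk, is selected once.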
We could even add a hint via "lfs ladvise" and/or "ionice" for a file or process to force all of its file allocations to the slow part of the disk, e.g. when archiving old files. I don't think it makes sense to allow "improving" allocations, because everyone would want that and it would be no different than today.