Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.17.0
    • Labels: None

    Description

      There is an upstream patch series that adds improved mballoc handling for efficiently finding suitable allocation groups in a filesystem. In particular, the patch
      https://patchwork.ozlabs.org/project/linux-ext4/patch/20210209202857.4185846-5-harshadshirwadkar@gmail.com/ "ext4: improve cr 0 / cr 1 group scanning" is the important part of the series.
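      The gist of that patch, as a hedged sketch (the structure and function names below are illustrative, not the actual ext4 ones): instead of scanning all block groups linearly during a cr 0 pass, groups are kept on lists indexed by the order of their largest free extent, so the allocator can jump directly to a group that can satisfy a request of a given order:

```c
/*
 * Rough sketch (not the actual kernel code) of the idea behind the
 * "improve cr 0 / cr 1 group scanning" patch: keep each block group on
 * a list indexed by the order of its largest free extent, so a request
 * for a given order finds a suitable group without scanning all groups
 * linearly.  All names here are illustrative.
 */
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 14                    /* orders 0..13, hypothetical */

struct group {
	int largest_free_order;         /* log2 of largest free extent */
	struct group *next;             /* link in the per-order list */
};

static struct group *order_lists[MAX_ORDER];

/* Re-list a group whenever its largest free extent changes. */
static void group_add(struct group *g)
{
	g->next = order_lists[g->largest_free_order];
	order_lists[g->largest_free_order] = g;
}

/* cr0-style lookup: first non-empty list at or above the request order. */
static struct group *find_group(int order)
{
	for (int o = order; o < MAX_ORDER; o++)
		if (order_lists[o])
			return order_lists[o];
	return NULL;
}
```

      The win is that the cost of finding a candidate group becomes proportional to the number of orders rather than the number of groups, which matters on large, mostly-full filesystems.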


          Activity

            [LU-14438] backport ldiskfs mballoc patches
            pjones Peter Jones added a comment -

            Merged for 2.17


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51472/
            Subject: LU-14438 ldiskfs: backport ldiskfs mballoc patches
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1534c43ccb034048d8ab0a22cb55635116eebe09
            bobijam Zhenyu Xu added a comment -

            https://review.whamcloud.com/c/fs/lustre-release/+/51472 has separate patches for the ldiskfs series.


            adilger Andreas Dilger added a comment -

            There are cases where we may be willing to make empty-filesystem performance worse in exchange for better performance at 90% full. We could use the new mballoc array lists to spread allocations across the disk more evenly.

            I had previously considered splitting the groups into two arrays (as we are doing with IOPS groups in LU-16750), 80% at the start of the disk and 20% at the end (or 90%/10%), so that groups at the end of the filesystem are only used once the first groups are mostly full. However, this would mean that performance would drop suddenly once the filesystem hit 80% full.

            We could instead split the groups into, e.g., 16 separate arrays by offset, and then have a clock that rotates allocations around the regions, e.g. every second, so that groups are not used start-to-end during allocation. We would still want some locality in allocations, so that we are not seeking wildly around the disk for files being written concurrently, but we would always be using the end of the disk some fraction of the time. This would hopefully even out performance over the filesystem's lifetime for workloads that demand consistent performance rather than "best possible".

            We could even hint via "lfs ladvise" and/or "ionice" that all allocations for a file or process should be forced to the slow part of the disk, for cases such as archiving old files. I don't think it makes sense to allow "improving" allocations, because everyone would want that and it would be no different from today.
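            The rotating-clock idea above could be sketched minimally as follows, under stated assumptions (16 regions, a once-per-second rotation driven by a timestamp, a per-region count of groups with free space; all names are hypothetical, not actual Lustre/ldiskfs code):

```c
/*
 * Hypothetical sketch of the "clock" allocator described above: block
 * groups are split into NR_REGIONS arrays by disk offset, and the
 * preferred starting region rotates with time so the filesystem is not
 * filled strictly start-to-end.
 */
#include <assert.h>

#define NR_REGIONS 16

/*
 * Pick the region to start allocating from: rotate the preferred
 * region by elapsed seconds, then fall back to subsequent regions when
 * the preferred one has no groups with free space.  All allocations
 * within one tick land in one region, preserving some locality.
 */
static int pick_region(unsigned long now_sec,
		       const int free_groups[NR_REGIONS])
{
	int start = now_sec % NR_REGIONS;

	for (int i = 0; i < NR_REGIONS; i++) {
		int r = (start + i) % NR_REGIONS;

		if (free_groups[r] > 0)
			return r;
	}
	return -1;                      /* no free groups anywhere */
}
```

            Because every region (including the slow end of the disk) becomes the preferred region 1/16th of the time, throughput is averaged over the disk from day one instead of degrading as the fast regions fill.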
            gerrit Gerrit Updater added a comment - edited

            I've tried to port some of the upstream mballoc patches here, but the result looks too big for a single patch.

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51472
            Subject: LU-14438 ldiskfs: backport ldiskfs mballoc patches
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2439579001a928714a640ec469a2d833ea5e8337


            adilger Andreas Dilger added a comment -

            I've filed LU-16155 to enhance debugfs to allow "importing" the block and inode allocation maps into a newly formatted filesystem, to simplify testing of this problem. We could collect the debugfs information from real filesystems that are having allocation performance issues, as needed, in order to test changes to mballoc.

            People

              Assignee: ablagodarenko Artem Blagodarenko
              Reporter: adilger Andreas Dilger
              Votes: 2
              Watchers: 20
