improve mballoc for huge filesystems

    Description

      There are a number of reports demonstrating poor mballoc behaviour on huge filesystems. In one report it was a 688TB filesystem with 5.3M groups.
      mballoc tries to allocate large chunks of space, and for small allocations it tries to preallocate and share large chunks. While this is good in terms of fragmentation and streaming IO, the allocation itself may need to scan many groups to find a good candidate.
      mballoc maintains internal in-memory structures (the buddy cache) to speed up searching, but that cache is built from the regular on-disk bitmaps, which means IO. If the cache is cold, populating it may take a lot of time.
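
To give a sense of scale, the following back-of-the-envelope sketch estimates how much bitmap data a completely cold cache may have to read. It assumes the common geometry of 4 KiB blocks with 32768 blocks (128 MiB) per group, which the report does not state, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (not stated in the report): 4 KiB blocks, and one
 * block bitmap block per group covering 8 * 4096 = 32768 blocks. */
#define BLOCK_SIZE        4096ULL
#define BLOCKS_PER_GROUP  (8ULL * BLOCK_SIZE)

/* Number of block groups needed to cover fs_bytes (rounded up). */
static uint64_t group_count(uint64_t fs_bytes)
{
    uint64_t group_bytes = BLOCK_SIZE * BLOCKS_PER_GROUP; /* 128 MiB */

    assert(fs_bytes > 0);
    return (fs_bytes + group_bytes - 1) / group_bytes;
}

/* Bytes of block bitmaps to read if every group's bitmap is cold. */
static uint64_t cold_bitmap_bytes(uint64_t groups)
{
    return groups * BLOCK_SIZE;
}
```

Under these assumptions, 5.3M groups correspond to roughly 20 GiB of block bitmaps, read as millions of separate 4 KiB blocks, which is why a cold scan can be slow.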

      There are a few ideas for how to improve this:

      • skip more groups using less information when possible
      • stop scanning if too many groups have been scanned (loaded) and use best found
      • prefetch bitmaps (use the lazy init thread? prefetch while scanning?)
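
The first two ideas can be sketched as a scan loop with a cheap skip test and a load budget. This is a minimal user-space illustration, not the ldiskfs code: `struct group_info`, `scan_groups()` and the `max_loads` budget are hypothetical names, and loading a group is simulated by setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group summary; not the real ldiskfs structures. */
struct group_info {
    uint32_t free_blocks;   /* cheap counter, available without loading */
    uint32_t largest_free;  /* largest free extent; known only once loaded */
    int      loaded;        /* nonzero if the buddy data is in memory */
};

#define NO_GROUP ((size_t)-1)

/*
 * Scan groups starting at 'goal' for an extent of 'want' blocks.
 * - Groups whose free counter is below 'want' are skipped without loading.
 * - At most 'max_loads' cold groups are loaded (simulated bitmap reads,
 *   counted in *loads); after that the best candidate so far is returned.
 * Returns the chosen group index, or NO_GROUP if nothing usable was seen.
 */
static size_t scan_groups(struct group_info *g, size_t ngroups, size_t goal,
                          uint32_t want, unsigned max_loads, unsigned *loads)
{
    size_t best = NO_GROUP;

    assert(ngroups > 0 && goal < ngroups);
    *loads = 0;
    for (size_t i = 0; i < ngroups; i++) {
        size_t idx = (goal + i) % ngroups;

        if (g[idx].free_blocks < want)
            continue;               /* cheap skip, no IO needed */
        if (!g[idx].loaded) {
            if (*loads >= max_loads)
                break;              /* scan budget exhausted */
            g[idx].loaded = 1;      /* stands in for a bitmap read */
            (*loads)++;
        }
        if (g[idx].largest_free >= want)
            return idx;             /* good enough, stop here */
        if (best == NO_GROUP ||
            g[idx].largest_free > g[best].largest_free)
            best = idx;             /* remember the best partial fit */
    }
    return best;
}
```

The trade-off is that a bounded scan may return a group that only partially satisfies the request, accepting some fragmentation in exchange for a predictable number of bitmap reads.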

      Another option for prefetching would be to skip non-initialized groups, but start an async read for the corresponding bitmap.
      Also, when mballoc marks blocks as used (an allocation has just been made), it could make sense to check/prefetch the subsequent group(s), which are likely the goal for the next allocation. While the caller is writing IO to the just-allocated blocks, the next group(s) will be prefetched and ready to use.
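
Both prefetching variants can be illustrated with a small state machine per group. This is only a sketch under stated assumptions: the `bitmap_state` values and all function names are hypothetical, and the async read and its completion are simulated by state changes rather than real IO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bitmap states for one group; not the real ldiskfs flags. */
enum bitmap_state {
    BITMAP_COLD,      /* not in memory, no read issued */
    BITMAP_READING,   /* async read submitted, not yet complete */
    BITMAP_READY      /* in memory, group can be examined */
};

#define NO_GROUP ((size_t)-1)

/* Stand-in for submitting an async bitmap read; returns 1 if one was started. */
static int prefetch_group(enum bitmap_state *st, size_t idx)
{
    if (st[idx] != BITMAP_COLD)
        return 0;
    st[idx] = BITMAP_READING;
    return 1;
}

/* Stand-in for IO completion: every in-flight read becomes ready. */
static void complete_reads(enum bitmap_state *st, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (st[i] == BITMAP_READING)
            st[i] = BITMAP_READY;
}

/*
 * Pick the first ready group at or after 'goal' (wrapping). Groups that are
 * not ready are skipped rather than waited for, but a read is started so
 * that a later scan may find them ready.
 */
static size_t pick_ready_group(enum bitmap_state *st, size_t n, size_t goal)
{
    assert(n > 0 && goal < n);
    for (size_t i = 0; i < n; i++) {
        size_t idx = (goal + i) % n;

        if (st[idx] == BITMAP_READY)
            return idx;
        prefetch_group(st, idx);
    }
    return NO_GROUP;
}

/* After allocating in 'idx', prefetch the next 'ahead' groups. */
static unsigned prefetch_after_alloc(enum bitmap_state *st, size_t n,
                                     size_t idx, unsigned ahead)
{
    unsigned started = 0;

    for (unsigned k = 1; k <= ahead && k < n; k++)
        started += prefetch_group(st, (idx + k) % n);
    return started;
}
```

In this sketch `pick_ready_group()` returns `NO_GROUP` when no group is ready yet; a real allocator would then have to fall back to waiting for one of the reads it has just started.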

          Activity

            [LU-12970] improve mballoc for huge filesystems
            adilger Andreas Dilger made changes -
            Labels Original: ldiskfs New: ldiskfs performance scalability
            lixi_wc Li Xi made changes -
            Link New: This issue is related to EX-7628 [ EX-7628 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16691 [ LU-16691 ]
            adilger Andreas Dilger made changes -
            Labels New: ldiskfs
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-15319 [ LU-15319 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16155 [ LU-16155 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-3110 [ DDN-3110 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-8365 [ LU-8365 ]
            adilger Andreas Dilger added a comment - - edited

            Link to backport of upstream mballoc patches in LU-14438, which may be enough to resolve this issue.

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-14438 [ LU-14438 ]

            People

              wc-triage WC Triage
              bzzz Alex Zhuravlev