Lustre / LU-17153

Random block allocation policy in ldiskfs

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Fix Version/s: Lustre 2.16.0
    • None
    • 3

    Description

      There have been a number of mballoc optimizations, but in general the allocator still allocates blocks sequentially, from lower-numbered block groups to higher-numbered ones, while the filesystem is mostly empty.
      It would be useful to have an alternative allocator policy in ldiskfs that allows, e.g., pseudo-random allocation across block groups.
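To make the contrast concrete, here is a minimal user-space C sketch of two group-scan orders. The sequential order mirrors the behaviour described above. The pseudo-random order uses a stride that is coprime with the number of groups; this is only one possible technique, not something taken from mballoc, chosen here because it still visits every group exactly once, so a nearly full filesystem can be searched completely. All function names are hypothetical.

```c
#include <stdint.h>

/* Greatest common divisor (Euclid). */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: derive a stride in [1, ngroups - 1] from a seed,
 * stepping forward (with wrap-around) until it is coprime with ngroups.
 * Terminates because 1 is always coprime. */
static uint32_t pick_stride(uint32_t ngroups, uint32_t seed)
{
    if (ngroups <= 1)
        return 1;
    uint32_t s = 1 + seed % (ngroups - 1);
    while (gcd_u32(s, ngroups) != 1)
        s = s % (ngroups - 1) + 1;
    return s;
}

/* Sequential policy: the i-th group tried after the goal group. */
static uint32_t seq_group(uint32_t goal, uint32_t i, uint32_t ngroups)
{
    return (uint32_t)(((uint64_t)goal + i) % ngroups);
}

/* Pseudo-random policy: because gcd(stride, ngroups) == 1, the map
 * i -> (start + i * stride) % ngroups is a permutation of all groups
 * for i in [0, ngroups). */
static uint32_t rand_group(uint32_t start, uint32_t stride, uint32_t i,
                           uint32_t ngroups)
{
    return (uint32_t)(((uint64_t)start + (uint64_t)i * stride) % ngroups);
}
```

A fixed stride yields evenly spaced rather than truly random groups, so a real policy would probably reseed the start and stride per allocation or per inode.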

      Use cases include:

      1. Consistent block allocation performance
        Even if peak performance is lower under such a policy, it would keep performance consistent whether the filesystem is empty or full.
      2. Benchmarking, testing, and debugging
        To optimize mballoc or run benchmarks, we need fragmented filesystem conditions to see how the improvements work.
        Today, we use fallocate (allocate/punch-hole) to create such filesystem conditions. A random block allocator policy would make this debugging and benchmarking easier.
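The fallocate-based approach mentioned above can be sketched as follows: preallocate a file, then punch out every other chunk so that the freed space is scattered in small pieces. `fragment_file()` and its parameters are hypothetical; fallocate(2) with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` is the standard Linux interface, and whether punching is supported depends on the underlying filesystem (ext4-based filesystems such as ldiskfs support it).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: allocate nchunks * chunk bytes in fd, then punch
 * a hole in every odd-numbered chunk. The file size is kept unchanged
 * (FALLOC_FL_KEEP_SIZE is required together with FALLOC_FL_PUNCH_HOLE).
 * Returns 0 on success or a negative errno value. */
int fragment_file(int fd, off_t chunk, int nchunks)
{
    if (fallocate(fd, 0, 0, chunk * nchunks) != 0)
        return -errno;

    for (int i = 1; i < nchunks; i += 2) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      (off_t)i * chunk, chunk) != 0)
            return -errno;
    }
    return 0;
}
```

Repeating this across many files until the target filesystem is nearly full leaves its free space split into chunk-sized holes spread over many block groups.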

          Activity

            Andreas Dilger (adilger) added a comment:

            The ticket LU-10946 is intended to allow a fragmentation map from an existing filesystem to be loaded into a test filesystem, in order to better simulate the slow performance seen on a customer filesystem.

            For pseudo-random block allocation, it should be relatively straightforward to add an ldiskfs mballoc allocation policy that simply tries random group numbers when allocating space. The groups could still be checked for "good" status so that we don't allocate too-small chunks and increase fragmentation, but this would distribute write allocations uniformly across the whole device and should average out the bandwidth over the lifetime of the filesystem (i.e. some fraction of "slow" group allocations would happen when the filesystem is empty, and some fraction of "fast" group allocations would happen when the filesystem is nearly full).
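The policy suggested in the comment can be sketched in user-space C. Several details here are assumptions, not taken from ldiskfs: `struct group_summary`, `group_is_good()`, and `pick_random_group()` are hypothetical (real mballoc tracks per-group state in its own structures); "good" is interpreted as "the largest free extent can hold the whole request"; and xorshift32 stands in for the kernel's own random-number helpers. A fallback sequential scan from a random start keeps allocation working when random probes miss on a nearly full filesystem.

```c
#include <stdint.h>

/* Hypothetical per-group summary. */
struct group_summary {
    uint32_t free_blocks;         /* total free blocks in the group */
    uint32_t largest_free_extent; /* longest contiguous free run */
};

/* Marsaglia xorshift32 PRNG; *state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Assumed "good" check: the request fits in one free extent, so it is
 * not split into small pieces that would increase fragmentation. */
static int group_is_good(const struct group_summary *g, uint32_t len)
{
    return g->largest_free_extent >= len;
}

/* Probe up to max_tries uniformly random groups (modulo bias ignored),
 * then fall back to a full sequential scan from a random start.
 * Returns a group index, or -1 if no group can hold len blocks. */
static long pick_random_group(const struct group_summary *groups,
                              uint32_t ngroups, uint32_t len,
                              uint32_t max_tries, uint32_t *rng)
{
    if (ngroups == 0)
        return -1;

    for (uint32_t t = 0; t < max_tries; t++) {
        uint32_t g = xorshift32(rng) % ngroups;
        if (group_is_good(&groups[g], len))
            return (long)g;
    }

    uint32_t start = xorshift32(rng) % ngroups;
    for (uint32_t i = 0; i < ngroups; i++) {
        uint32_t g = (uint32_t)(((uint64_t)start + i) % ngroups);
        if (group_is_good(&groups[g], len))
            return (long)g;
    }
    return -1;
}
```

Because each probe is uniform over all groups, the share of allocations landing in the "fast" or "slow" region is roughly proportional to that region's share of groups, regardless of how full the filesystem is, which is the averaging effect the comment describes.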

            People

              wc-triage WC Triage
              sihara Shuichi Ihara
              Votes: 0
              Watchers: 5
