
LU-16169: parallel e2fsck pass1 balanced group distribution


Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor

    Description

      When running e2fsck with multiple threads (e.g. "-m 32"), an equal number of groups is currently assigned to each thread (groups_count / num_threads). However, since the number of inodes in each group is uneven, some threads do far more work during pass1 and take much longer to complete:

      Pass 1: Checking inodes, blocks, and sizes
      [Thread 0] Scan group range [0, 1328)
      [Thread 1] Scan group range [1328, 2656)
      [Thread 2] Scan group range [2656, 3984)
      :
      :
      [Thread 30] Scan group range [39840, 41168)
      [Thread 31] Scan group range [41168, 42615)
      [Thread 20] Pass 1: Memory used: 17224k/237268k (16059k/1165k), time: 107.31/120.13/345.32
      [Thread 20] Pass 1: I/O read: 2265MB, write: 0MB, rate: 21.11MB/s
      [Thread 20] Scanned group range [26560, 27888), inodes 2318941
      [Thread 12] Pass 1: Memory used: 17224k/237268k (15959k/1266k), time: 107.69/120.49/346.50
      [Thread 12] Pass 1: I/O read: 2248MB, write: 0MB, rate: 20.88MB/s
      [Thread 12] Scanned group range [15936, 17264), inodes 2300847
      :
      :
      [Thread 0] Pass 1: Memory used: 22404k/249936k (18332k/4073k), time: 955.69/318.00/1483.58
      [Thread 0] Pass 1: I/O read: 22356MB, write: 0MB, rate: 23.39MB/s
      [Thread 0] Scanned group range [0, 1328), inodes 22856885
      [Thread 22] Pass 1: Memory used: 23388k/249936k (19317k/4072k), time: 1189.31/359.09/1751.43
      [Thread 22] Pass 1: I/O read: 29900MB, write: 0MB, rate: 25.14MB/s
      [Thread 22] Scanned group range [29216, 30544), inodes 30342690
      [Thread 27] Pass 1: Memory used: 23388k/258768k (19226k/4163k), time: 1567.00/417.52/2140.94
      [Thread 27] Pass 1: I/O read: 36898MB, write: 0MB, rate: 23.55MB/s
      [Thread 27] Scanned group range [35856, 37184), inodes 37782784
      :
      :
      [Thread 26] Pass 1: Memory used: 41720k/53936k (16911k/24810k), time: 1788.72/445.44/2332.17
      [Thread 26] Pass 1: I/O read: 42476MB, write: 0MB, rate: 23.75MB/s
      [Thread 26] Scanned group range [34528, 35856), inodes 43494656
      [Thread 31] Pass 1: Memory used: 42360k/15692k (15264k/27097k), time: 1907.30/446.44/2342.45
      [Thread 31] Pass 1: I/O read: 45931MB, write: 0MB, rate: 24.08MB/s
      [Thread 31] Scanned group range [41168, 42615), inodes 47032901
      

      In the above example, each thread is assigned 1328 groups (the last thread takes the remaining 1447), yet some threads process only ~2.5M inodes and complete in ~100s, while others are assigned over 40M inodes and take ~1800s. This works out to roughly 24k inodes/sec per thread, regardless of how many inodes each one processes. If the 545M inodes had been evenly distributed across the threads, pass1 could have finished in about 705s instead of 1907s.
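
      For reference, the estimate follows from simple arithmetic. Below is a standalone sketch using the rounded numbers above (the ~24k inodes/sec rate is approximate, so the computed totals differ slightly from the measured times):

      /* back-of-the-envelope check of the timing numbers above */
      #include <stdio.h>

      int main(void)
      {
              double total_inodes = 545e6;  /* ~545M inodes in the filesystem */
              double num_threads = 32;
              double rate = 24e3;           /* ~24k inodes/sec per thread */
              double worst = 47e6;          /* thread 31 scanned ~47M inodes */

              /* pass1 only finishes when the busiest thread finishes */
              printf("unbalanced pass1: ~%.0fs\n", worst / rate);
              printf("balanced pass1:   ~%.0fs\n",
                     total_inodes / num_threads / rate);
              return 0;
      }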

      Groups must currently be allocated consecutively to each thread to keep the in-memory state easy to manage, so a producer-consumer model where threads pick up one group at a time on an as-available basis would not be straightforward.

      To distribute inodes more evenly across the pass1 threads, one option is to compute the average number of inodes per thread (about 545M / 32 = 17M in this case), then walk the groups consecutively, accumulating the used-inode count from each group descriptor, and close out a thread's range once the accumulated count comes within average_inodes_per_group / 2 below that average, or exceeds it. Since the group descriptors may be corrupted, there should also be some maximum number of groups per thread, such as 5 * total_groups / num_threads, with a fallback to the current "equal" group subdivision if the balanced assignment doesn't work out. A sketch follows below.
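
      A minimal sketch of this heuristic, assuming the group descriptors have already been read (the function and variable names are illustrative, not from an actual patch; used[g] stands in for s_inodes_per_group minus the free inode count of group g):

      #include <stdint.h>

      struct group_range { uint32_t start, end; };    /* half-open [start, end) */

      static void assign_group_ranges(const uint32_t *used,
                                      uint32_t groups_count,
                                      uint32_t num_threads,
                                      uint64_t inodes_count,
                                      struct group_range *ranges)
      {
              uint64_t avg_per_thread = inodes_count / num_threads;
              uint64_t avg_per_group = inodes_count / groups_count;
              /* cap groups per thread in case the descriptors are corrupt */
              uint32_t max_groups = 5 * (groups_count / num_threads);
              uint32_t group = 0;

              for (uint32_t t = 0; t < num_threads; t++) {
                      uint64_t assigned = 0;

                      ranges[t].start = group;
                      while (group < groups_count &&
                             group - ranges[t].start < max_groups &&
                             /* leave at least one group per remaining thread */
                             groups_count - group > num_threads - t - 1) {
                              assigned += used[group++];
                              /* stop within avg_per_group / 2 below the
                               * average, or once the average is exceeded */
                              if (t < num_threads - 1 &&
                                  assigned + avg_per_group / 2 >= avg_per_thread)
                                      break;
                      }
                      ranges[t].end = group;
              }
              /* the last thread picks up any remaining groups */
              ranges[num_threads - 1].end = groups_count;
      }

      Because each range is still consecutive, per-thread in-memory state can be managed exactly as with the current equal split, and if a thread hits the max_groups cap the caller can detect the skew and revert to equal subdivision.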

      This will more evenly distribute the inodes, and hence runtime, to each thread and should reduce overall pass1 execution time.


            People

              Assignee: Andreas Dilger
              Reporter: Andreas Dilger
              Votes: 0
              Watchers: 3
