<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:24:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16169] parallel e2fsck pass1 balanced group distribution</title>
                <link>https://jira.whamcloud.com/browse/LU-16169</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When running e2fsck with multiple threads (e.g. &quot;&lt;tt&gt;-m 32&lt;/tt&gt;&quot;) there are currently an equal number of groups assigned to each thread (&lt;tt&gt;groups_count / num_threads&lt;/tt&gt;).  However, since the number of inodes in each group is uneven, this results in some threads doing far more work during pass1, which takes them much longer to complete:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Pass 1: Checking inodes, blocks, and sizes
[Thread 0] Scan group range [0, 1328)
[Thread 1] Scan group range [1328, 2656)
[Thread 2] Scan group range [2656, 3984)
:
:
[Thread 30] Scan group range [39840, 41168)
[Thread 31] Scan group range [41168, 42615)
[Thread 20] Pass 1: Memory used: 17224k/237268k (16059k/1165k), time: 107.31/120.13/345.32
[Thread 20] Pass 1: I/O read: 2265MB, write: 0MB, rate: 21.11MB/s
[Thread 20] Scanned group range [26560, 27888), inodes 2318941
[Thread 12] Pass 1: Memory used: 17224k/237268k (15959k/1266k), time: 107.69/120.49/346.50
[Thread 12] Pass 1: I/O read: 2248MB, write: 0MB, rate: 20.88MB/s
[Thread 12] Scanned group range [15936, 17264), inodes 2300847
:
:
[Thread 0] Pass 1: Memory used: 22404k/249936k (18332k/4073k), time: 955.69/318.00/1483.58
[Thread 0] Pass 1: I/O read: 22356MB, write: 0MB, rate: 23.39MB/s
[Thread 0] Scanned group range [0, 1328), inodes 22856885
[Thread 22] Pass 1: Memory used: 23388k/249936k (19317k/4072k), time: 1189.31/359.09/1751.43
[Thread 22] Pass 1: I/O read: 29900MB, write: 0MB, rate: 25.14MB/s
[Thread 22] Scanned group range [29216, 30544), inodes 30342690
[Thread 27] Pass 1: Memory used: 23388k/258768k (19226k/4163k), time: 1567.00/417.52/2140.94
[Thread 27] Pass 1: I/O read: 36898MB, write: 0MB, rate: 23.55MB/s
[Thread 27] Scanned group range [35856, 37184), inodes 37782784
:
:
[Thread 26] Pass 1: Memory used: 41720k/53936k (16911k/24810k), time: 1788.72/445.44/2332.17
[Thread 26] Pass 1: I/O read: 42476MB, write: 0MB, rate: 23.75MB/s
[Thread 26] Scanned group range [34528, 35856), inodes 43494656
[Thread 31] Pass 1: Memory used: 42360k/15692k (15264k/27097k), time: 1907.30/446.44/2342.45
[Thread 31] Pass 1: I/O read: 45931MB, write: 0MB, rate: 24.08MB/s
[Thread 31] Scanned group range [41168, 42615), inodes 47032901
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the above example, while each thread is assigned an essentially equal number of groups (1328, with the remainder going to the last thread), some threads process only ~2.5M inodes and complete in ~100s, while others are assigned over 40M inodes and take ~1800s to complete.  This works out to roughly 24k inodes/sec per thread, regardless of how many inodes each one processes.  If the 545M inodes had been evenly distributed across the threads in this case, pass1 could have finished in about 705s instead of 1907s.&lt;/p&gt;
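As a back-of-envelope check of the figures above (a hypothetical illustration using only the numbers quoted in this description):

```python
# Rough check of the pass1 timing figures quoted above.
total_inodes = 545_000_000      # ~545M inodes on the filesystem
num_threads = 32
rate_per_thread = 24_000        # ~24k inodes/sec observed per thread

inodes_per_thread = total_inodes / num_threads        # ~17M per thread
balanced_time = inodes_per_thread / rate_per_thread   # ideal balanced runtime

print(round(inodes_per_thread / 1e6, 1))  # 17.0 (million inodes/thread)
print(round(balanced_time))               # 710, close to the ~705s quoted
```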

&lt;p&gt;Groups must currently be allocated to each thread in consecutive ranges to simplify the management of in-memory state, so a producer-consumer model in which threads claim one group at a time on an as-available basis would not be easy to implement.&lt;/p&gt;

&lt;p&gt;To distribute inodes more evenly across the pass1 threads, one option would be to calculate the average number of inodes per thread (about &lt;tt&gt;545M/32=17M&lt;/tt&gt; in this case), then walk the groups consecutively, accumulating the used inode counts from the group descriptors, and close out a thread&apos;s range once the accumulated count comes within &lt;tt&gt;average_inodes_per_group / 2&lt;/tt&gt; below the average, or once the average is exceeded.  To avoid problems if the group descriptors are corrupted, each thread could be capped at some maximum number of groups, such as &lt;tt&gt;5x total_groups / num_threads&lt;/tt&gt;, possibly reverting to the current &quot;equal&quot; group subdivision if the balancing doesn&apos;t work out.&lt;/p&gt;
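The accumulation heuristic described above could be sketched roughly as follows (a hypothetical Python illustration, not the actual e2fsprogs patch; inodes_per_group stands in for the used inode counts read from the group descriptors):

```python
def balance_groups(inodes_per_group, num_threads):
    """Assign consecutive block-group ranges so each thread scans
    roughly the same number of used inodes (sketch of the heuristic
    described above, not the actual e2fsprogs code)."""
    ngroups = len(inodes_per_group)
    total = sum(inodes_per_group)
    avg_thread = total / num_threads
    avg_group = total / ngroups
    # Cap groups per thread in case group descriptors are corrupted.
    max_groups = 5 * ngroups // num_threads
    ranges, start = [], 0
    for _ in range(num_threads - 1):
        acc, end = 0, start
        while end != ngroups:
            if end - start >= max_groups:
                break
            # Stop once we are within half a group's average below
            # the per-thread target (or have exceeded it).
            if acc >= avg_thread - avg_group / 2:
                break
            acc += inodes_per_group[end]
            end += 1
        ranges.append((start, end))
        start = end
    ranges.append((start, ngroups))   # last thread takes the rest
    return ranges
```

For an imbalanced layout this hands the dense early groups to fewer threads; e.g. balance_groups([100, 1, 1, 1, 1, 96], 2) yields [(0, 1), (1, 6)], giving each thread roughly 100 inodes despite the uneven group sizes.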

&lt;p&gt;This would distribute the inodes, and hence the runtime, more evenly across the threads, and should reduce the overall pass1 execution time.&lt;/p&gt;
                <environment></environment>
        <key id="72425">LU-16169</key>
            <summary>parallel e2fsck pass1 balanced group distribution</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>e2fsck</label>
                    </labels>
                <created>Mon, 19 Sep 2022 18:15:18 +0000</created>
                <updated>Thu, 7 Dec 2023 19:18:28 +0000</updated>
                            <resolved>Thu, 7 Dec 2023 19:12:10 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="349055" author="gerrit" created="Sat, 8 Oct 2022 06:17:03 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/tools/e2fsprogs/+/48806&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/tools/e2fsprogs/+/48806&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16169&quot; title=&quot;parallel e2fsck pass1 balanced group distribution&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16169&quot;&gt;&lt;del&gt;LU-16169&lt;/del&gt;&lt;/a&gt; e2fsck: improve parallel thread balance&lt;br/&gt;
Project: tools/e2fsprogs&lt;br/&gt;
Branch: master-lustre&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 2c6ea4f08e99dc13d30052bae21837720b88bd47&lt;/p&gt;</comment>
                            <comment id="384106" author="gerrit" created="Tue, 29 Aug 2023 18:24:16 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/tools/e2fsprogs/+/48806/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/tools/e2fsprogs/+/48806/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16169&quot; title=&quot;parallel e2fsck pass1 balanced group distribution&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16169&quot;&gt;&lt;del&gt;LU-16169&lt;/del&gt;&lt;/a&gt; e2fsck: improve parallel thread balance&lt;br/&gt;
Project: tools/e2fsprogs&lt;br/&gt;
Branch: master-lustre&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4e82819edcafbdd3bb21fde9d86b0a6a80dfcf3d&lt;/p&gt;</comment>
                            <comment id="394912" author="adilger" created="Thu, 30 Nov 2023 09:25:44 +0000"  >&lt;p&gt;There is something wrong with the balancing of the groups in e2fsck:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# e2fsck -fn -m 8 /dev/vgmyth/lvmythmdt0.ssd
e2fsck 1.47.0-wc5 (27-Sep-2023)
Warning!  /dev/vgmyth/lvmythmdt0.ssd is in use.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
[Thread 0] Scan group range [0, 20), inode_count = 655358/655360
[Thread 1] Scan group range [20, 40), inode_count = 655360/655360
[Thread 2] Scan group range [40, 66), inode_count = 647340/655360
[Thread 3] Scan group range [66, 92), inode_count = 651555/655360
[Thread 4] Scan group range [92, 112), inode_count = 655360/655360
[Thread 5] Scan group range [112, 160), inode_count = 524320/655360
[Thread 6] Scan group range [160, 160), inode_count = 0/655360
[Thread 6] Scanned group range [160, 160), inodes 0/0
[Thread 7] Scan group range [160, 160), inode_count = 0/655360
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="394919" author="gerrit" created="Thu, 30 Nov 2023 10:02:53 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/tools/e2fsprogs/+/53292&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/tools/e2fsprogs/+/53292&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16169&quot; title=&quot;parallel e2fsck pass1 balanced group distribution&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16169&quot;&gt;&lt;del&gt;LU-16169&lt;/del&gt;&lt;/a&gt; e2fsck: fix parallel thread balance&lt;br/&gt;
Project: tools/e2fsprogs&lt;br/&gt;
Branch: master-lustre&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 470e360de92c82129637a59c2daff40fd6af0430&lt;/p&gt;</comment>
                            <comment id="395778" author="gerrit" created="Wed, 6 Dec 2023 23:45:28 +0000"  >&lt;p&gt;&quot;Li Dongyang &amp;lt;dongyangli@ddn.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/tools/e2fsprogs/+/53292/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/tools/e2fsprogs/+/53292/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16169&quot; title=&quot;parallel e2fsck pass1 balanced group distribution&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16169&quot;&gt;&lt;del&gt;LU-16169&lt;/del&gt;&lt;/a&gt; e2fsck: fix parallel thread balance&lt;br/&gt;
Project: tools/e2fsprogs&lt;br/&gt;
Branch: master-lustre&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f4ba833854aceb430f4ded14789d55eb4a30b4f6&lt;/p&gt;</comment>
                            <comment id="395901" author="adilger" created="Thu, 7 Dec 2023 19:18:28 +0000"  >&lt;p&gt;Another example of very imbalanced e2fsck distribution with e2fsck-1.47.0-wc4 before the patch was applied:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Pass 1: Checking inodes, blocks, and sizes
[Thread 0] Scan group range [0, 1595904)
[Thread 1] Scan group range [1595904, 3191808)
[Thread 2] Scan group range [3191808, 4787712)
[Thread 3] Scan group range [4787712, 6383616)
[Thread 1] Pass 1: Memory used: 42928k/724200k (28658k/14271k), time:  0.52/ 1.91/ 0.08
[Thread 1] Pass 1: I/O read: 1MB, write: 0MB, rate: 1.92MB/s
[Thread 1] Scanned group range [1595904, 3191808), inodes 1
[Thread 3] Pass 1: Memory used: 42928k/855876k (28647k/14282k), time:  0.68/ 2.23/ 0.11
[Thread 3] Pass 1: I/O read: 1MB, write: 0MB, rate: 1.47MB/s
[Thread 3] Scanned group range [4787712, 6383616), inodes 1
[Thread 2] Pass 1: Memory used: 54884k/864088k (47076k/7809k), time: 19.02/ 4.05/ 0.41
[Thread 2] Pass 1: I/O read: 9MB, write: 0MB, rate: 0.47MB/s
[Thread 2] Scanned group range [3191808, 4787712), inodes 1058
[Thread 0] Pass 1: Memory used: 422932k/864088k (421572k/1361k), time: 1743.93/98.29/11.38
[Thread 0] Pass 1: I/O read: 24030MB, write: 1MB, rate: 13.78MB/s
[Thread 0] Scanned group range [0, 1595904), inodes 46070748
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This had three threads take a total of 20s before completing their (very few) assigned groups, while thread 0 took 1744s (88x longer).  With proper balancing, all of the threads could have finished in 441s or less (1/4 of the time).  The number of inodes processed by each thread is proportional to the amount of data read, and thread 0 in this case read 2184x as much data as the other 3 threads combined.  The speedup might even exceed 4x, since the threads would have been performing overlapping IO and compute rather than sitting idle waiting for IO completion.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="65449">LU-14894</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="61985">LU-14213</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="72426">LU-16170</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i030lz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>