<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:17:18 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15319] Weird mballoc behaviour</title>
                <link>https://jira.whamcloud.com/browse/LU-15319</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Weird mballoc behavior: a sudden STREAM_ALLOC allocator head jump after a target mount:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# grep -H &quot;&quot; /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:0
/proc/fs/ldiskfs/md2/mb_last_group:0
# echo &amp;gt; /sys/kernel/debug/tracing/trace
# nobjlo=2 nobjhi=2 thrlo=1024 thrhi=1024 size=393216 rszlo=4096 rszhi=4096 tests_str=&quot;write&quot; obdfilter-survey 2&amp;gt;&amp;amp;1 | tee /root/obdfilter-survey.log
Fri Dec  3 12:25:19 UTC 2021 Obdfilter-survey for case=disk from kjlmo1304
ost  2 sz 805306368K rsz 4096K obj    4 thr 2048 write 16552.35 [4580.64, 9382.91] 
/usr/bin/iokit-libecho: line 236: 253095 Killed                  remote_shell $host &quot;vmstat 5 &amp;gt;&amp;gt; $host_vmstatf&quot; &amp;amp;&amp;gt;/dev/null
done!
# grep -H &quot;&quot; /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:114337
/proc/fs/ldiskfs/md2/mb_last_group:130831
#
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The streaming allocator head jumped straight to the first previously uninitialized group, which is now the last initialized group (the target fs is almost empty):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@kjlmo1304 ~]# dumpe2fs /dev/md0 | sed &apos;/BLOCK/q&apos; | tail -24
....
Group 114335: (Blocks 3746529280-3746562047) csum 0x1b7a [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319328 (bg #114176 + 160)
  Inode bitmap at 3741319584 (bg #114176 + 416)
  Inode table at 3741322225-3741322240 (bg #114176 + 3057)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746529280-3746562047
  Free inodes: 14634881-14635008
Group 114336: (Blocks 3746562048-3746594815) csum 0x37c1 [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319329 (bg #114176 + 161)
  Inode bitmap at 3741319585 (bg #114176 + 417)
  Inode table at 3741322241-3741322256 (bg #114176 + 3073)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746562048-3746594815
  Free inodes: 14635009-14635136
Group 114337: (Blocks 3746594816-3746627583) csum 0xbacd [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319330 (bg #114176 + 162)
  Inode bitmap at 3741319586 (bg #114176 + 418)
  Inode table at 3741322257-3741322272 (bg #114176 + 3089)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746594816-3746627583
  Free inodes: 14635137-14635264
Group 114338: (Blocks 3746627584-3746660351) csum 0xca57 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
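&lt;p&gt;As a sanity check on the dump above, the reported group boundaries follow directly from the 32768 blocks-per-group layout shown by dumpe2fs; a minimal sketch (plain Python, not ldiskfs code):&lt;/p&gt;

```python
# Sketch of the block-to-group arithmetic behind the dumpe2fs output
# above, assuming the 32768 blocks-per-group layout it reports.
BLOCKS_PER_GROUP = 32768

def group_block_range(group):
    """Return (first_block, last_block) covered by a block group."""
    first = group * BLOCKS_PER_GROUP
    return first, first + BLOCKS_PER_GROUP - 1

# matches "Group 114337: (Blocks 3746594816-3746627583)"
print(group_block_range(114337))
```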

&lt;p&gt;The jump above is not big enough to cause a performance impact, but the same behavior was observed on another system with 2M block groups initialized: the mb_last_group jump shifted block allocations on an empty fs past the middle of the disk device, causing an approximately 15% write/read slowdown.&lt;/p&gt;
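&lt;p&gt;A hypothetical back-of-the-envelope illustration (the group numbers and total group count here are assumed for illustration, not taken from that system) of how such a jump translates into an offset across the device:&lt;/p&gt;

```python
# Hypothetical example: the fraction of the device that an mb_last_group
# jump moves the allocation head across. Both arguments are assumed
# values, not measurements from the ticket.
def head_offset_fraction(mb_last_group, total_groups):
    """Fraction of the device skipped when the head jumps to this group."""
    return mb_last_group / total_groups

# e.g. a jump to group 130831 on a device with 230000 block groups would
# put new allocations past the middle of the disk
print(round(head_offset_fraction(130831, 230000), 2))
```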

&lt;p&gt;This looks to be due to the following checks in ldiskfs_mb_good_group()&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        /* We only do this if the grp has never been initialized */
        if (unlikely(LDISKFS_MB_GRP_NEED_INIT(grp))) {
                int ret;

                /* cr=0/1 is a very optimistic search to find large
                 * good chunks almost for free. if buddy data is
                 * not ready, then this optimization makes no sense */

                if (cr &amp;lt; 2 &amp;amp;&amp;amp; !ldiskfs_mb_uninit_on_disk(ac-&amp;gt;ac_sb, group))
                        return 0;
                ret = ldiskfs_mb_init_group(ac-&amp;gt;ac_sb, group);
                if (ret)
                        return 0;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;introduced by:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ecb68b8 LU-13291 ldiskfs: mballoc don&apos;t skip uninit-on-disk groups
6a7a700 LU-12988 ldiskfs: skip non-loaded groups at cr=0/1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
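&lt;p&gt;A toy model (plain Python, not the real ldiskfs code; the group layout is assumed for illustration) of how these checks make a fresh-mount scan land on the first uninit-on-disk group:&lt;/p&gt;

```python
# Toy model (NOT the real ldiskfs code) of the good-group check quoted
# above: at cr=0/1 a group whose buddy data is not loaded is skipped
# unless it is uninit-on-disk, so the first scan after a fresh mount
# lands on the first uninit-on-disk group.
NGROUPS = 20                      # assumed tiny fs for illustration
FIRST_UNINIT_ON_DISK = 15         # assumed: BLOCK_UNINIT from here on

buddy_loaded = [False] * NGROUPS  # nothing is loaded right after mount

def uninit_on_disk(group):
    return group in range(FIRST_UNINIT_ON_DISK, NGROUPS)

def good_group(group, cr):
    if not buddy_loaded[group]:
        # mirrors the skip quoted from ldiskfs_mb_good_group():
        # at cr=0/1, skip groups that are initialized on disk but
        # whose buddy data is not yet ready
        if cr in (0, 1) and not uninit_on_disk(group):
            return False          # buddy not ready: skip the group
        buddy_loaded[group] = True  # stands in for ldiskfs_mb_init_group()
    return True                   # assume the group has free space

def scan(cr):
    """Return the first group accepted during the given criteria pass."""
    return next(g for g in range(NGROUPS) if good_group(g, cr))

print("cr=0 scan selected group", scan(0))
```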
</description>
                <environment></environment>
        <key id="67459">LU-15319</key>
            <summary>Weird mballoc behaviour</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="zam">Alexander Zarochentsev</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 Dec 2021 15:00:28 +0000</created>
                <updated>Mon, 25 Sep 2023 20:35:30 +0000</updated>
                            <resolved>Mon, 25 Sep 2023 20:35:30 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="371844" author="adilger" created="Wed, 10 May 2023 21:50:05 +0000"  >&lt;p&gt;I suspect that this issue could be resolved with the new mballoc allocator from upstream kernels.&lt;/p&gt;</comment>
                            <comment id="387160" author="adilger" created="Mon, 25 Sep 2023 20:35:30 +0000"  >&lt;p&gt;The mballoc array-based group selection is almost ready to land in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14438&quot; title=&quot;backport ldiskfs mballoc patches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14438&quot;&gt;LU-14438&lt;/a&gt; and I think that any development in that area should first start with backporting the next set of mballoc patches from upstream ext4, which address most of these issues.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="72375">LU-16162</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="57389">LU-12970</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62900">LU-14438</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02bnb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>