<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2432] ptlrpc_alloc_rqbd spinning on vmap_area_lock on MDS</title>
                <link>https://jira.whamcloud.com/browse/LU-2432</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;vmalloc-based allocations can potentially take a very long time to complete due to a regression in the kernel. As a result, I&apos;ve seen our MDS &quot;lock up&quot; for periods of time while all of the cores spin on the vmap_area_lock down in ptlrpc_alloc_rqbd.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;    2012-11-01 11:34:28 Pid: 34505, comm: mdt02_051
    2012-11-01 11:34:28 
    2012-11-01 11:34:28 Call Trace:
    2012-11-01 11:34:28  [&amp;lt;ffffffff81273155&amp;gt;] ? rb_insert_color+0x125/0x160
    2012-11-01 11:34:28  [&amp;lt;ffffffff81149f1f&amp;gt;] ? __vmalloc_area_node+0x5f/0x190
    2012-11-01 11:34:28  [&amp;lt;ffffffff810609ea&amp;gt;] __cond_resched+0x2a/0x40
    2012-11-01 11:34:28  [&amp;lt;ffffffff814efa60&amp;gt;] _cond_resched+0x30/0x40
    2012-11-01 11:34:28  [&amp;lt;ffffffff8115fa88&amp;gt;] kmem_cache_alloc_node_notrace+0xa8/0x130
    2012-11-01 11:34:28  [&amp;lt;ffffffff8115fc8b&amp;gt;] __kmalloc_node+0x7b/0x100
    2012-11-01 11:34:28  [&amp;lt;ffffffffa05a2a40&amp;gt;] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
    2012-11-01 11:34:28  [&amp;lt;ffffffff81149f1f&amp;gt;] __vmalloc_area_node+0x5f/0x190
    2012-11-01 11:34:28  [&amp;lt;ffffffffa05a2a40&amp;gt;] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
    2012-11-01 11:34:28  [&amp;lt;ffffffff81149eb2&amp;gt;] __vmalloc_node+0xa2/0xb0
    2012-11-01 11:34:28  [&amp;lt;ffffffffa05a2a40&amp;gt;] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
    2012-11-01 11:34:28  [&amp;lt;ffffffff8114a199&amp;gt;] vmalloc_node+0x29/0x30
    2012-11-01 11:34:28  [&amp;lt;ffffffffa05a2a40&amp;gt;] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
    2012-11-01 11:34:28  [&amp;lt;ffffffffa0922ffe&amp;gt;] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffffa09235b5&amp;gt;] ptlrpc_grow_req_bufs+0x65/0x1b0 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffffa0927fbd&amp;gt;] ptlrpc_main+0xd0d/0x19f0 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffffa09272b0&amp;gt;] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20
    2012-11-01 11:34:28  [&amp;lt;ffffffffa09272b0&amp;gt;] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffffa09272b0&amp;gt;] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
    2012-11-01 11:34:28  [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here are a couple of links regarding the kernel regression:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;http://lkml.indiana.edu/hypermail/linux/kernel/1006.3/00091.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://lkml.indiana.edu/hypermail/linux/kernel/1006.3/00091.html&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/torvalds/linux/commit/89699605fe7cfd8611900346f61cb6cbf179b10a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/torvalds/linux/commit/89699605fe7cfd8611900346f61cb6cbf179b10a&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
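<!--
Illustrative sketch (not from the report): a userspace analogy of the
contention described above. Every allocation funnels through one global
lock, so adding threads adds spinning rather than throughput, which is
roughly what the MDS cores were doing on vmap_area_lock inside vmalloc().
The lock and buffer here are stand-ins, not the actual kernel symbols.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for the kernel's single, global vmap_area_lock */
    static pthread_mutex_t vmap_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *alloc_worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&vmap_lock);
            void *p = malloc(4096);    /* stand-in for a request buffer */
            pthread_mutex_unlock(&vmap_lock);
            free(p);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[8];
        for (int i = 0; i < 8; i++)
            pthread_create(&t[i], NULL, alloc_worker, NULL);
        for (int i = 0; i < 8; i++)
            pthread_join(t[i], NULL);
        puts("every allocation was serialized on a single lock");
        return 0;
    }
-->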
                <environment></environment>
        <key id="16860">LU-2432</key>
            <summary>ptlrpc_alloc_rqbd spinning on vmap_area_lock on MDS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                            <label>sequoia</label>
                    </labels>
                <created>Wed, 5 Dec 2012 18:08:28 +0000</created>
                <updated>Mon, 18 Mar 2013 09:25:06 +0000</updated>
                            <resolved>Mon, 18 Mar 2013 09:25:06 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="48835" author="prakash" created="Wed, 5 Dec 2012 18:26:23 +0000"  >&lt;p&gt;See: &lt;a href=&quot;http://review.whamcloud.com/4439&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4439&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="48862" author="pjones" created="Thu, 6 Dec 2012 10:28:26 +0000"  >&lt;p&gt;Thanks Prakash!&lt;/p&gt;

&lt;p&gt;Bobijam could you please review this patch?&lt;/p&gt;</comment>
                            <comment id="48891" author="bobijam" created="Thu, 6 Dec 2012 21:45:44 +0000"  >&lt;p&gt;svc-&amp;gt;srv_buf_size can be MDS_BUFSIZE = (362 + LOV_MAX_STRIPE_COUNT * 56 + 1024) ~= 110KB for MDS service, could it be problematic?&lt;/p&gt;</comment>
                            <comment id="49061" author="adilger" created="Tue, 11 Dec 2012 13:37:30 +0000"  >&lt;p&gt;We discussed at LAD that one problem with the request buffers is that the incoming LNET buffers (sorry, I don&apos;t have the correct LNET terms here) are allocated only large enough for the largest single request, though most requests are smaller than this.  Unfortunately, as soon as a single RPC is waiting in the incoming buffer, there is no longer enough space in the buffer to receive a maximum-sized incoming request.  This means that each buffer is only ever used for a single message, regardless of how many might fit.&lt;/p&gt;

&lt;p&gt;A solution that was discussed was to make the request buffer 2x as large as the maximum request size and/or round it up to the next power-of-two boundary. That would at least increase the buffer utilization to 50%, and would likely allow tens of requests per LNET buffer.&lt;/p&gt;

&lt;p&gt;It may be that the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2424&quot; title=&quot;add memory limits for ptlrpc service&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2424&quot;&gt;&lt;del&gt;LU-2424&lt;/del&gt;&lt;/a&gt; will already address this issue?&lt;/p&gt;</comment>
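<!--
A sketch of the sizing rule Andreas describes (illustrative names, not the
actual patch): double the maximum request size, then round up to the next
power of two, so at least two maximum-size requests always fit in a buffer.

    #include <stddef.h>
    #include <stdio.h>

    static size_t roundup_pow2(size_t v)
    {
        size_t p = 1;
        while (p < v)
            p <<= 1;
        return p;
    }

    static size_t rqbd_buf_size(size_t max_req)
    {
        return roundup_pow2(2 * max_req);  /* guarantees >= 50% utilization */
    }

    int main(void)
    {
        size_t max_req = 113386;           /* the ~110KB MDS_BUFSIZE above */
        size_t buf = rqbd_buf_size(max_req);
        printf("buffer = %zu bytes, holds at least %zu max-size requests\n",
               buf, buf / max_req);        /* 262144 bytes, 2 requests */
        return 0;
    }
-->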
                            <comment id="49073" author="prakash" created="Tue, 11 Dec 2012 14:16:48 +0000"  >&lt;p&gt;I wasn&apos;t at LAD, so I&apos;m unaware of that discussion. But, what trade offs are being made between the number of buffers used and the size of each? i.e why can&apos;t we just have one huge buffer, increasing the utilization to &lt;tt&gt;(BUFFER_SIZE-REQUEST_SIZE)/BUFFER_SIZE&lt;/tt&gt; percent (trending towards 100% as &lt;tt&gt;BUFFER_SIZE&lt;/tt&gt; grows large)? Granted I don&apos;t understand the LNET code well, so I must be missing something which makes that obviously the wrong thing to do.&lt;/p&gt;</comment>
                            <comment id="49090" author="adilger" created="Tue, 11 Dec 2012 21:21:45 +0000"  >&lt;p&gt;My (imperfect) understanding is that the receive buffers cannot be re-used until all of the requests therein are processed. That means the buffered are filled from the start, processed, and then returned to the incoming buffer list. If the buffer is too large, then requests sitting in the buffer may wait too long to be processed, or the buffer &lt;em&gt;still&lt;/em&gt; will not be fully utilized if there is an upper limit for how long a request will wait. &lt;/p&gt;
</comment>
                            <comment id="49100" author="liang" created="Wed, 12 Dec 2012 01:19:54 +0000"  >&lt;p&gt;I didn&apos;t realize that we still don&apos;t have the &quot;big request buffer&quot; fix, then this should be the right way to fix this problem and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2424&quot; title=&quot;add memory limits for ptlrpc service&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2424&quot;&gt;&lt;del&gt;LU-2424&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
I would suggest to have 512K or 1M as request buffer size, as Andreas said, a very large request buffer can&apos;t be reused if any of those (thousands or more) requests is pending on something, so it might have some other issues.&lt;br/&gt;
And I still think it&apos;s a nice improvement if we only allow one thread (per CPT) to enter allocating path.&lt;/p&gt;</comment>
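<!--
A sketch of the "one allocating thread per CPT" idea (hypothetical helper
names; the real change is the patch in http://review.whamcloud.com/4939):
a trylock lets exactly one service thread enter the slow vmalloc path while
its peers keep serving requests instead of piling up on vmap_area_lock.

    #include <pthread.h>
    #include <stdbool.h>

    struct cpt_service {
        pthread_mutex_t grow_lock;   /* one buffer grower per CPT at a time */
    };

    static bool try_grow_req_bufs(struct cpt_service *svc)
    {
        if (pthread_mutex_trylock(&svc->grow_lock) != 0)
            return false;            /* a peer is already allocating */
        /* vmalloc-backed request buffer allocation would happen here */
        pthread_mutex_unlock(&svc->grow_lock);
        return true;
    }
-->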
                            <comment id="49824" author="liang" created="Tue, 1 Jan 2013 04:09:45 +0000"  >&lt;p&gt;I posted a patch for this: &lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4939&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4939&lt;/a&gt;&lt;br/&gt;
and another patch to resolve the buffer utilization issue: &lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4940&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4940&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49975" author="prakash" created="Fri, 4 Jan 2013 16:09:55 +0000"  >&lt;p&gt;Liang, Why would limiting the vmalloc calls to a single thread fix the issue? That one thread will still be affected by the regression. Will the other threads still be able to service requests despite needing more request buffers? Or will they all have wait for this single thread to finish the allocations?&lt;/p&gt;</comment>
                            <comment id="49996" author="liang" created="Fri, 4 Jan 2013 22:33:23 +0000"  >&lt;p&gt;I think we might not care one thread (or very few threads) spinning, because each service has tens or even hundreds of threads, and servers normally have many CPU cores, all other threads can serve requests, they will not wait for buffer allocating at all.  &lt;br/&gt;
The key issue of this ticket is vmalloc can&apos;t be parallelized, so it&apos;s a waste if all threads/CPUs try to allocate buffers at the same time. &lt;/p&gt;</comment>
                            <comment id="50063" author="prakash" created="Mon, 7 Jan 2013 12:32:22 +0000"  >&lt;blockquote&gt;
&lt;p&gt;all other threads can serve requests, they will not wait for buffer allocating at all&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Perfect, that&apos;s what I wanted to verify with you. Thanks for the clarification!&lt;/p&gt;</comment>
                            <comment id="54245" author="adilger" created="Mon, 18 Mar 2013 09:24:50 +0000"  >&lt;p&gt;Both &lt;a href=&quot;http://review.whamcloud.com/4939&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4939&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/4940&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4940&lt;/a&gt; have landed, so I think this bug could be closed.  There should only be a single thread calling vmalloc() now.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="17361">LU-2708</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="16844">LU-2424</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdbr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5764</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>