<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:23:11 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9094] OOM caused by huge number of peers in case of INVALID_SERVICE_ID</title>
                <link>https://jira.whamcloud.com/browse/LU-9094</link>
                <project id="10000" key="LU">Lustre</project>
                    <description></description>
                <environment></environment>
        <key id="43686">LU-9094</key>
            <summary>OOM caused by huge number of peers in case of INVALID_SERVICE_ID</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="scherementsev">Sergey Cheremencev</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Thu, 9 Feb 2017 15:13:10 +0000</created>
                <updated>Wed, 1 Mar 2017 13:02:24 +0000</updated>
                            <resolved>Wed, 1 Mar 2017 06:41:11 +0000</resolved>
                                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="184135" author="sergey" created="Thu, 9 Feb 2017 15:16:13 +0000"  >&lt;p&gt;Please change the Topic to something like &quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot;.&lt;/p&gt;</comment>
                            <comment id="184137" author="sergey" created="Thu, 9 Feb 2017 15:56:25 +0000"  >&lt;p&gt;A peer shouldn&apos;t be killed each time a connection attempt fails with INVALID_SERVICE_ID. Doing so produces&lt;br/&gt;
 a huge number of peers for the same nid and may cause an OOM.&lt;/p&gt;

&lt;p&gt;The issue can be reproduced simply by running lctl ping against a node where lnet is not loaded (ib should be up).&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@pink03 tests]# cat ~/oom.sh
while true; do
	lctl ping 172.18.56.129@o2ib0
done

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The OOM was frequently seen with mlnx-ofa-kernel-2.3, where mlx4_cq_free used&lt;br/&gt;
 the RCU mechanism. In later mlx4 versions, mlx4_cq_free was reworked to mitigate&lt;br/&gt;
 the issue and no longer uses RCU.&lt;br/&gt;
 Either way, we shouldn&apos;t create and remove tons of peers with the same nid, so that performance and memory are not affected.&lt;/p&gt;

&lt;p&gt;The OOM should also be reproducible on all mlx5 setups, regardless of the mlnx-ofa-kernel version.&lt;br/&gt;
 I reproduced it on mlnx-ofa_kernel-3.4 with mlx5.&lt;/p&gt;

&lt;p&gt;I have prepared and tested a set of patches for it and will send them shortly.&lt;/p&gt;</comment>
                            <comment id="184348" author="gerrit" created="Fri, 10 Feb 2017 15:25:53 +0000"  >&lt;p&gt;Sergey Cheremencev (sergey.cheremencev@seagate.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/25375&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25375&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; lnet: remove ni from lnet_finalize&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a312fcd635e63a79c53dd072ece9a066c1baf342&lt;/p&gt;</comment>
                            <comment id="184349" author="gerrit" created="Fri, 10 Feb 2017 15:27:47 +0000"  >&lt;p&gt;Sergey Cheremencev (sergey.cheremencev@seagate.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/25376&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25376&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; o2iblnd: kill timedout txs from ibp_tx_queue&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: fdee92bf4859793dc3fe4911b491ad9d0b21533e&lt;/p&gt;</comment>
                            <comment id="184350" author="gerrit" created="Fri, 10 Feb 2017 15:30:22 +0000"  >&lt;p&gt;Sergey Cheremencev (sergey.cheremencev@seagate.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/25378&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25378&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; o2iblnd: reconnect peer for REJ_INVALID_SERVICE_ID&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 496c48a3daa21d0423a387625821de03a57db443&lt;/p&gt;</comment>
                            <comment id="185462" author="gerrit" created="Sat, 18 Feb 2017 23:51:33 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/25378/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25378/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; o2iblnd: reconnect peer for REJ_INVALID_SERVICE_ID&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 603aa7a1df6ee6ce6fe0d501a8b2bd1bfdf43bb8&lt;/p&gt;</comment>
                            <comment id="185546" author="sergey" created="Mon, 20 Feb 2017 12:06:09 +0000"  >&lt;p&gt;Please note that &quot;reconnect peer for REJ_INVALID_SERVICE_ID&quot; without &quot;kill timedout txs from ibp_tx_queue&quot; causes lctl ping to hang when lnet is not loaded on the target node (lctl ping waits indefinitely).&lt;br/&gt;
 That is why I pushed all 3 patches together.&lt;/p&gt;</comment>
                            <comment id="186548" author="gerrit" created="Wed, 1 Mar 2017 05:10:22 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/25375/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25375/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; lnet: remove ni from lnet_finalize&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: dab78a9efd05e4f22fc83232bdadce347d3dafda&lt;/p&gt;</comment>
                            <comment id="186549" author="gerrit" created="Wed, 1 Mar 2017 05:10:51 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/25376/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25376/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9094&quot; title=&quot;OOM caused by huge number of peers in case of INVALID_SERVICE_ID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9094&quot;&gt;&lt;del&gt;LU-9094&lt;/del&gt;&lt;/a&gt; o2iblnd: kill timedout txs from ibp_tx_queue&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 824120da92fe8feb4b4308a136e33ec65fe3b635&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzz3bj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>