<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:00:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6441] OST problems following router node crash, inactive threads, clients continuously reconnecting</title>
                <link>https://jira.whamcloud.com/browse/LU-6441</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;As part of acceptance testing, an LNet router node was deliberately crashed (via sysrq-trigger). Following the crash, a set of OST nodes started reporting problems: hung threads, timeouts, and clients continually losing their connection and reconnecting. The nodes are stuck in the ptlrpc_abort_bulk() function.&lt;/p&gt;

&lt;p&gt;The logs show messages like this:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Feb 25 21:17:15 somenode kernel: LustreError: 32278:0:(service.c:3214:ptlrpc_svcpt_health_check()) ost_io: unhealthy - request has been waiting 16940s
Feb 25 21:17:15 somenode kernel: LustreError: 32278:0:(service.c:3214:ptlrpc_svcpt_health_check()) Skipped 5 previous similar messages
Feb 25 21:20:31 somenode kernel: Lustre: 106418:0:(niobuf.c:282:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8802e3970000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The system uses a fine-grained routing configuration. One of the 5 routers in this set was killed. The expectation is minimal disruption for clients: the other 4 routers are still functioning, so clients should detect that one is down and send traffic to the other 4 nodes.&lt;/p&gt;</description>
                <environment>Servers run Lustre 2.5.1, clients run Lustre 2.5.1</environment>
        <key id="29428">LU-6441</key>
            <summary>OST problems following router node crash, inactive threads, clients continuously reconnecting</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="artem_blagodarenko">Artem Blagodarenko</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Wed, 8 Apr 2015 09:03:47 +0000</created>
                <updated>Wed, 22 Jun 2022 20:40:19 +0000</updated>
                            <resolved>Fri, 1 May 2015 11:32:18 +0000</resolved>
                                    <version>Lustre 2.5.1</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>1</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="111716" author="gerrit" created="Wed, 8 Apr 2015 10:51:23 +0000"  >&lt;p&gt;Artem Blagodarenko (artem_blagodarenko@xyratex.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14399&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14399&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6441&quot; title=&quot;OST problems following router node crash, inactive threads, clients continuously reconnecting&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6441&quot;&gt;&lt;del&gt;LU-6441&lt;/del&gt;&lt;/a&gt; ptlrpc: ptlrpc_bulk_abort unlink all entries in bd_mds&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 0feb07fa9852dfeb8fa68ea80587cb7f11a7ab04&lt;/p&gt;</comment>
                            <comment id="112485" author="artem_blagodarenko" created="Tue, 21 Apr 2015 08:16:20 +0000"  >&lt;p&gt;This bug occurs when 4MB I/O is enabled. We have already seen it on two clusters. I believe this patch is important for anyone who is going to use 4MB I/O.&lt;/p&gt;</comment>
                            <comment id="112486" author="icostelloddn" created="Tue, 21 Apr 2015 08:28:07 +0000"  >&lt;p&gt;Agreed re 4MB RPC. It also doesn&apos;t require an LNET router crash/panic; it can easily be reproduced with a similar trigger, such as pulling IB cables (or those of whatever network you are using) on the clients while the clients are doing I/O to the filesystem.&lt;/p&gt;

&lt;p&gt;I can confirm that patching the server with the above patch resolves the problem. I did this on site at ANU/NCI on the available test kit: I was able to reproduce the problem, patched the server and installed it, then spent a day and a half trying to reproduce it again (when I could hit this in 2/3 attempts on a server without the patch).&lt;/p&gt;</comment>
                            <comment id="113962" author="gerrit" created="Fri, 1 May 2015 03:20:36 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14399/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14399/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6441&quot; title=&quot;OST problems following router node crash, inactive threads, clients continuously reconnecting&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6441&quot;&gt;&lt;del&gt;LU-6441&lt;/del&gt;&lt;/a&gt; ptlrpc: ptlrpc_bulk_abort unlink all entries in bd_mds&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 0a6470219a8602d7a56fe1c5171dba4a42244738&lt;/p&gt;</comment>
                            <comment id="113990" author="pjones" created="Fri, 1 May 2015 11:32:18 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                            <comment id="165435" author="gerrit" created="Fri, 9 Sep 2016 04:35:10 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/22403&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/22403&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6441&quot; title=&quot;OST problems following router node crash, inactive threads, clients continuously reconnecting&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6441&quot;&gt;&lt;del&gt;LU-6441&lt;/del&gt;&lt;/a&gt; ptlrpc: fix the problem of the patch&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6bfc8faac211e5c2dd5a369ac23438565d3d16c0&lt;/p&gt;</comment>
                            <comment id="165463" author="pjones" created="Fri, 9 Sep 2016 12:37:55 +0000"  >&lt;p&gt;Jinshan&lt;/p&gt;

&lt;p&gt;It would be better to open a new ticket and link to this one with any changes needed to the original patch&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="165918" author="gerrit" created="Tue, 13 Sep 2016 20:03:29 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/22403/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/22403/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6441&quot; title=&quot;OST problems following router node crash, inactive threads, clients continuously reconnecting&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6441&quot;&gt;&lt;del&gt;LU-6441&lt;/del&gt;&lt;/a&gt; ptlrpc: fix sanity 224c for different RPC sizes&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 6cde14a5df781ae29da88f98a2559eb4342fe1f3&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="29885">LU-6573</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="30988">LU-6808</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxac7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>