<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:02:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6748] excessive client reconnect to OSS servers under heavy IO work load.</title>
                <link>https://jira.whamcloud.com/browse/LU-6748</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing the latest pre-2.8 code, I noticed heavy client reconnects to the OSS servers. The errors on the client side were:&lt;/p&gt;

&lt;p&gt;Lustre: sultan-OST0008-osc-ffff8803ea302800: Connection to sultan-OST0008 (at 10.37.248.69@o2ib1) was lost; in progress operations using this service will wait for recovery to complete&lt;br/&gt;
Lustre: Skipped 55 previous similar messages&lt;br/&gt;
Lustre: 5355:0:(client.c:2009:ptlrpc_expire_one_request()) Skipped 61 previous similar messages&lt;br/&gt;
Lustre: 5350:0:(client.c:2009:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1434742560/real 1434742560&amp;#93;&lt;/span&gt;  req@ffff8803c23fb6c0 x1504421695570504/t0(0) o8-&amp;gt;sultan-OST0023-osc-ffff8803ea302800@10.37.248.72@o2ib1:28/4 lens 520/544 e 0 to 1 dl 1434742568 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1&lt;br/&gt;
Lustre: 5350:0:(client.c:2009:ptlrpc_expire_one_request()) Skipped 7 previous similar messages&lt;br/&gt;
Lustre: sultan-OST0000-osc-ffff8803ea302800: Connection restored to sultan-OST0000 (at 10.37.248.69@o2ib1)&lt;br/&gt;
Lustre: Skipped 27 previous similar messages&lt;br/&gt;
Lustre: 5356:0:(client.c:2009:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1434742782/real 1434742782&amp;#93;&lt;/span&gt;  req@ffff8803c1b639c0 x1504421695572244/t0(0) o400-&amp;gt;sultan-OST0034-osc-ffff8803ea302800@10.37.248.69@o2ib1:28/4 lens 224/224 e 0 to 1 dl 1434742789 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1&lt;br/&gt;
Lustre: sultan-OST0000-osc-ffff8803ea302800: Connection to sultan-OST0000 (at 10.37.248.69@o2ib1) was lost; in progress operations using this service will wait for recovery to complete&lt;br/&gt;
Lustre: Skipped 41 previous similar messages&lt;br/&gt;
Lustre: 5356:0:(client.c:2009:ptlrpc_expire_one_request()) Skipped 73 previous similar messages&lt;br/&gt;
Lustre: sultan-OST0000-osc-ffff8803ea302800: Connection restored to sultan-OST0000 (at 10.37.248.69@o2ib1)&lt;br/&gt;
Lustre: Skipped 41 previous similar messages&lt;br/&gt;
Lustre: sultan-OST0003-osc-ffff8803ea302800: Connection restored to sultan-OST0003 (at 10.37.248.72@o2ib1)&lt;br/&gt;
Lustre: Skipped 27 previous similar messages&lt;/p&gt;

&lt;p&gt;and the messages seen on the OSS side are:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;20639.820176&amp;#93;&lt;/span&gt; Lustre: sultan-OST0008: Client 57c62113-31f1-f463-ffeb-9d0c7541279d (at 26@gni1) reconnecting&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20639.829910&amp;#93;&lt;/span&gt; Lustre: Skipped 20 previous similar messages&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20676.881745&amp;#93;&lt;/span&gt; Lustre: sultan-OST000c: Client 57c62113-31f1-f463-ffeb-9d0c7541279d (at 26@gni1) reconnecting&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20676.891462&amp;#93;&lt;/span&gt; Lustre: Skipped 29 previous similar messages&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20868.910972&amp;#93;&lt;/span&gt; Lustre: sultan-OST0004: Client 57c62113-31f1-f463-ffeb-9d0c7541279d (at 26@gni1) reconnecting&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20868.920682&amp;#93;&lt;/span&gt; Lustre: Skipped 23 previous similar messages&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20906.993360&amp;#93;&lt;/span&gt; Lustre: sultan-OST0000: Client 57c62113-31f1-f463-ffeb-9d0c7541279d (at 26@gni1) reconnecting&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20906.993364&amp;#93;&lt;/span&gt; Lustre: sultan-OST0004: Client 57c62113-31f1-f463-ffeb-9d0c7541279d (at 26@gni1) reconnecting&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20906.993368&amp;#93;&lt;/span&gt; Lustre: Skipped 17 previous similar messages&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;20907.018191&amp;#93;&lt;/span&gt; Lustre: Skipped 11 previous similar messages&lt;/p&gt;
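&lt;p&gt;To put a number on the reconnect churn in logs like these, a rough Python sketch can tally the lost, restored, reconnecting and timed-out messages per target. It assumes the console log was saved as plain text (for example from dmesg), not the HTML-escaped copy in this issue; &lt;tt&gt;tally_reconnects&lt;/tt&gt; and the pattern table are hypothetical names, and the regular expressions only cover the message wording quoted above.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re
import sys
from collections import Counter

# Regexes fitted to the console messages quoted in this report (assumption:
# other Lustre versions may word these messages differently).
SKIPPED = re.compile(r"Skipped (\d+) previous similar messages")
PATTERNS = {
    "lost": re.compile(r"Connection to (\S+) \(at \S+\) was lost"),
    "restored": re.compile(r"Connection restored to (\S+) \(at \S+\)"),
    "reconnecting": re.compile(r"Lustre: (\S+): Client \S+ \(at \S+\) reconnecting"),
    "timeout": re.compile(r"Request sent has timed out.*\so\d+->(\S+?)-osc-"),
}


def tally_reconnects(lines):
    """Count printed events per (kind, target) and suppressed repeats per kind.

    A "Skipped N previous similar messages" line is credited to the kind of
    the most recently matched message, except that skipped lines printed by
    ptlrpc_expire_one_request are credited to "timeout". Output from several
    CPUs can interleave, so the skipped totals are approximate.
    """
    printed = Counter()
    skipped = Counter()
    last_kind = None
    for line in lines:
        m = SKIPPED.search(line)
        if m:
            # Skipped lines from the ptlrpc timeout path carry the function name.
            kind = "timeout" if "ptlrpc_expire_one_request" in line else last_kind
            if kind is not None:
                skipped[kind] += int(m.group(1))
            continue
        for kind, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                printed[(kind, m.group(1))] += 1
                last_kind = kind
                break
    return printed, skipped


if __name__ == "__main__":
    printed, skipped = tally_reconnects(sys.stdin)
    for (kind, target), n in sorted(printed.items()):
        print(f"{kind:12} {target:20} {n}")
    for kind, n in sorted(skipped.items()):
        print(f"{kind:12} {'(skipped)':20} {n}")
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Printed messages are counted per target, while the &quot;Skipped N&quot; totals are only kept per message kind, because the rate limiter does not record which target each suppressed message belonged to. The script reads a saved console log on standard input.&lt;/p&gt;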

&lt;p&gt;This occurred when I ran a file-per-process IOR job across 20 nodes with 32 threads per client.&lt;/p&gt;</description>
                <environment></environment>
        <key id="30752">LU-6748</key>
            <summary>excessive client reconnect to OSS servers under heavy IO work load.</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                    </labels>
                <created>Fri, 19 Jun 2015 20:55:20 +0000</created>
                <updated>Mon, 31 Aug 2015 16:46:42 +0000</updated>
                            <resolved>Mon, 31 Aug 2015 16:46:26 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="119149" author="simmonsja" created="Fri, 19 Jun 2015 21:00:26 +0000"  >&lt;p&gt;Uploaded logs to ftp.whamcloud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6748&quot; title=&quot;excessive client reconnect to OSS servers under heavy IO work load.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6748&quot;&gt;&lt;del&gt;LU-6748&lt;/del&gt;&lt;/a&gt;/*&lt;/p&gt;</comment>
                            <comment id="119165" author="jay" created="Fri, 19 Jun 2015 23:46:19 +0000"  >&lt;p&gt;The client was experiencing slow replies, yet the OSS never even sent early replies.&lt;/p&gt;
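&lt;p&gt;For reference, the timed-out request lines quoted in the description appear to follow the ptlrpc request debug layout: &lt;tt&gt;o8&lt;/tt&gt; and &lt;tt&gt;o400&lt;/tt&gt; are the RPC opcodes (OST_CONNECT and OBD_PING), &lt;tt&gt;e&lt;/tt&gt; is the number of early replies received, &lt;tt&gt;to&lt;/tt&gt; marks the request as timed out, and &lt;tt&gt;dl&lt;/tt&gt; is its deadline. The sketch below pulls those fields out of a plain-text console log; the helper name &lt;tt&gt;parse_timed_out&lt;/tt&gt;, the dictionary keys and the regular expression are assumptions fitted to the lines in this report, not a Lustre tool, and other versions may print the fields differently.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re
import sys

# Names for the two opcodes seen in this report (Lustre wire protocol values);
# any other opcode is reported as its number.
OPCODES = {8: "OST_CONNECT", 400: "OBD_PING"}

# Fitted to the "Request sent has timed out" lines quoted in this report.
REQ = re.compile(
    r"\[sent (\d+)/real \d+\].*?"
    r"\so(\d+)->(\S+?)@(\S+?):\d+/\d+"
    r" lens \d+/\d+ e (\d+) to (\d+) dl (\d+)"
)


def parse_timed_out(line):
    """Decode one timed-out request line; return None if the line does not match."""
    m = REQ.search(line)
    if m is None:
        return None
    sent, opc, target, peer, early, timed_out, deadline = m.groups()
    return {
        "opcode": OPCODES.get(int(opc), opc),
        "target": target,
        "peer": peer,
        "early_replies": int(early),
        "timed_out": int(timed_out),
        # Seconds between sending the request and its deadline.
        "window_s": int(deadline) - int(sent),
    }


if __name__ == "__main__":
    for line in sys.stdin:
        fields = parse_timed_out(line)
        if fields is not None:
            print(fields)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Applied to the two timed-out requests in the description, this gives an 8 s window for the OST_CONNECT to sultan-OST0023 and a 7 s window for the OBD_PING to sultan-OST0034, each with zero early replies.&lt;/p&gt;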

&lt;p&gt;Here are some suspicious messages:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;node_health:5.1&amp;gt; APID:831 (xtcheckhealth) WARNING: Advanced_features and anyapid check are both configured on. Application test could falsely mark nodes unhealthy.

&amp;lt;node_health:5.1&amp;gt; RESID:3043 (xtcheckhealth) WARNING: Advanced_features and anyapid check are both configured on. Application test could falsely mark nodes unhealthy.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I believe they come from the GNI LND. Do these messages imply anything?&lt;/p&gt;</comment>
                            <comment id="119170" author="simmonsja" created="Sat, 20 Jun 2015 00:43:33 +0000"  >&lt;p&gt;That is the Cray health check warning that the file system is sick. The message comes from a userland utility.&lt;/p&gt;</comment>
                            <comment id="119180" author="jay" created="Sat, 20 Jun 2015 17:19:26 +0000"  >&lt;p&gt;When did you notice the regression? Can you identify which patches introduced it, or the range of dates when those patches landed?&lt;/p&gt;

&lt;p&gt;From your description, it looks like this is related to ldlm lock handling. Please check the LRU size on the client side and how many locks are cached in the LRU when the problem is reproduced. If possible, please dump the locks so that we can make further investigation.&lt;/p&gt;</comment>
                            <comment id="119243" author="simmonsja" created="Mon, 22 Jun 2015 18:15:50 +0000"  >&lt;p&gt;Just as I thought, I found the source of the regression: I had map_on_demand set on the OSS servers. After I unmounted the file system and restarted the o2iblnd layer without map_on_demand set, I stopped seeing the reconnect issues. So map_on_demand is a problem on normal systems as well.&lt;/p&gt;</comment>
                            <comment id="125373" author="simmonsja" created="Thu, 27 Aug 2015 14:48:52 +0000"  >&lt;p&gt;We can close this as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6723&quot; title=&quot;Setting map_on_demand for o2iblnd driver prevents lustre bring up.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6723&quot;&gt;&lt;del&gt;LU-6723&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="30667">LU-6723</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="30667">LU-6723</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxg87:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>