<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:47:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11815] MDT-MDT connection stuck and never restored</title>
                <link>https://jira.whamcloud.com/browse/LU-11815</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;While running mdtest on a DNE2 configuration (two MDSs, one MDT per MDS), the MDT-MDT connection was dropped several times, and reconnection fails and is aborted.&lt;br/&gt;
As far as I can see in the logs, there are some indications of network errors between the MDSs.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Dec 20 07:02:18 mds13 kernel: LNetError: 71965:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Dec 20 07:02:18 mds13 kernel: LNetError: 71965:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.11.226@o2ib10 (6): c: 0, oc: 0, rc: 63
Dec 20 07:02:18 mds13 kernel: LNet: 67774:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.0.11.226@o2ib10 - queue depth reduced from 128 to 63  to allow for qp creation
Dec 20 07:02:19 mds13 kernel: Lustre: scratch0-MDT0000: Received new LWP connection from 10.0.11.226@o2ib10, removing former export from same NID
Dec 20 07:02:19 mds13 kernel: Lustre: scratch0-MDT0000: Connection restored to 10.0.11.226@o2ib10 (at 10.0.11.226@o2ib10)
Dec 20 07:02:19 mds13 kernel: Lustre: Skipped 29 previous similar messages
Dec 20 07:02:19 mds13 kernel: Lustre: 73687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1545257025/real 0]  req@ffff91e9f7715700 x1620318877676672/t0(0) o1000-&amp;gt;scratch0-MDT0001-osp-MDT0000@10.0.11.226@o2ib10:24/4 lens 368/4320 e 0 to 1 dl 1545257036 ref 3 fl Rpc:X/0/ffffffff rc 0/-1
Dec 20 07:02:19 mds13 kernel: Lustre: 73687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Dec 20 07:02:19 mds13 kernel: Lustre: scratch0-MDT0001-osp-MDT0000: Connection to scratch0-MDT0001 (at 10.0.11.226@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Dec 20 07:17:42 mds13 kernel: LustreError: Skipped 3 previous similar messages
Dec 20 07:17:47 mds13 kernel: LNet: 71968:0:(o2iblnd_cb.c:408:kiblnd_handle_rx()) PUT_NACK from 10.0.11.226@o2ib10
Dec 20 07:18:07 mds13 kernel: LNet: 67774:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.0.11.226@o2ib10 - queue depth reduced from 128 to 63  to allow for qp creation
Dec 20 07:18:07 mds13 kernel: LNet: 67774:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 51 previous similar messages
Dec 20 07:20:01 mds13 systemd[1]: Started Session 332 of user root.
Dec 20 07:20:01 mds13 systemd[1]: Starting Session 332 of user root.
Dec 20 07:20:40 mds13 systemd-logind[2938]: Removed session 331.
Dec 20 07:19:40 mds13 kernel: Lustre: scratch0-MDT0001-osp-MDT0000: Connection to scratch0-MDT0001 (at 10.0.11.226@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
Dec 20 07:19:40 mds13 kernel: Lustre: Skipped 1 previous similar message
Dec 20 07:20:06 mds13 kernel: LNet: 71965:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.11.226@o2ib10: connected
Dec 20 07:20:12 mds13 kernel: LNet: 71969:0:(o2iblnd_cb.c:408:kiblnd_handle_rx()) PUT_NACK from 10.0.11.226@o2ib10
Dec 20 07:21:52 mds13 systemd-logind[2938]: New session 333 of user root.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although it&apos;s a simple network configuration (just a single switch) and I didn&apos;t see any network errors between the MDSs and the clients/OSSs, I suspect there may still be network problems between the MDSs.&lt;/p&gt;</description>
                <environment>2.12.0-RC3</environment>
        <key id="54358">LU-11815</key>
            <summary>MDT-MDT connection stuck and never restored</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Wed, 19 Dec 2018 22:32:02 +0000</created>
                <updated>Wed, 19 Dec 2018 22:43:46 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
<comment id="238866" author="sihara" created="Wed, 19 Dec 2018 22:43:46 +0000"  >&lt;p&gt;Pings between the OSSs and the MDSs are fine, but MDS-to-MDS pings hang or fail.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds13 ~]# lctl list_nids
10.0.11.225@o2ib10
[root@mds14 ~]#  lctl list_nids
10.0.11.226@o2ib10

[root@mds13 ~]# lctl ping  10.0.11.226@o2ib10
^C
[root@mds14 ~]# lctl ping  10.0.11.225@o2ib10
failed to ping 10.0.11.225@o2ib10: Input/output error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds13 ~]# clush -g oss lctl ping 10.0.11.225@o2ib10
es14k3-vm1: 12345-0@lo
es14k3-vm1: 12345-10.0.11.225@o2ib10
es14k3-vm2: 12345-0@lo
es14k3-vm2: 12345-10.0.11.225@o2ib10
es14k3-vm3: 12345-0@lo
es14k3-vm3: 12345-10.0.11.225@o2ib10
es14k3-vm4: 12345-0@lo
es14k3-vm4: 12345-10.0.11.225@o2ib10
[root@mds13 ~]# clush -g oss lctl ping 10.0.11.226@o2ib10
es14k3-vm1: 12345-0@lo
es14k3-vm1: 12345-10.0.11.226@o2ib10
es14k3-vm2: 12345-0@lo
es14k3-vm2: 12345-10.0.11.226@o2ib10
es14k3-vm3: 12345-0@lo
es14k3-vm3: 12345-10.0.11.226@o2ib10
es14k3-vm4: 12345-0@lo
es14k3-vm4: 12345-10.0.11.226@o2ib10
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="31683" name="messages-mds13.txt" size="42516" author="sihara" created="Wed, 19 Dec 2018 22:34:17 +0000"/>
                            <attachment id="31682" name="messages-mds14.txt" size="29320" author="sihara" created="Wed, 19 Dec 2018 22:34:17 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i008fb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>