<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:55:16 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5874] DLC: the ongoing traffic was interrupted after adding a new network interface</title>
                <link>https://jira.whamcloud.com/browse/LU-5874</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;1. Set up the system and run sanity.&lt;br/&gt;
2. Add a new network interface on the client side.&lt;br/&gt;
3. The ongoing traffic is interrupted and the messages shown below are printed repeatedly.&lt;br/&gt;
4. After the newly added interface is removed, the system returns to normal.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity test 27B: call setstripe on open unlinked file/rename victim == 12:18:00 (1415218680)
Lustre: DEBUG MARKER: == sanity test 27B: call setstripe on open unlinked file/rename victim == 12:18:00 (1415218680)
LNet: Added LNI 192.168.4.74@o2ib [8/256/0/180]
LNet: No route to 192.168.4.47@o2ib via from 10.2.4.74@tcp
Lustre: 4806:0:(client.c:1934:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1415218690/real 1415218690]  req@ffff880824129000 x1483963522156376/t0(0) o400-&amp;gt;lustre-MDT0000-mdc-ffff880434a40800@192.168.4.47@o2ib:12/10 lens 224/224 e 0 to 1 dl 1415218753 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 4806:0:(client.c:1934:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff880434a40800: Connection to lustre-MDT0000 (at 192.168.4.47@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LNet: Skipped 5 previous similar messages
LustreError: 166-1: MGC192.168.4.47@o2ib: Connection to MGS (at 192.168.4.47@o2ib) was lost; in progress operations using this service will fail
LNet: Removed LNI 192.168.4.74@o2ib
Lustre: lustre-OST0000-osc-ffff880434a40800: Connection restored to lustre-OST0000 (at 192.168.4.47@o2ib)
Lustre: Skipped 2 previous similar messages
LL_IOC_LOV_SETSTRIPE: No such file or directory
LL_IOC_LOV_SETSTRIPE: No such file or directory
Resetting fail_loc on all nodes...done.
PASS 27B (26s)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="27475">LU-5874</key>
            <summary>DLC: the ongoing traffic was interrupted after adding a new network interface</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                    </labels>
                <created>Wed, 5 Nov 2014 20:23:29 +0000</created>
                <updated>Mon, 19 Jan 2015 18:12:02 +0000</updated>
                            <resolved>Mon, 19 Jan 2015 18:12:02 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                <comments>
                            <comment id="98551" author="jlevi" created="Thu, 6 Nov 2014 18:31:30 +0000"  >&lt;p&gt;Amir,&lt;br/&gt;
Can you please have a look at this one?&lt;br/&gt;
Thank you!&lt;/p&gt;</comment>
                            <comment id="98558" author="ashehata" created="Thu, 6 Nov 2014 18:38:07 +0000"  >&lt;p&gt;Can you please provide the exact steps used to reproduce this issue?&lt;/p&gt;

&lt;p&gt;Ideally, include the set of lnetctl commands used and any show output that reflects the change.&lt;/p&gt;</comment>
                            <comment id="98714" author="sarah" created="Sat, 8 Nov 2014 00:31:08 +0000"  >&lt;p&gt;1. Set up a Lustre file system with 1 MDT and 1 OST. The servers use o2ib, there is one router, and one client uses tcp. Mount the file system and run sanity on the client.&lt;br/&gt;
2. To add the new network, I ran only this command, on the client:&lt;br/&gt;
lnetctl net add --net o2ib --if ib0&lt;/p&gt;

&lt;p&gt;The errors above then appear on the client.&lt;/p&gt;</comment>
                            <comment id="99987" author="ashehata" created="Mon, 24 Nov 2014 22:01:54 +0000"  >&lt;p&gt;The issue here is that, once the network is added dynamically, both the client and the servers are on ib0 (the o2ib network). This makes the configuration invalid, because the route that bridges o2ib and tcp is still present.&lt;/p&gt;

&lt;p&gt;There are a couple of options to fix this:&lt;br/&gt;
1. Reject the addition of a network that would make the configuration invalid.&lt;br/&gt;
2. Remove the route.&lt;/p&gt;

&lt;p&gt;Currently investigating the best solution.&lt;/p&gt;</comment>
                            <comment id="100008" author="isaac" created="Tue, 25 Nov 2014 05:52:00 +0000"  >&lt;p&gt;I thought that once ib0 is added on the client, LNet would automatically switch to ib0 to talk to the servers on ib0, so there shouldn&apos;t be any &quot;No route to ......&quot; errors. Am I missing something?&lt;/p&gt;

&lt;p&gt;It&apos;s not a good idea to remove the route, because if the ib0 interface is brought down later then the TCP client would not be able to talk to the servers any more (as the route was removed).&lt;/p&gt;</comment>
                            <comment id="100178" author="ashehata" created="Thu, 27 Nov 2014 01:03:56 +0000"  >&lt;p&gt;If I&apos;m reading the following code from lnet_send() correctly:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-comment&quot;&gt;/* Is &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; someone on a local network? */&lt;/span&gt;
	local_ni = lnet_net2ni_locked(LNET_NIDNET(dst_nid), cpt);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (local_ni != NULL) {
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (src_ni == NULL) {
                        src_ni = local_ni;
                        src_nid = src_ni-&amp;gt;ni_nid;
                } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (src_ni == local_ni) {
			lnet_ni_decref_locked(local_ni, cpt);
		} &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
			lnet_ni_decref_locked(local_ni, cpt);
			lnet_ni_decref_locked(src_ni, cpt);
			lnet_net_unlock(cpt);
			LCONSOLE_WARN(&lt;span class=&quot;code-quote&quot;&gt;&quot;No route to %s via from %s\n&quot;&lt;/span&gt;,
				      libcfs_nid2str(dst_nid),
				      libcfs_nid2str(src_nid));
			&lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -EINVAL;
		}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems to say that when sending to a destination on a local net, which is the case in this test once ib0 is added, any source NI already chosen by the caller must be the same as local_ni. However, src_nid is still the @tcp NID, which, as far as I can tell, is stored in ptlrpc (not 100% sure yet). If so, what would trigger an update of src_nid?&lt;/p&gt;

&lt;p&gt;I&apos;m also leaning towards rejecting the network addition. Before adding an NI dynamically, I check whether its net is a remote net (one that is currently reached through a route), and if so, I reject adding the NI.&lt;/p&gt;</comment>
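<!--
The branch logic in the lnet_send() excerpt above can be modelled as follows. This is a minimal sketch under stated assumptions, not LNet code: an NI is reduced to the name of its net, and struct model_ni, model_net2ni() and model_send_check() are hypothetical names. It illustrates why a send whose source NI is still pinned to tcp fails once o2ib becomes a local net.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: an NI is reduced to the name of its net.
 * Real LNet uses lnet_ni_t, lnet_nid_t and per-CPT locking. */
struct model_ni {
	const char *net;	/* e.g. "tcp" or "o2ib" */
};

/* Stand-in for lnet_net2ni_locked(): the local NI on @net, or NULL. */
static const struct model_ni *
model_net2ni(const struct model_ni *nis, size_t n_nis, const char *net)
{
	size_t i;

	for (i = 0; i < n_nis; i++)
		if (strcmp(nis[i].net, net) == 0)
			return &nis[i];
	return NULL;
}

/* Mirrors the quoted branches. Returns 0 if the message can be sent
 * directly on the destination's local NI, 1 if the destination net is
 * remote (the routed path, not modelled here), and -EINVAL if the caller
 * pinned a source NI that is not the local NI for the destination net. */
static int
model_send_check(const struct model_ni *nis, size_t n_nis,
		 const struct model_ni *src_ni, const char *dst_net)
{
	const struct model_ni *local_ni;

	assert(dst_net != NULL);
	local_ni = model_net2ni(nis, n_nis, dst_net);
	if (local_ni == NULL)
		return 1;
	if (src_ni == NULL || src_ni == local_ni)
		return 0;
	return -EINVAL;	/* "No route to %s via from %s" */
}
```

With only the tcp NI configured, o2ib is remote and the pinned tcp source takes the routed path (return value 1). After ib0 is added on o2ib, the same call returns -EINVAL, which corresponds to the "No route to %s via from %s" warning in the log.
-->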
                            <comment id="100343" author="isaac" created="Mon, 1 Dec 2014 19:29:00 +0000"  >&lt;p&gt;Agreed, and I&apos;d suggest:&lt;br/&gt;
1. The command line should print an error message saying something like &quot;the NI conflicts with route ***, remove the route before adding the NI&quot;. This message should be printed to stderr, rather than from the kernel to dmesg, because that is where the user issued the command.&lt;br/&gt;
2. Document it in the Lustre manual.&lt;/p&gt;</comment>
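<!--
The rejection check and the stderr message suggested in this comment could look roughly like this. It is a hypothetical sketch, not the code from the merged patch: struct model_route and model_check_net_add() are illustrative names, and the message text only paraphrases the suggested wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical route entry: a remote net and the gateway NID used to reach it. */
struct model_route {
	const char *net;	/* e.g. "o2ib" */
	const char *gateway;	/* e.g. the router's @tcp NID */
};

/* Refuse to add an NI on @net while a route to @net is configured, and
 * explain why on @err (stderr in a real tool) rather than in dmesg.
 * Returns 0 if the NI may be added, -1 if it conflicts with a route. */
static int
model_check_net_add(const struct model_route *routes, size_t n_routes,
		    const char *net, FILE *err)
{
	size_t i;

	assert(net != NULL && err != NULL);
	for (i = 0; i < n_routes; i++) {
		if (strcmp(routes[i].net, net) != 0)
			continue;
		fprintf(err, "the NI on %s conflicts with route to %s via %s, "
			"remove the route before adding the NI\n",
			net, routes[i].net, routes[i].gateway);
		return -1;
	}
	return 0;
}
```

A command-line tool would call it with stderr before issuing the add request, for example model_check_net_add(routes, n_routes, "o2ib", stderr), and exit with an error if it returns -1.
-->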
                            <comment id="100368" author="ashehata" created="Mon, 1 Dec 2014 22:51:39 +0000"  >&lt;p&gt;Some more details:&lt;br/&gt;
mdc_setup() calls client_obd_setup(), which calls client_import_add_conn(), which calls import_set_conn(), which calls ptlrpc_uuid_to_connection(), which calls LNetDist().&lt;/p&gt;

&lt;p&gt;mdc_setup() is only triggered at startup, so the src_nid is picked once and kept from then on. As a result, adding a new &quot;closer NI&quot; does not trigger an update of the connection hash maintained in ptlrpc.&lt;/p&gt;

&lt;p&gt;Another option (not as part of this bug): when a network is added, existing connections should be re-evaluated and updated if there is now an NI that provides a preferred path to the destination.&lt;/p&gt;

&lt;p&gt;This would allow LNet to switch to the shortest, direct path when the configuration changes.&lt;/p&gt;

&lt;p&gt;Ideally, NIDs shouldn&apos;t be visible outside of LNet at all, though I realize that would be a major change.&lt;/p&gt;</comment>
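<!--
The alternative described in this comment, re-evaluating cached connections when a network is added, can be sketched as below. The merged fix rejects the configuration instead; struct model_conn and model_reevaluate_conns() are hypothetical and do not reflect ptlrpc's actual connection structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cached connection: roughly the state ptlrpc keeps after
 * resolving a peer at mdc_setup() time. Not the real ptlrpc structure. */
struct model_conn {
	const char *peer_net;	/* net of the server NID, e.g. "o2ib" */
	const char *src_net;	/* net of the source NID picked at setup */
	int routed;		/* 1 if the peer was reached through a router */
};

/* When an NI on @new_net is added, switch every routed connection whose
 * peer sits on @new_net to the new, direct path.
 * Returns how many connections were updated. */
static size_t
model_reevaluate_conns(struct model_conn *conns, size_t n_conns,
		       const char *new_net)
{
	size_t i, updated = 0;

	assert(new_net != NULL);
	for (i = 0; i < n_conns; i++) {
		struct model_conn *c = &conns[i];

		if (!c->routed || strcmp(c->peer_net, new_net) != 0)
			continue;
		c->src_net = new_net;	/* peer is now on a local net */
		c->routed = 0;
		updated++;
	}
	return updated;
}
```

In the reproduction setup, a connection to the o2ib MDT that was routed from tcp would be switched to the new o2ib NI, while connections to peers on other nets are left untouched.
-->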
                            <comment id="100529" author="gerrit" created="Wed, 3 Dec 2014 01:25:28 +0000"  >&lt;p&gt;Amir Shehata (amir.shehata@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/12912&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12912&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5874&quot; title=&quot;DLC: the ongoing traffic was interrupted after adding a new network interface&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5874&quot;&gt;&lt;del&gt;LU-5874&lt;/del&gt;&lt;/a&gt; lnet: reject invalid net configuration&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d4b6589173915723dfebf4d5ce64ce2591180c41&lt;/p&gt;</comment>
                            <comment id="103875" author="gerrit" created="Mon, 19 Jan 2015 17:51:40 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12912/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12912/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5874&quot; title=&quot;DLC: the ongoing traffic was interrupted after adding a new network interface&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5874&quot;&gt;&lt;del&gt;LU-5874&lt;/del&gt;&lt;/a&gt; lnet: reject invalid net configuration&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4aebc21514f49453aeeca83c6f6279d473d61617&lt;/p&gt;</comment>
                            <comment id="103881" author="pjones" created="Mon, 19 Jan 2015 18:12:02 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="27476">LU-5875</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="15616">LU-2456</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx0bz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>16426</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>