<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:41:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11112] lnet: improve error msg in lnet_sock_create()</title>
                <link>https://jira.whamcloud.com/browse/LU-11112</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The kernel_bind() call in lnet_sock_create() may fail due to a problem with either the local port or the local IP address, but the error message currently includes only the port. It would be helpful if the message included both items when reporting a fatal error.&lt;/p&gt;

&lt;p&gt;Background: We&apos;ve encountered an issue where LNET had picked a virtual IP address (used for non-Lustre services) as its &lt;tt&gt;local_ip&lt;/tt&gt;, and lnet_sock_create() would fail once the IP address was migrated to another node. The error message included only the port, not the IP address, so it took a while to correlate the events. Why LNET picked this particular source address is a separate question we need to investigate, but for starters, improving the error message to include all relevant information seems like a good idea.&lt;/p&gt;</description>
                <environment></environment>
        <key id="52629">LU-11112</key>
            <summary>lnet: improve error msg in lnet_sock_create()</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="sharmaso">Sonia Sharma</assignee>
                                    <reporter username="kobras">Daniel Kobras</reporter>
                        <labels>
                    </labels>
                <created>Mon, 2 Jul 2018 13:59:38 +0000</created>
                <updated>Tue, 3 Jul 2018 15:19:47 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="229857" author="pjones" created="Mon, 2 Jul 2018 14:01:50 +0000"  >&lt;p&gt;Sonia&lt;/p&gt;

&lt;p&gt;Could you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="229858" author="pjones" created="Mon, 2 Jul 2018 14:06:38 +0000"  >&lt;p&gt;Daniel&lt;/p&gt;

&lt;p&gt;Could you please push your proposed patch into Gerrit so it can be reviewed/landed?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="229859" author="kobras" created="Mon, 2 Jul 2018 14:06:54 +0000"  >&lt;p&gt;It seems Gerrit has moved to a different IP address, and I cannot access it due to local firewall restrictions. Attaching the patch here while I try to sort things out.&lt;/p&gt;</comment>
                            <comment id="229860" author="knweiss" created="Mon, 2 Jul 2018 14:20:09 +0000"  >&lt;p&gt;FWIW: This is the original lnet error message:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNetError: 2099:0:(lib-socket.c:455:lnet_sock_create()) Error trying to bind to port 1023: -99
LNetError: 2099:0:(lib-socket.c:455:lnet_sock_create()) Skipped 8 previous similar messages
LNetError: 11e-e: Unexpected error -99 connecting to 192.168.10.6@tcp at host 192.168.10.6 on port 988
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="229872" author="sharmaso" created="Mon, 2 Jul 2018 20:57:33 +0000"  >&lt;p&gt;Hi Daniel&lt;/p&gt;

&lt;p&gt;In the lnet_sock_connect function, I see that &quot;INADDR_ANY&quot; is assigned if local_ip == 0. With &quot;INADDR_ANY&quot; in the bind call, the socket will be bound to all the local interfaces.&lt;/p&gt;


&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (local_ip != 0 || local_port != 0) {
        memset(&amp;amp;locaddr, 0, sizeof(locaddr));
        locaddr.sin_family = AF_INET;
        locaddr.sin_port = htons(local_port);
        locaddr.sin_addr.s_addr = (local_ip == 0) ?
                                  INADDR_ANY : htonl(local_ip);&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Was this virtual address assigned to one of the interfaces on the node? It would help to know which particular action or command results in this error.&lt;/p&gt;

&lt;p&gt;Thanks&lt;br/&gt;
Sonia&lt;/p&gt;</comment>
                            <comment id="229881" author="kobras" created="Tue, 3 Jul 2018 14:37:41 +0000"  >&lt;p&gt;Hi Sonia!&lt;/p&gt;

&lt;p&gt;For the scope of this LU, my line of argument just goes:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lnet_sock_connect() can fail due to either of two arguments (local_ip and local_port);&lt;/li&gt;
	&lt;li&gt;the resulting error message just includes one of the arguments (local_port);&lt;/li&gt;
	&lt;li&gt;there exists at least one real-world case where it fails due to the other argument (local_ip);&lt;/li&gt;
	&lt;li&gt;hence the error message should be improved to include both arguments.&lt;/li&gt;
&lt;/ul&gt;
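
&lt;p&gt;A minimal userspace sketch of the last point, not the actual patch: the helper name format_bind_error() and the message wording are assumptions made for illustration, and the in-kernel code would presumably use CERROR() with the kernel&apos;s own address formatting rather than snprintf() and inet_ntop(). local_ip is taken in host byte order, as implied by the htonl(local_ip) call in the snippet quoted earlier.&lt;/p&gt;
<![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of LNet): format a bind failure so that
 * both the local address and the local port appear in the message.
 * local_ip is expected in host byte order; rc is the negative errno
 * returned by the failed bind call.
 * Returns the snprintf() result, or -1 if the address cannot be formatted.
 */
static int format_bind_error(char *buf, size_t len, uint32_t local_ip,
                             int local_port, int rc)
{
        struct in_addr addr = { .s_addr = htonl(local_ip) };
        char ip[INET_ADDRSTRLEN];

        /* Convert the binary IPv4 address to dotted-quad text. */
        if (inet_ntop(AF_INET, &addr, ip, sizeof(ip)) == NULL)
                return -1;

        return snprintf(buf, len, "Error trying to bind to %s:%d: %d",
                        ip, local_port, rc);
}
```
]]>
&lt;p&gt;For example, format_bind_error(buf, sizeof(buf), 0x0A000005, 1023, -99) would fill buf with &quot;Error trying to bind to 10.0.0.5:1023: -99&quot;; the address 10.0.0.5 is purely illustrative. On Linux, -99 corresponds to EADDRNOTAVAIL, which is consistent with the address no longer being present on the node.&lt;/p&gt;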


&lt;p&gt;The case given was only meant as an example to show that having the full information in the error output occasionally really matters.&lt;/p&gt;

&lt;p&gt;Why LNET chose to open a connection with a fixed source IP address rather than just using INADDR_ANY isn&apos;t clear to me yet. One should be able to reproduce it with the following steps&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;assign IP alias to interface;&lt;/li&gt;
	&lt;li&gt;start LNET/Lustre;&lt;/li&gt;
	&lt;li&gt;remove IP alias from interface;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;but that&apos;s a topic probably more suited to a separate LU (once we&apos;ve collected more information about it).&lt;/p&gt;</comment>
                            <comment id="229882" author="gerrit" created="Tue, 3 Jul 2018 15:19:47 +0000"  >&lt;p&gt;Daniel Kobras (d.kobras@science-computing.de) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32758&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32758&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11112&quot; title=&quot;lnet: improve error msg in lnet_sock_create()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11112&quot;&gt;LU-11112&lt;/a&gt; lnet: improve error msg in lnet_sock_create()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a0b382caf10e590f112030cc528e1c5fdd470390&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="30467" name="0001-LU-11112-lnet-improve-error-msg-in-lnet_sock_create.patch" size="1152" author="kobras" created="Mon, 2 Jul 2018 14:04:24 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzyp3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>