<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2455] lctl ping takes too long to timeout</title>
                <link>https://jira.whamcloud.com/browse/LU-2455</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;lctl ping in theory takes a timeout parameter, with the default timeout being 1 second. In practice, however, the timeout can be significantly longer. It appears that it changes the timeout to 60s if it needs to UNLINK. Is there any way to eliminate this or is it mandatory that if there is an error, pings could take up to 60s? As it is now, the timeout is not very useful.&lt;/p&gt;</description>
                <environment></environment>
        <key id="16885">LU-2455</key>
            <summary>lctl ping takes too long to timeout</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="ihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Mon, 10 Dec 2012 13:59:19 +0000</created>
                <updated>Sat, 5 Mar 2016 00:24:56 +0000</updated>
                            <resolved>Sat, 5 Mar 2016 00:24:56 +0000</resolved>
                                    <version>Lustre 2.1.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="49039" author="pjones" created="Tue, 11 Dec 2012 09:05:57 +0000"  >&lt;p&gt;Bruno&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="49051" author="bfaccini" created="Tue, 11 Dec 2012 12:01:10 +0000"  >
&lt;p&gt;I am afraid that&apos;s the way it is coded in the lnet_ping() routine; this applies to both the 60s timeout value and the automatic wait/retry that uses that timeout.&lt;/p&gt;

&lt;p&gt;BTW, can you provide the &quot;lctl ping&quot; output and, if possible, enable &quot;echo +neterror +net &amp;gt; /proc/sys/lnet/[debug,print]&quot; when you hit this kind of error? This will help me confirm the responsible code path in the sources.&lt;/p&gt;

</comment>
                            <comment id="49055" author="kitwestneat" created="Tue, 11 Dec 2012 12:21:33 +0000"  >&lt;p&gt;Here&apos;s the relevant dk output; I&apos;ll attach the full dk:&lt;/p&gt;

&lt;p&gt;00000400:00000200:0.0:1355245748.810589:0:5818:0:(lib-move.c:2705:LNetGet()) LNetGet -&amp;gt; 12345-10.10.10.179@tcp1&lt;br/&gt;
00000800:00000200:0.0:1355245748.810605:0:5818:0:(socklnd_cb.c:947:ksocknal_send()) sending 0 bytes in 0 frags to 12345-10.10.10.179@tcp1&lt;br/&gt;
00000800:00000200:0.0:1355245748.810810:0:5818:0:(socklnd.c:199:ksocknal_find_peer_locked()) got peer [ffff88002ee574c0] -&amp;gt; 12345-10.10.10.179@tcp1 (2)&lt;br/&gt;
00000800:00000200:0.1F:1355245748.810818:0:5818:0:(socklnd.c:199:ksocknal_find_peer_locked()) got peer [ffff88002ee574c0] -&amp;gt; 12345-10.10.10.179@tcp1 (2)&lt;br/&gt;
00000400:00000200:0.0:1355245749.810695:0:5818:0:(api-ni.c:1800:lnet_ping()) poll 0(-1 -1)&lt;br/&gt;
00000400:00000200:0.0:1355245749.810710:0:5818:0:(lib-md.c:69:lnet_md_unlink()) Queueing unlink of md ffff88003916cdc0&lt;br/&gt;
00000400:00000200:0.0:1355245769.826898:0:5818:0:(api-ni.c:1800:lnet_ping()) poll 1(4 0) unlinked&lt;/p&gt;

&lt;p&gt;If I am reading it right, the ping times out correctly after 1s, but then the unlinking takes 20s. &lt;/p&gt;

&lt;p&gt;Here is how I reproduced it:&lt;br/&gt;
[root@traiana-mds0 ~]# iptables -A OUTPUT -p tcp --destination-port 988 -j DROP&lt;br/&gt;
[root@traiana-mds0 ~]# /usr/bin/time lctl ping 10.10.10.179@tcp1 1&lt;br/&gt;
failed to ping 10.10.10.179@tcp1: Input/output error&lt;br/&gt;
0.00user 0.00system 0:21.00elapsed 0%CPU (0avgtext+0avgdata 2768maxresident)k&lt;br/&gt;
0inputs+0outputs (0major+214minor)pagefaults 0swaps&lt;/p&gt;

&lt;p&gt;On an IB network, we have seen it take over 50s.&lt;/p&gt;</comment>
                            <comment id="49056" author="kitwestneat" created="Tue, 11 Dec 2012 12:21:56 +0000"  >&lt;p&gt;Full dk from the time period of the ping.&lt;/p&gt;</comment>
                            <comment id="49082" author="bfaccini" created="Tue, 11 Dec 2012 19:12:01 +0000"  >&lt;p&gt;The unlink occurs asynchronously (see the &quot;lnet_md_unlink()) Queueing unlink&quot; log line) because at least one msg may still reference the MD.&lt;/p&gt;

&lt;p&gt;How and when msgs are terminated/discarded seems to be NAL-dependent, which may explain the differences you&apos;ve seen.&lt;/p&gt;

&lt;p&gt;I will try to fully explain that using your reproduction method.&lt;/p&gt;
</comment>
                            <comment id="49217" author="kitwestneat" created="Thu, 13 Dec 2012 17:25:45 +0000"  >&lt;p&gt;I guess the problem is that it&apos;s difficult to put a timeout on a TCP connect operation. That appears to be what is blocking for 20s. It&apos;s too bad that there is no way to handle the cleanup in the background and return to userspace before the connect times out. I saw that the MD is created with auto-unlink enabled, but there doesn&apos;t appear to be a corresponding &quot;autofree&quot; for the event queue; is that correct?&lt;/p&gt;

&lt;p&gt;Would it be possible to add an event handler callback that would do the cleanup? That way the lnet_ping code could enqueue the unlink and then return an error immediately. It&apos;s probably not that straightforward, I suppose.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="12090" name="dk1" size="58139" author="kitwestneat" created="Tue, 11 Dec 2012 12:21:56 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdiv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5797</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>