<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:35:47 UTC 2024
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17480] lustre_rmmod hangs if a lnet route is down</title>
                <link>https://jira.whamcloud.com/browse/LU-17480</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The issue can be reproduced as follows:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Mount Lustre on a RoCE network&lt;/li&gt;
	&lt;li&gt;Add an LNet route whose gateway is down&lt;/li&gt;
	&lt;li&gt;Generate LNet traffic (e.g. find /mnt/lustre)&lt;/li&gt;
	&lt;li&gt;Unmount the client (umount)&lt;/li&gt;
	&lt;li&gt;Run lustre_rmmod&lt;/li&gt;
&lt;/ul&gt;
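
&lt;p&gt;As a rough sketch, the steps above could be run on the client as shown here. The MGS NID, filesystem name, and remote network are hypothetical placeholders that are not given in this report, and the gateway NID is the redacted one from the log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Placeholders (assumptions, adjust to the site)
MGS_NID=&quot;a.b.c.d@o2ib1&quot;      # hypothetical MGS NID
FSNAME=&quot;lustre&quot;              # hypothetical filesystem name
REMOTE_NET=&quot;o2ib1&quot;           # hypothetical network reached through the router
GW_NID=&quot;x.y.z.99@o2ib50&quot;     # redacted NID of a gateway that is down

# 1. Mount Lustre on the RoCE client
mount -t lustre &quot;$MGS_NID:/$FSNAME&quot; /mnt/lustre

# 2. Add a route through the gateway that is down
lnetctl route add --net &quot;$REMOTE_NET&quot; --gateway &quot;$GW_NID&quot;

# 3. Generate LNet traffic
find /mnt/lustre &gt; /dev/null

# 4. Unmount the client, then unload the modules (the hang is seen here)
umount /mnt/lustre
time lustre_rmmod
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;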


&lt;p&gt;lustre_rmmod then hangs for about 1 minute in &quot;lnetctl net unconfigure&quot;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 2995     TASK: &amp;lt;task&amp;gt;  CPU: 4    COMMAND: &quot;lnetctl&quot;
#0 __schedule 
#1 schedule 
#2 schedule_timeout 
#3 kiblnd_shutdown 
#4 lnet_shutdown_lndni 
#5 lnet_shutdown_lndnet 
#6 lnet_shutdown_lndnets 
#7 LNetNIFini 
#8 lnet_ioctl 
#9 notifier_call_chain 
#10 blocking_notifier_call_chain 
#11 libcfs_psdev_ioctl 
#12 do_vfs_ioctl 
#13 ksys_ioctl 
#14 __x64_sys_ioctl 
#15 do_syscall_64 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Debug kernel (dk) log from the client:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000800:00000200:47.0:1706285707.687699:0:197221:0:(o2iblnd.c:3046:kiblnd_shutdown()) x.y.z.75@o2ib50: waiting for 2 peers to disconnect
00000800:00000100:1.0F:1706285708.135711:0:192402:0:(o2iblnd_cb.c:3265:kiblnd_cm_callback()) x.y.z.90@o2ib50: UNREACHABLE -110
00000800:00000200:1.0:1706285708.135713:0:192402:0:(o2iblnd_cb.c:2345:kiblnd_connreq_done()) x.y.z.90@o2ib50: active(1), version(12), status(-100)
00000800:00000010:1.0:1706285708.135714:0:192402:0:(o2iblnd_cb.c:2353:kiblnd_connreq_done()) kfreed &apos;conn-&amp;gt;ibc_connvars&apos;: 136 at 000000009aa0d65a (tot 19395077).
00000400:00000200:1.0:1706285708.135717:0:192402:0:(router.c:1739:lnet_notify()) x.y.z.75@o2ib50 notifying x.y.z.90@o2ib50: down
00000800:00000200:1.0:1706285708.135920:0:192402:0:(o2iblnd_cb.c:2253:kiblnd_finalise_conn()) abort connection with x.y.z.90@o2ib50
00000800:00000200:1.0:1706285708.135922:0:192402:0:(o2iblnd_cb.c:3267:kiblnd_cm_callback()) conn[00000000f9491194] (19)--
00000800:00000100:1.0:1706285708.135938:0:192402:0:(o2iblnd_cb.c:3265:kiblnd_cm_callback()) x.y.z.99@o2ib50: UNREACHABLE -110
00000800:00000200:1.0:1706285708.135939:0:192402:0:(o2iblnd_cb.c:2345:kiblnd_connreq_done()) x.y.z.99@o2ib50: active(1), version(12), status(-100)
00000800:00000010:1.0:1706285708.135940:0:192402:0:(o2iblnd_cb.c:2353:kiblnd_connreq_done()) kfreed &apos;conn-&amp;gt;ibc_connvars&apos;: 136 at 00000000868f6d6f (tot 19394941).
00000400:00000200:1.0:1706285708.135942:0:192402:0:(router.c:1739:lnet_notify()) xxxx@o2ib50 notifying x.y.z.99@o2ib50: down
00000800:00000200:29.2F:1706285708.135964:0:0:0:(o2iblnd_cb.c:3780:kiblnd_cq_completion()) conn[00000000f9491194] (18)++
00000800:00000200:33.0F:1706285708.135973:0:195209:0:(o2iblnd_cb.c:3894:kiblnd_scheduler()) conn[00000000f9491194] (19)++
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The unconfigure task appears to wait for connection attempts to the unreachable LNet gateways x.y.z.99@o2ib50 and x.y.z.90@o2ib50 to time out (UNREACHABLE -110).&lt;/p&gt;
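
&lt;p&gt;The workaround stated at the end of this description could look like the following on the client. This is only a sketch: the remote network name is a hypothetical placeholder, and the gateway NIDs are the redacted ones from the log above:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;REMOTE_NET=&quot;o2ib1&quot;   # hypothetical network reached through the routers

# Unmount first so that no new traffic uses the routes
umount /mnt/lustre

# List the configured routes, then delete each one before unloading
lnetctl route show
lnetctl route del --net &quot;$REMOTE_NET&quot; --gateway &quot;x.y.z.90@o2ib50&quot;
lnetctl route del --net &quot;$REMOTE_NET&quot; --gateway &quot;x.y.z.99@o2ib50&quot;

# With the routes gone, unload the modules
lustre_rmmod
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;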

&lt;p&gt;The workaround is to remove the LNet routes before running the unconfigure.&lt;/p&gt;</description>
                <environment>Lustre server 2.15.3 RoCE&lt;br/&gt;
Lustre MGS 2.15.3 Infiniband&lt;br/&gt;
Lustre client 2.15.3 RoCE&lt;br/&gt;
Lustre router 2.12.9 Infiniband/RoCE</environment>
        <key id="80404">LU-17480</key>
            <summary>lustre_rmmod hangs if a lnet route is down</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="eaujames">Etienne Aujames</assignee>
                                    <reporter username="eaujames">Etienne Aujames</reporter>
                        <labels>
                    </labels>
                <created>Mon, 29 Jan 2024 10:34:16 +0000</created>
                <updated>Fri, 9 Feb 2024 12:20:33 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="403375" author="gerrit" created="Fri, 9 Feb 2024 12:20:33 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/53986&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/53986&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17480&quot; title=&quot;lustre_rmmod hangs if a lnet route is down&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17480&quot;&gt;LU-17480&lt;/a&gt; o2iblnd: add a timeout for rdma_connect&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7499c8a3af228c7672acd4f9eb39ac60c77c07b1&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="73036">LU-16283</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i049dj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>