<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:59:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13200] hang in lnet_wait_known_routerstate</title>
                <link>https://jira.whamcloud.com/browse/LU-13200</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have a hang on production systems running 2.12.3: lnet never finishes setting up on a server if some routers are bad (modprobe lustre hangs, and there is no lnet service).&lt;/p&gt;

&lt;p&gt;Approximate backtrace:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#0 __schedule
#1 schedule
#2 schedule_timeout
#3 lnet_router_post_mt_start
#4 lnet_monitor_thr_start
#5 LNetNIInit
#6 ptlrpc_ni_init
#7 ptlrpc_init_portals
#8 init_module
#9 do_one_initcall
#10 load_module
#11 sys_finit_module
#12 system_call_fastpath
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;It is actually stuck in the loop that waits for &lt;tt&gt;rtr-&amp;gt;lpni_alive_count&lt;/tt&gt; to be non-zero on every lnet router (the loop over &lt;tt&gt;&amp;amp;the_lnet.ln_routers&lt;/tt&gt;).&lt;/p&gt;


&lt;p&gt;I&apos;m not sure why we don&apos;t always get stuck (there are always a couple of routers down), yet it did get stuck last time. I only have a few traces left (I didn&apos;t get a full crash dump on this one, unfortunately), but it looks a lot like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13001&quot; title=&quot;check_routers_before_use causes LNet to hang indefinitely if any router is down&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13001&quot;&gt;&lt;del&gt;LU-13001&lt;/del&gt;&lt;/a&gt;... except that 2.12.3 doesn&apos;t have &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11297&quot; title=&quot;Align LNet routing with Multi-Rail and LNet health &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11297&quot;&gt;&lt;del&gt;LU-11297&lt;/del&gt;&lt;/a&gt;, so the patch for that one doesn&apos;t make sense here.&lt;/p&gt;

&lt;p&gt;On the other hand, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11298&quot; title=&quot;LNet: Router peer instead of Router Peer NI&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11298&quot;&gt;&lt;del&gt;LU-11298&lt;/del&gt;&lt;/a&gt; changes that loop to check &lt;tt&gt;(rtr-&amp;gt;lp_state &amp;amp; LNET_PEER_DISCOVERED)&lt;/tt&gt; instead; that sounds like it could be a good idea? I honestly can&apos;t say without a dump at hand, unfortunately. I will need to try to reproduce this somewhere more practical.&lt;/p&gt;</description>
                <environment></environment>
        <key id="57989">LU-13200</key>
            <summary>hang in lnet_wait_known_routerstate</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="martinetd">Dominique Martinet</reporter>
                        <labels>
                    </labels>
                <created>Tue, 4 Feb 2020 16:40:13 +0000</created>
                <updated>Wed, 5 Feb 2020 14:31:50 +0000</updated>
                                            <version>Lustre 2.12.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="262627" author="pjones" created="Wed, 5 Feb 2020 14:31:50 +0000"  >&lt;p&gt;Thanks, Dominique&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00t67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>