<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:55:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5884] bad lnet conf causes LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-5884</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Having a bad lnet config file in /etc/modprobe.d can cause kernel LBUGs.  In particular, naming a network interface that doesn&apos;t exist causes an LBUG and panic at lnet startup time.  At the very least this sort of thing should fail nicely and report errors that an admin can act on; it shouldn&apos;t panic the node.&lt;/p&gt;

&lt;p&gt;This was seen in our test environment when testing el7.  Our test framework installs an /etc/modprobe.d/lustre-lnet.conf file that says:&lt;/p&gt;

&lt;p&gt;options lnet accept=all networks=&quot;tcp0(eth0)&quot; accept_port=7988&lt;/p&gt;

&lt;p&gt;This has always worked in the past, but in current el7 installs &apos;eth0&apos; is no longer the default name of the primary ethernet interface.  This causes all lnet startups to LBUG and panic, with traces like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;14:29:49:[  207.450923] LNetError: 2024:0:(linux-tcpip.c:127:libcfs_ipif_query()) Can&apos;t get flags for interface eth0
14:29:49:[  207.451763] LNetError: 2024:0:(socklnd.c:2829:ksocknal_startup()) Can&apos;t get interface eth0 info: -19
14:29:49:[  208.452162] LNetError: 105-4: Error -100 starting up LNI tcp
14:29:49:[  208.453865] LNetError: 2024:0:(api-ni.c:829:lnet_unprepare()) ASSERTION( list_empty(&amp;amp;the_lnet.ln_nis) ) failed: 
14:29:49:[  208.456181] LNetError: 2024:0:(api-ni.c:829:lnet_unprepare()) LBUG
14:29:49:[  208.456661] Pid: 2024, comm: modprobe
14:29:49:[  208.456947] 
14:29:49:[  208.456947] Call Trace:
14:29:49:[  208.457281]  [&amp;lt;ffffffffa0432853&amp;gt;] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
14:29:49:[  208.457816]  [&amp;lt;ffffffffa0432df5&amp;gt;] lbug_with_loc+0x45/0xc0 [libcfs]
14:29:49:[  208.458309]  [&amp;lt;ffffffffa04d4877&amp;gt;] lnet_unprepare+0x297/0x340 [lnet]
14:29:49:[  208.458784]  [&amp;lt;ffffffffa04d749e&amp;gt;] LNetNIInit+0x30e/0xa50 [lnet]
14:29:49:[  208.459271]  [&amp;lt;ffffffffa08dd000&amp;gt;] ? init_module+0x0/0x1000 [ptlrpc]
14:29:49:[  208.459771]  [&amp;lt;ffffffffa07d5f4c&amp;gt;] ptlrpc_ni_init+0x2c/0x1a0 [ptlrpc]
14:29:49:[  208.460301]  [&amp;lt;ffffffffa08dd000&amp;gt;] ? init_module+0x0/0x1000 [ptlrpc]
14:29:49:[  208.460800]  [&amp;lt;ffffffffa07d60d1&amp;gt;] ptlrpc_init_portals+0x11/0xf0 [ptlrpc]
14:29:49:[  208.461346]  [&amp;lt;ffffffffa08dd000&amp;gt;] ? init_module+0x0/0x1000 [ptlrpc]
14:29:49:[  208.461841]  [&amp;lt;ffffffffa08dd187&amp;gt;] init_module+0x187/0x1000 [ptlrpc]
14:29:49:[  208.462338]  [&amp;lt;ffffffff810020e2&amp;gt;] do_one_initcall+0xe2/0x190
14:29:49:[  208.462778]  [&amp;lt;ffffffff810ca9cb&amp;gt;] load_module+0x12ab/0x1aa0
14:29:49:[  208.463229]  [&amp;lt;ffffffff812da1a0&amp;gt;] ? ddebug_dyndbg_module_param_cb+0x0/0x60
14:29:49:[  208.463751]  [&amp;lt;ffffffff810c72f3&amp;gt;] ? copy_module_from_fd.isra.43+0x53/0x150
14:29:49:[  208.464287]  [&amp;lt;ffffffff810cb376&amp;gt;] SyS_finit_module+0xa6/0xd0
14:29:49:[  208.464721]  [&amp;lt;ffffffff815f2b19&amp;gt;] system_call_fastpath+0x16/0x1b
14:29:49:[  208.465189] 
14:29:49:[  208.466970] Kernel panic - not syncing: LBUG
14:29:49:[  208.467019] CPU: 0 PID: 2024 Comm: modprobe Tainted: GF          O--------------   3.10.0-123.9.2.el7.x86_64 #1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can probably take action in TEI to avoid this problem for el7 testing, but it highlights the fact that a user or admin can crash nodes with reasonable-looking but incorrect lnet config options. Such a wrong config should return actionable errors, not cause LBUGs.&lt;/p&gt;</description>
                <environment>el7</environment>
        <key id="27506">LU-5884</key>
            <summary>bad lnet conf causes LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="bogl">Bob Glossman</reporter>
                        <labels>
                    </labels>
                <created>Fri, 7 Nov 2014 16:46:04 +0000</created>
                <updated>Fri, 20 Feb 2015 15:32:33 +0000</updated>
                            <resolved>Mon, 10 Nov 2014 03:24:32 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="98678" author="adilger" created="Fri, 7 Nov 2014 18:15:55 +0000"  >&lt;p&gt;Bob, does this problem exist in Lustre 2.6 or earlier?  Trying to figure out if this is caused by DLC or is an old bug that we&apos;ve never noticed. &lt;/p&gt;</comment>
                            <comment id="98681" author="bogl" created="Fri, 7 Nov 2014 18:55:59 +0000"  >&lt;p&gt;Andreas, I just don&apos;t know the answer.  I suspect that this is a long-standing issue, not new.  I will check back on old versions to try to find out.&lt;/p&gt;</comment>
                            <comment id="98695" author="bogl" created="Fri, 7 Nov 2014 20:24:30 +0000"  >&lt;p&gt;I don&apos;t see the problem in b2_6.  If I do a client mount with a bad config as shown above, I get an error reported and no panic:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
mount.lustre: mount centos2:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;/var/log/messages says:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Nov  7 12:19:03 centos7-2 kernel: LNetError: 103301:0:(linux-tcpip.c:127:libcfs_ipif_query()) Can&apos;t get flags for interface eth0
Nov  7 12:19:03 centos7-2 kernel: LNetError: 103301:0:(socklnd.c:2826:ksocknal_startup()) Can&apos;t get interface eth0 info: -19
Nov  7 12:19:04 centos7-2 kernel: LNetError: 105-4: Error -100 starting up LNI tcp
Nov  7 12:19:04 centos7-2 kernel: LustreError: 103301:0:(events.c:809:ptlrpc_init_portals()) network initialisation failed
Nov  7 12:19:04 centos7-2 kernel: LustreError: 165-2: Nothing registered for client mount! Is the &apos;lustre&apos; module loaded?
Nov  7 12:19:04 centos7-2 kernel: LustreError: 103272:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-19)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This suggests the problem was introduced into the lnet code recently.&lt;/p&gt;</comment>
                            <comment id="98721" author="liang" created="Sat, 8 Nov 2014 08:44:09 +0000"  >&lt;p&gt;I believe this is just another instance of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5568&quot; title=&quot;kernel crash when when network initialization failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5568&quot;&gt;&lt;del&gt;LU-5568&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="98739" author="adilger" created="Mon, 10 Nov 2014 03:24:32 +0000"  >&lt;p&gt;Close as duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5568&quot; title=&quot;kernel crash when when network initialization failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5568&quot;&gt;&lt;del&gt;LU-5568&lt;/del&gt;&lt;/a&gt;. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="26251">LU-5568</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24605">LU-5022</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="15616">LU-2456</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx0h3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>16452</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>