<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:29:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9823] LNet fails to come up when using lctl but works with lnetctl</title>
                <link>https://jira.whamcloud.com/browse/LU-9823</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On several systems, when attempting to bring up a Lustre file system, the following is reported:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[188273.054578] LNet: Added LNI 10.0.1.22@tcp [8/256/0/180]
[188273.054724] LNet: Accept secure, port 988
[191295.504584] Lustre: Lustre: Build Version: 2.10.0_dirty
[191300.858629] Lustre: 22140:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1501789735/real 1501789735]  req@ffff800fb09cfc80 x1574740673167376/t0(0) o250-&amp;gt;MGC128.219.141.4@tcp@128.219.141.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1501789740 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[191301.858634] LustreError: 22036:0:(mgc_request.c:251:do_config_log_add()) MGC128.219.141.4@tcp: failed processing log, type 1: rc = -5
[191330.858099] Lustre: 22140:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1501789760/real 1501789760]  req@ffff800fb4910980 x1574740673167424/t0(0) o250-&amp;gt;MGC128.219.141.4@tcp@128.219.141.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1501789770 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[191332.858106] LustreError: 15c-8: MGC128.219.141.4@tcp: The configuration from log &apos;legs-client&apos; failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[191332.858399] Lustre: Unmounted legs-client
[191332.859241] LustreError: 22036:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-5)

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After investigation, this appears to be a symptom of a communication failure at the LNet layer. The problem occurs when LNet has been set up with lctl; if lnetctl is used instead, the issue appears to go away.&lt;/p&gt;</description>
                <environment>Seen on various systems with Lustre 2.10 and Lustre 2.11.</environment>
        <key id="47644">LU-9823</key>
            <summary>LNet fails to come up when using lctl but works with lnetctl</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                            <label>IPv6</label>
                    </labels>
                <created>Thu, 3 Aug 2017 20:01:36 +0000</created>
                <updated>Sun, 7 Jan 2024 18:08:34 +0000</updated>
                            <resolved>Mon, 27 Sep 2021 18:25:01 +0000</resolved>
                                    <version>Lustre 2.10.1</version>
                    <version>Lustre 2.11.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="204370" author="pjones" created="Thu, 3 Aug 2017 20:33:07 +0000"  >&lt;p&gt;James what version of SLES12 do you mean?&lt;/p&gt;</comment>
                            <comment id="204374" author="simmonsja" created="Thu, 3 Aug 2017 20:40:18 +0000"  >&lt;p&gt;cat /etc/SuSE-release &lt;br/&gt;
SUSE Linux Enterprise Server 12 (aarch64)&lt;br/&gt;
VERSION = 12&lt;br/&gt;
PATCHLEVEL = 2&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;This file is deprecated and will be removed in a future service pack or release.&lt;/li&gt;
	&lt;li&gt;Please check /etc/os-release for details about this release.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;&amp;gt;uname -r&lt;br/&gt;
4.4.59-92.20-default&lt;/p&gt;</comment>
                            <comment id="204767" author="simmonsja" created="Tue, 8 Aug 2017 14:30:31 +0000"  >&lt;p&gt;Someone also reported this problem on Power8. I gathered a debug log from the client side. Looking on the server side, I saw the following error, which is new to me:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;606005.245327&amp;#93;&lt;/span&gt; LNetError: 2606:0:(acceptor.c:406:lnet_acceptor()) Refusing connection from 128.219.141.3: insecure port 55084&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;606023.489869&amp;#93;&lt;/span&gt; LNetError: 2606:0:(acceptor.c:406:lnet_acceptor()) Refusing connection from 128.219.141.3: insecure port 37668&lt;/p&gt;</comment>
                            <comment id="211699" author="simmonsja" created="Mon, 23 Oct 2017 16:36:21 +0000"  >&lt;p&gt;I just attempted to bring up our regular testing file system on normal RHEL7 x86 with the latest lustre 2.10.1 and I&apos;m seeing this error. Will try lustre 2.10.54 next.&lt;/p&gt;</comment>
                            <comment id="211734" author="simmonsja" created="Mon, 23 Oct 2017 20:58:01 +0000"  >&lt;p&gt;The problem is far worse with the latest master. It takes about 15 minutes to mount any back-end disk. Once it does mount, after many hours with a 56 OST/16 MDT system, the client fails to mount.&lt;/p&gt;</comment>
                            <comment id="211745" author="simmonsja" created="Mon, 23 Oct 2017 22:47:51 +0000"  >&lt;p&gt;So on the MDS I see the following lctl dump:&lt;/p&gt;

&lt;p&gt;00000100:00080000:5.0:1508798556.578627:0:4131:0:(pinger.c:405:ptlrpc_pinger_add_import()) adding pingable import 19afc095-abef-a794-2f84-9099c3e67329-&amp;gt;MGS&lt;br/&gt;
00000020:01000004:5.0:1508798556.578635:0:4131:0:(obd_mount_server.c:1303:server_start_targets()) starting target sultan-MDT0000&lt;br/&gt;
00000020:01000004:5.0:1508798556.578694:0:4131:0:(obd_mount.c:193:lustre_start_simple()) Starting obd MDS (typ=mds)&lt;br/&gt;
00000020:00000080:5.0:1508798556.578696:0:4131:0:(obd_config.c:1144:class_process_config()) processing cmd: cf001&lt;br/&gt;
00000020:00000080:5.0:1508798556.623854:0:4131:0:(genops.c:414:class_newdev()) Allocate new device MDS (ffff8817d14c8000)&lt;br/&gt;
00000020:00000080:5.0:1508798556.623939:0:4131:0:(obd_config.c:431:class_attach()) OBD: dev 2 attached type mds with refcount 1&lt;br/&gt;
00000020:00000080:5.0:1508798556.623945:0:4131:0:(obd_config.c:1144:class_process_config()) processing cmd: cf003&lt;br/&gt;
00000020:00000080:7.0:1508798556.670941:0:4131:0:(obd_config.c:542:class_setup()) finished setup of obd MDS (uuid MDS_uuid)&lt;br/&gt;
00000020:01000004:7.0:1508798556.670956:0:4131:0:(obd_mount_server.c:294:server_mgc_set_fs()) Set mgc disk for /dev/sda&lt;br/&gt;
00000040:01000000:7.0:1508798556.673106:0:4131:0:(llog_obd.c:210:llog_setup()) obd MGC10.37.248.67@o2ib1 ctxt 0 is initialized&lt;br/&gt;
00000020:01000004:7.0:1508798556.673119:0:4131:0:(obd_mount_server.c:1208:server_register_target()) Registration sultan-MDT0000, fs=sultan, 10.37.248.155@o2ib1&lt;br/&gt;
, index=0000, flags=0x1&lt;br/&gt;
10000000:01000000:7.0:1508798556.673122:0:4131:0:(mgc_request.c:1253:mgc_set_info_async()) register_target sultan-MDT0000 0x10000001&lt;br/&gt;
10000000:01000000:7.0:1508798556.673144:0:4131:0:(mgc_request.c:1203:mgc_target_register()) register sultan-MDT0000&lt;br/&gt;
00000100:00080000:7.0:1508798556.673152:0:4131:0:(client.c:1562:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req@ffff8817cc750000&lt;br/&gt;
 x1582089946267680/t0(0) o253-&amp;gt;MGC10.37.248.67@o2ib1@10.37.248.67@o2ib1:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1&lt;br/&gt;
00000100:00000400:3.0F:1508798561.578402:0:3989:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1508798556/r&lt;br/&gt;
eal 0]  req@ffff8817d1b40000 x1582089946267664/t0(0) o250-&amp;gt;MGC10.37.248.67@o2ib1@10.37.248.67@o2ib1:26/25 lens 520/544 e 0 to 1 dl 1508798561 ref 2 fl Rpc:XN/0&lt;br/&gt;
/ffffffff rc 0/-1&lt;br/&gt;
00000100:00080000:7.0:1508798567.672382:0:4131:0:(client.c:1170:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8817cc750000 x1582089946267680/t0(0&lt;br/&gt;
) o253-&amp;gt;MGC10.37.248.67@o2ib1@10.37.248.67@o2ib1:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1&lt;br/&gt;
00000020:00080000:7.0:1508798567.672401:0:4131:0:(obd_mount_server.c:1233:server_register_target()) sultan-MDT0000: error registering with the MGS: rc = -110 (&lt;br/&gt;
not fatal)&lt;br/&gt;
00000020:01000004:7.0:1508798567.672406:0:4131:0:(obd_mount_server.c:117:server_register_mount()) register mount ffff8817df73f800 from sultan-MDT0000&lt;br/&gt;
10000000:01000000:7.0:1508798567.672412:0:4131:0:(mgc_request.c:2197:mgc_process_config()) parse_log sultan-MDT0000 from 0&lt;br/&gt;
10000000:01000000:7.0:1508798567.672413:0:4131:0:(mgc_request.c:331:config_log_add()) adding config log sultan-MDT0000:          (null)&lt;br/&gt;
10000000:01000000:7.0:1508798567.672416:0:4131:0:(mgc_request.c:211:do_config_log_add()) do adding config log sultan-sptlrpc:          (null)&lt;br/&gt;
10000000:01000000:7.0:1508798567.672419:0:4131:0:(mgc_request.c:90:mgc_name2resid()) log sultan-sptlrpc to resid 0x6e61746c7573/0x0 (sultan)&lt;br/&gt;
10000000:01000000:7.0:1508798567.672425:0:4131:0:(mgc_request.c:2062:mgc_process_log()) Process log sultan-sptlrpc:          (null) from 1&lt;br/&gt;
10000000:01000000:7.0:1508798567.672427:0:4131:0:(mgc_request.c:1130:mgc_enqueue()) Enqueue for sultan-sptlrpc (res 0x6e61746c7573)&lt;br/&gt;
00000100:00080000:7.0:1508798567.672459:0:4131:0:(client.c:1562:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING)  req@ffff8817cc750000&lt;br/&gt;
 x1582089946267696/t0(0) o101-&amp;gt;MGC10.37.248.67@o2ib1@10.37.248.67@o2ib1:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1&lt;br/&gt;
00000800:00000400:6.0:1508798569.567382:0:3842:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.37.248.67@o2ib1: 4295778 seconds&lt;br/&gt;
00000100:00080000:1.0:1508798569.578255:0:3989:0:(import.c:1289:ptlrpc_connect_interpret()) ffff8817e66a2800 MGS: changing import state from CONNECTING to DISC&lt;br/&gt;
ONN&lt;br/&gt;
00000100:00080000:1.0:1508798569.578260:0:3989:0:(import.c:1336:ptlrpc_connect_interpret()) recovery of MGS on MGC10.37.248.67@o2ib1_0 failed (-110)&lt;/p&gt;</comment>
                            <comment id="211983" author="simmonsja" created="Wed, 25 Oct 2017 18:42:16 +0000"  >&lt;p&gt;Okay, I ran git bisect to see when this failure started to happen, and it&apos;s due to the multi-rail support landing. Currently, people moving to lustre 2.10 might find they can&apos;t mount Lustre at all when deploying a production system.&lt;/p&gt;</comment>
                            <comment id="211988" author="pjones" created="Wed, 25 Oct 2017 19:03:20 +0000"  >&lt;p&gt;Amir&lt;/p&gt;

&lt;p&gt;Can you please advise?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="211992" author="ashehata" created="Wed, 25 Oct 2017 20:04:22 +0000"  >&lt;p&gt;Do any of the nodes have multiple interfaces? If so, can you please make sure you follow these general Linux routing guidelines:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="212122" author="simmonsja" created="Thu, 26 Oct 2017 19:34:18 +0000"  >&lt;p&gt;For our sultan OSS nodes it was a configuration issue. We placed the two other IB ports on a different subnet, and that seems to have worked. As for the ARM system, it does have multiple Ethernet interfaces on the compute nodes, but only one has been set up with an IP address.&lt;/p&gt;

&lt;p&gt;ip addr show&lt;br/&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1&lt;br/&gt;
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br/&gt;
    inet 127.0.0.1/8 scope host lo&lt;br/&gt;
       valid_lft forever preferred_lft forever&lt;br/&gt;
    inet6 ::1/128 scope host &lt;br/&gt;
       valid_lft forever preferred_lft forever&lt;br/&gt;
2: enP2p1s0f1: &amp;lt;NO-CARRIER,BROADCAST,MULTICAST,UP&amp;gt; mtu 1500 qdisc mq state DOWN group default qlen 1000&lt;br/&gt;
    link/ether 00:22:4d:c8:10:9f brd ff:ff:ff:ff:ff:ff&lt;br/&gt;
3: enP2p1s0f2: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc mq state UP group default qlen 1000&lt;br/&gt;
    link/ether 00:22:4d:c8:10:a0 brd ff:ff:ff:ff:ff:ff&lt;br/&gt;
    inet 10.0.1.22/24 brd 10.0.1.255 scope global enP2p1s0f2&lt;br/&gt;
       valid_lft forever preferred_lft forever&lt;br/&gt;
    inet6 fe80::222:4dff:fec8:10a0/64 scope link &lt;br/&gt;
       valid_lft forever preferred_lft forever&lt;br/&gt;
4: enP6p1s0f1: &amp;lt;BROADCAST,MULTICAST&amp;gt; mtu 1500 qdisc noop state DOWN group default qlen 1000&lt;br/&gt;
    link/ether 00:22:4d:c8:10:a1 brd ff:ff:ff:ff:ff:ff&lt;/p&gt;</comment>
                            <comment id="212131" author="simmonsja" created="Thu, 26 Oct 2017 20:10:34 +0000"  >&lt;p&gt;I think I see why we have a problem. The network interface has both an IPv4 and an IPv6 address. Have you ever tried this setup?&lt;/p&gt;</comment>
                            <comment id="216369" author="adilger" created="Thu, 14 Dec 2017 21:34:37 +0000"  >&lt;p&gt;Lustre doesn&apos;t support IPv6, though it is definitely something that we should keep in mind moving forward (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10391&quot; title=&quot;LNET: Support IPv6&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10391&quot;&gt;LU-10391&lt;/a&gt;).&lt;/p&gt;</comment>
                            <comment id="314061" author="adilger" created="Mon, 27 Sep 2021 18:25:01 +0000"  >&lt;p&gt;Ended up being a configuration issue.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="46342">LU-9567</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="43655">LU-9086</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="13182">LU-10391</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="27964" name="dump.log" size="6051654" author="simmonsja" created="Tue, 8 Aug 2017 14:29:53 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzhpz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>