<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:32:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10162] mount command does not appear to be using the failover MGS/second MGS</title>
                <link>https://jira.whamcloud.com/browse/LU-10162</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;We have just started testing Lustre 2.10.1 (recent git) on one of our RHEL7 clients with our 2.7-based servers. One of the file systems I&apos;m testing on is currently running with its (single) MDT and MGS on the second/failover server, and the default mount command doesn&apos;t work. It does work if we swap the order of IPs in the mount command.&lt;/p&gt;

&lt;p&gt;So this command does not work:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; mount -t lustre 10.144.134.13@o2ib,172.23.134.13@tcp:10.144.134.14@o2ib,172.23.134.14@tcp:/lustre04 /mnt/lustre04
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But this command works and mounts the file system:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sudo mount -t lustre 10.144.134.14@o2ib,172.23.134.14@tcp:10.144.134.13@o2ib,172.23.134.13@tcp:/lustre04 /mnt/lustre04
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On our older clients running 2.7.3, both commands succeed in mounting the file system, as I expected. This has just been verified.&lt;/p&gt;

&lt;p&gt;This is on clients that have NIDs on both the tcp and o2ib networks, and as far as I can tell, communication over both IB and Ethernet works in general on the client where we&apos;re testing 2.10.1. I can ping 10.144.134.13 and 172.23.134.13 from the OS, but lctl ping 10.144.134.13@o2ib times out.&lt;/p&gt;

&lt;p&gt;The first MDS (10.144.134.13 and 172.23.134.13) does not have any LNet or Lustre modules loaded.&lt;/p&gt;

&lt;p&gt;Mount attempts:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-com99-20 ~]$ sudo mount -t lustre 10.144.134.14@o2ib,172.23.134.14@tcp:10.144.134.13@o2ib,172.23.134.13@tcp:/lustre04 /mnt/lustre04
[bnh65367@cs04r-sc-com99-20 ~]$ echo success | logger
[bnh65367@cs04r-sc-com99-20 ~]$ sudo umount /mnt/lustre04
[bnh65367@cs04r-sc-com99-20 ~]$ echo manually unmounted | logger
[bnh65367@cs04r-sc-com99-20 ~]$ sudo mount -t lustre 10.144.134.13@o2ib,172.23.134.13@tcp:10.144.134.14@o2ib,172.23.134.14@tcp:/lustre04 /mnt/lustre04
mount.lustre: mount 10.144.134.13@o2ib,172.23.134.13@tcp:10.144.134.14@o2ib,172.23.134.14@tcp:/lustre04 at /mnt/lustre04 failed: Input/output error
Is the MGS running?
[bnh65367@cs04r-sc-com99-20 ~]$
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on the MDS does not seem to show anything; syslog from the client for both mount attempts is below.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Oct 25 18:37:04 cs04r-sc-com99-20 kernel: Lustre: 5364:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1508953019/real 0]  req@ffff88178fb88300 x1582247825193888/t0(0) o38-&amp;gt;lustre04-MDT0000-mdc-ffff88015e6ef800@10.144.134.13@o2ib:12/10 lens 520/544 e 0 to 1 dl 1508953024 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Oct 25 18:37:53 cs04r-sc-com99-20 kernel: LNet: 5344:0:(o2iblnd_cb.c:3186:kiblnd_check_conns()) Timed out tx for 10.144.134.13@o2ib: 4 seconds
Oct 25 18:38:14 cs04r-sc-com99-20 kernel: Lustre: Mounted lustre04-client
Oct 25 18:39:27 cs04r-sc-com99-20 bnh65367: success
Oct 25 18:39:35 cs04r-sc-com99-20 systemd: Unit mnt-lustre04.mount entered failed state.
Oct 25 18:39:35 cs04r-sc-com99-20 kernel: Lustre: Unmounted lustre04-client
Oct 25 18:39:46 cs04r-sc-com99-20 bnh65367: manually unmounted
Oct 25 18:40:04 cs04r-sc-com99-20 kernel: Lustre: 5364:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1508953199/real 0]  req@ffff88178fb88900 x1582247825198288/t0(0) o250-&amp;gt;MGC10.144.134.13@o2ib@10.144.134.13@o2ib:26/25 lens 520/544 e 0 to 1 dl 1508953204 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Oct 25 18:40:05 cs04r-sc-com99-20 kernel: LustreError: 9064:0:(mgc_request.c:251:do_config_log_add()) MGC10.144.134.13@o2ib: failed processing log, type 1: rc = -5
Oct 25 18:40:36 cs04r-sc-com99-20 kernel: LustreError: 15c-8: MGC10.144.134.13@o2ib: The configuration from log &apos;lustre04-client&apos; failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Oct 25 18:40:36 cs04r-sc-com99-20 kernel: Lustre: Unmounted lustre04-client
Oct 25 18:40:50 cs04r-sc-com99-20 kernel: LNet: 5344:0:(o2iblnd_cb.c:3186:kiblnd_check_conns()) Timed out tx for 10.144.134.13@o2ib: 6 seconds
Oct 25 18:40:50 cs04r-sc-com99-20 kernel: LustreError: 9064:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-5)
Oct 25 18:46:18 cs04r-sc-com99-20 kernel: LNet: 5344:0:(o2iblnd_cb.c:3186:kiblnd_check_conns()) Timed out tx for 10.144.134.13@o2ib: 3 seconds
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For now we are only testing this version, but if this doesn&apos;t work, I&apos;m not sure we can update any production machines.&lt;/p&gt;</description>
                <environment></environment>
        <key id="48956">LU-10162</key>
            <summary>mount command does not appear to be using the failover MGS/second MGS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="ferner">Frederik Ferner</reporter>
                        <labels>
                    </labels>
                <created>Wed, 25 Oct 2017 17:54:30 +0000</created>
                <updated>Fri, 1 Dec 2017 05:13:03 +0000</updated>
                                            <version>Lustre 2.10.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
<comment id="212172" author="emoly.liu" created="Fri, 27 Oct 2017 08:20:52 +0000"  >&lt;p&gt;Frederik,&lt;br/&gt;
 A similar test works for me. I have some questions:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Can you run &quot;lctl list_nids&quot; on both nodes and paste the result here? BTW, since you said &quot;The first MDS (10.144.134.13 and 172.23.134.13) does not have any lnet or lustre modules loaded&quot;, why did you run &quot;lctl ping 10.144.134.13@o2ib&quot;?&lt;/li&gt;
	&lt;li&gt;Can you run &quot;tunefs.lustre $mgs_device&quot; and paste the output here?&lt;/li&gt;
	&lt;li&gt;Can you remove &quot;172.23.134.xx&quot; from the command and try again with the following two commands, to see if it still fails?
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sudo mount -t lustre 10.144.134.14@o2ib:10.144.134.13@o2ib:/lustre04 /mnt/lustre04
sudo mount -t lustre 10.144.134.13@o2ib:10.144.134.14@o2ib:/lustre04 /mnt/lustre04

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks,&lt;br/&gt;
 Emoly&lt;/p&gt;</comment>
                            <comment id="212300" author="ferner" created="Mon, 30 Oct 2017 14:32:42 +0000"  >&lt;p&gt;Emoly,&lt;/p&gt;

&lt;p&gt;I&apos;m not sure which two nodes you wanted to have list_nids from, so I&apos;ll include it from the two different clients and the active MDS.&lt;/p&gt;

&lt;p&gt;Client with Lustre 2.7.3:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-com11-01 ~]$ sudo !!
sudo lctl list_nids
10.144.146.61@o2ib
172.23.146.61@tcp
[bnh65367@cs04r-sc-com11-01 ~]$
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Client with lustre 2.10:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-com99-20 ~]$ sudo lctl list_nids
10.144.146.67@o2ib
172.23.146.67@tcp
[bnh65367@cs04r-sc-com99-20 ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;active MDS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-mds04-02 ~]$ sudo lctl list_nids
10.144.134.14@o2ib
172.23.134.14@tcp
[bnh65367@cs04r-sc-mds04-02 ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;It also seems I missed one part of my text: on the client with Lustre 2.7.3 installed, lctl ping to 10.144.134.13@o2ib immediately returns an Input/Output error, while on the newer node it times out.&lt;/p&gt;

&lt;p&gt;tunefs.lustre on MGS interface:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-mds04-02 ~]$ sudo tunefs.lustre /dev/mapper/vg_lustre04-mgs 
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

[bnh65367@cs04r-sc-mds04-02 ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The mount still fails if I list 10.144.134.13@o2ib first.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-com99-20 ~]$ sudo mount -t lustre 10.144.134.13@o2ib:10.144.134.14@o2ib:/lustre04 /mnt/lustre04
mount.lustre: mount 10.144.134.13@o2ib:10.144.134.14@o2ib:/lustre04 at /mnt/lustre04 failed: Input/output error
Is the MGS running?
[bnh65367@cs04r-sc-com99-20 ~]$ sudo mount -t lustre 10.144.134.14@o2ib:10.144.134.13@o2ib:/lustre04 /mnt/lustre04
[bnh65367@cs04r-sc-com99-20 ~]$
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Frederik&lt;/p&gt;</comment>
                            <comment id="212500" author="emoly.liu" created="Wed, 1 Nov 2017 08:41:58 +0000"  >&lt;p&gt;Hi Frederik,&lt;/p&gt;

&lt;p&gt;Can you show me the branch name that your 2.10.1 client is using, and its top git commit? I just hit this issue with the same two nodes as last time, but with their roles swapped, server&amp;lt;-&amp;gt;client. My two nodes are both running 2.10, but with different git commits. I will do more tests to investigate this issue.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
 Emoly&lt;/p&gt;</comment>
                            <comment id="212557" author="ferner" created="Wed, 1 Nov 2017 18:33:32 +0000"  >&lt;p&gt;Hi Emoly,&lt;/p&gt;

&lt;p&gt;The Lustre version installed is 2.10.1_13_g2ee62fb; this matches what I believe to be the commit I used to build the modules, 2ee62fbbf14e055d0134eb0859999be394909f8f. The local git branch is a clone of branch b2_10 at git://git.whamcloud.com/fs/lustre-release.git.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Frederik&lt;/p&gt;
</comment>
                            <comment id="215092" author="emoly.liu" created="Fri, 1 Dec 2017 05:13:03 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ferner&quot; class=&quot;user-hover&quot; rel=&quot;ferner&quot;&gt;ferner&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Sorry for my late reply. Since the last time I hit this issue, I haven&apos;t been able to reproduce it, even with 2.10 and 2.7. My log also looked a little weird: it printed -111 (ECONNREFUSED) from kernel_connect, so I suspect there was something unclean in my Lustre environment at that time.&lt;/p&gt;

&lt;p&gt;Can you still hit this issue? Only on one client, or on all the 2.10 clients? After you upgraded to 2.10, did you restart the client? If not, can you restart it and retry?&lt;br/&gt;
 If you still hit this, please collect Lustre logs with the following commands on the client, and upload them here.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lctl set_param debug=-1 debug_mb=1024
# lctl dk clear
# run the mount command, e.g. sudo mount -t lustre 10.144.134.13@o2ib:10.144.134.14@o2ib:/lustre04 /mnt/lustre04
# lctl dk &amp;gt; mount_fail.log

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Thanks,&lt;br/&gt;
 Emoly&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="38169">LU-8397</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzmhj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>