<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:15:19 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1287] MDS refuses client connections</title>
                <link>https://jira.whamcloud.com/browse/LU-1287</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When configuring a Lustre filesystem with the following setup, the MDS indefinitely refuses client connections:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;one MDT alone on a node&lt;/li&gt;
	&lt;li&gt;one or more OSTs on other nodes&lt;/li&gt;
	&lt;li&gt;separate MGS&lt;/li&gt;
	&lt;li&gt;filesystem started in the following order (OSTs, MDT)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Once the filesystem is started, the &apos;mount&apos; command on the client blocks forever.&lt;/p&gt;

&lt;p&gt;Please find attached a log from the MDS node.&lt;/p&gt;</description>
                <environment></environment>
        <key id="13890">LU-1287</key>
            <summary>MDS refuses client connections</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="theryf">Florent Thery</reporter>
                        <labels>
                    </labels>
                <created>Thu, 5 Apr 2012 11:27:12 +0000</created>
                <updated>Wed, 27 Feb 2013 00:32:42 +0000</updated>
                            <resolved>Wed, 27 Feb 2013 00:31:14 +0000</resolved>
                                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="33536" author="cliffw" created="Thu, 5 Apr 2012 11:32:29 +0000"  >&lt;p&gt;There does not appear to be a log attached; please provide more details.&lt;/p&gt;</comment>
                            <comment id="33537" author="cliffw" created="Thu, 5 Apr 2012 12:08:14 +0000"  >&lt;p&gt;The Lustre kernel log (lctl dk) is not especially helpful in these situations. Please examine your system logs (typically /var/log/messages) on all the servers and the impacted clients; there should be some more helpful messages there. Are you certain the servers are all up cleanly? The log mentions a writeconf - did you restart all servers?&lt;/p&gt;</comment>
                            <comment id="33720" author="pichong" created="Fri, 6 Apr 2012 04:33:07 +0000"  >&lt;p&gt;Actually, the issue arises even when not using the writeconf option.&lt;br/&gt;
The issue has been reproduced from scratch, i.e.:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;mkfs the MGS&lt;/li&gt;
	&lt;li&gt;mkfs the filesystem&lt;/li&gt;
	&lt;li&gt;mount the MGS&lt;/li&gt;
	&lt;li&gt;mount the OSTs&lt;/li&gt;
	&lt;li&gt;mount the MDT&lt;/li&gt;
	&lt;li&gt;try to mount the client&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Attached is a tarball containing the dmesg output on each node.&lt;br/&gt;
Thanks.&lt;/p&gt;</comment>
                            <comment id="33794" author="theryf" created="Fri, 6 Apr 2012 10:45:24 +0000"  >&lt;p&gt;We&apos;ve been investigating this issue today to figure out exactly in which cases it arises.&lt;br/&gt;
We&apos;ve found that bugzilla #24050 relates to the same issue.&lt;br/&gt;
So we&apos;ve hit a start-order problem between the MDT and OSTs.&lt;br/&gt;
Starting the MDT before the OSTs on the very first start fixes the issue.&lt;/p&gt;

&lt;p&gt;We have a number of questions though:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;will this issue be fixed at the Lustre code level?&lt;/li&gt;
	&lt;li&gt;is there a section addressing this situation in the Lustre manual?&lt;br/&gt;
  (Section 13.2 discusses start order but does not seem to address this situation)&lt;/li&gt;
	&lt;li&gt;at the user level, is there an easy way to troubleshoot this situation (e.g. a log line that helps identify it)?&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks&lt;br/&gt;
Florent.&lt;/p&gt;
</comment>
                            <comment id="33803" author="cliffw" created="Fri, 6 Apr 2012 12:21:13 +0000"  >&lt;p&gt;I am sorry, but your logs are extremely short. You say you have re-created the filesystem, I see no logs showing this.&lt;br/&gt;
I do see the MDS is refusing the connection: Lustre: b10-MDT0000: temporarily refusing client connection from 60.64.2.24@o2ib&lt;br/&gt;
but there really should be more detail there. Please attach the output of tunefs.lustre --print &amp;lt;device&amp;gt; for the MGT, MDT and all OSTs.  I would suggest a full cold start of the systems, verifying system health at each step.&lt;/p&gt;</comment>
                            <comment id="35629" author="cliffw" created="Sun, 29 Apr 2012 12:23:00 +0000"  >&lt;p&gt;What is the current state? Do you have more information on this issue?&lt;/p&gt;</comment>
                            <comment id="38080" author="sebastien.buisson" created="Thu, 3 May 2012 08:38:33 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;From a full cold start system running Lustre 2.1, I reran the test by doing sequentially:&lt;/p&gt;

&lt;p&gt;1. format MGS&lt;/p&gt;

&lt;p&gt;At this point, &apos;tunefs.lustre --print&apos; on the MGT gives:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0025&lt;br/&gt;
checking for existing Lustre data: found CONFIGS/mountdata&lt;br/&gt;
Reading CONFIGS/mountdata&lt;/p&gt;

&lt;p&gt;   Read previous values:&lt;br/&gt;
Target:     MGS&lt;br/&gt;
Index:      unassigned&lt;br/&gt;
Lustre FS:  mgs&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x74&lt;br/&gt;
              (MGS needs_index first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;br/&gt;
Parameters: failover.node=10.3.1.2@o2ib&lt;/p&gt;


&lt;p&gt;   Permanent disk data:&lt;br/&gt;
Target:     MGS&lt;br/&gt;
Index:      unassigned&lt;br/&gt;
Lustre FS:  mgs&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x74&lt;br/&gt;
              (MGS needs_index first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;br/&gt;
Parameters: failover.node=10.3.1.2@o2ib&lt;/p&gt;

&lt;p&gt;exiting before disk write.&lt;/p&gt;


&lt;p&gt;2. start MGS&lt;/p&gt;

&lt;p&gt;3. format MDT&lt;/p&gt;

&lt;p&gt;At this point, &apos;tunefs.lustre --print&apos; on the MDT gives:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou3 ~&amp;#93;&lt;/span&gt;# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac002d&lt;br/&gt;
checking for existing Lustre data: found CONFIGS/mountdata&lt;br/&gt;
Reading CONFIGS/mountdata&lt;/p&gt;

&lt;p&gt;   Read previous values:&lt;br/&gt;
Target:     fs_mdt-MDT0000&lt;br/&gt;
Index:      0&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x61&lt;br/&gt;
              (MDT first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib lov.stripesize=1048576 failover.node=10.5.1.3@o2ib network=o2ib0&lt;/p&gt;


&lt;p&gt;   Permanent disk data:&lt;br/&gt;
Target:     fs_mdt-MDT0000&lt;br/&gt;
Index:      0&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x61&lt;br/&gt;
              (MDT first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib lov.stripesize=1048576 failover.node=10.5.1.3@o2ib network=o2ib0&lt;/p&gt;

&lt;p&gt;exiting before disk write.&lt;/p&gt;


&lt;p&gt;4. format OSTs&lt;/p&gt;

&lt;p&gt;At this point, &apos;tunefs.lustre --print&apos; on the OSTs gives:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou6 ~&amp;#93;&lt;/span&gt;# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0037&lt;br/&gt;
checking for existing Lustre data: found CONFIGS/mountdata&lt;br/&gt;
Reading CONFIGS/mountdata&lt;/p&gt;

&lt;p&gt;   Read previous values:&lt;br/&gt;
Target:     fs_mdt-OST0000&lt;br/&gt;
Index:      0&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x62&lt;br/&gt;
              (OST first_time update )&lt;br/&gt;
Persistent mount opts: errors=remount-ro,extents,mballoc&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0&lt;/p&gt;


&lt;p&gt;   Permanent disk data:&lt;br/&gt;
Target:     fs_mdt-OST0000&lt;br/&gt;
Index:      0&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x62&lt;br/&gt;
              (OST first_time update )&lt;br/&gt;
Persistent mount opts: errors=remount-ro,extents,mballoc&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0&lt;/p&gt;

&lt;p&gt;exiting before disk write.&lt;/p&gt;


&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou6 ~&amp;#93;&lt;/span&gt;# tunefs.lustre --print /dev/disk/by-id/scsi-2003013841aac0035&lt;br/&gt;
checking for existing Lustre data: found CONFIGS/mountdata&lt;br/&gt;
Reading CONFIGS/mountdata&lt;/p&gt;

&lt;p&gt;   Read previous values:&lt;br/&gt;
Target:     fs_mdt-OST0001&lt;br/&gt;
Index:      1&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x62&lt;br/&gt;
              (OST first_time update )&lt;br/&gt;
Persistent mount opts: errors=remount-ro,extents,mballoc&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0&lt;/p&gt;


&lt;p&gt;   Permanent disk data:&lt;br/&gt;
Target:     fs_mdt-OST0001&lt;br/&gt;
Index:      1&lt;br/&gt;
Lustre FS:  fs_mdt&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x62&lt;br/&gt;
              (OST first_time update )&lt;br/&gt;
Persistent mount opts: errors=remount-ro,extents,mballoc&lt;br/&gt;
Parameters: mgsnode=10.5.1.3@o2ib failover.node=10.5.1.6@o2ib network=o2ib0&lt;/p&gt;

&lt;p&gt;exiting before disk write.&lt;/p&gt;


&lt;p&gt;5. start OSTs&lt;/p&gt;

&lt;p&gt;At that point, here is the content of the MGS:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# ls toto/CONFIGS/&lt;br/&gt;
fs_mdt-client  fs_mdt-OST0000  fs_mdt-OST0001  fs_mdt-sptlrpc _mgs-sptlrpc  mountdata&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# llog_reader toto/CONFIGS/fs_mdt-OST0000&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Fri Apr 13 15:07:03 2012&lt;br/&gt;
Number of records: 4&lt;br/&gt;
Target uuid : config_uuid&lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   5 (flags=0x01, v2.1.0.0) fs_mdt-OST0000  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
#02 (128)attach    0:fs_mdt-OST0000  1:obdfilter  2:fs_mdt-OST0000_UUID&lt;br/&gt;
#03 (112)setup     0:fs_mdt-OST0000  1:dev  2:type  3:f&lt;br/&gt;
#04 (224)marker   5 (flags=0x02, v2.1.0.0) fs_mdt-OST0000  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;#&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# llog_reader toto/CONFIGS/fs_mdt-OST0001&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Fri Apr 13 15:07:03 2012&lt;br/&gt;
Number of records: 4&lt;br/&gt;
Target uuid : config_uuid&lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   1 (flags=0x01, v2.1.0.0) fs_mdt-OST0001  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
#02 (128)attach    0:fs_mdt-OST0001  1:obdfilter  2:fs_mdt-OST0001_UUID&lt;br/&gt;
#03 (112)setup     0:fs_mdt-OST0001  1:dev  2:type  3:f&lt;br/&gt;
#04 (224)marker   1 (flags=0x02, v2.1.0.0) fs_mdt-OST0001  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;#&lt;/p&gt;


&lt;p&gt;6. start MDT&lt;/p&gt;

&lt;p&gt;At that point, the content of the MGS is:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# ls toto/CONFIGS/&lt;br/&gt;
fs_mdt-client  fs_mdt-MDT0000  fs_mdt-OST0000  fs_mdt-OST0001 fs_mdt-sptlrpc  _mgs-sptlrpc  mountdata&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;#&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# llog_reader toto/CONFIGS/fs_mdt-OST0000&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Fri Apr 13 15:07:03 2012&lt;br/&gt;
Number of records: 4&lt;br/&gt;
Target uuid : config_uuid&lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   5 (flags=0x01, v2.1.0.0) fs_mdt-OST0000  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
#02 (128)attach    0:fs_mdt-OST0000  1:obdfilter  2:fs_mdt-OST0000_UUID&lt;br/&gt;
#03 (112)setup     0:fs_mdt-OST0000  1:dev  2:type  3:f&lt;br/&gt;
#04 (224)marker   5 (flags=0x02, v2.1.0.0) fs_mdt-OST0000  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;#&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou2 ~&amp;#93;&lt;/span&gt;# llog_reader toto/CONFIGS/fs_mdt-OST0001&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Fri Apr 13 15:07:03 2012&lt;br/&gt;
Number of records: 4&lt;br/&gt;
Target uuid : config_uuid&lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   1 (flags=0x01, v2.1.0.0) fs_mdt-OST0001  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;br/&gt;
#02 (128)attach    0:fs_mdt-OST0001  1:obdfilter  2:fs_mdt-OST0001_UUID&lt;br/&gt;
#03 (112)setup     0:fs_mdt-OST0001  1:dev  2:type  3:f&lt;br/&gt;
#04 (224)marker   1 (flags=0x02, v2.1.0.0) fs_mdt-OST0001  &apos;add ost&apos; Fri Apr 13 15:07:03 2012-&lt;/p&gt;


&lt;p&gt;And in the syslog of the MDS node we have:&lt;/p&gt;

&lt;p&gt;1334322067 2012 Apr 13 15:01:07 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.&lt;br/&gt;
1334322068 2012 Apr 13 15:01:08 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled&lt;br/&gt;
1334322068 2012 Apr 13 15:01:08 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel LDISKFS-fs warning (device sdau): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): barriers disabled&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel LDISKFS-fs (sdau): mounted filesystem with ordered data mode&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7750:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC10.5.1.3@o2ib-&amp;gt;MGC10.5.1.3@o2ib_0 netid 50000: select flavor null&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel Lustre: Enabling ACL&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern info kernel Lustre: Enabling user_xattr&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: new disk, initializing&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7783:0:(mds_lov.c:1004:mds_notify()) MDS mdd_obd-fs_mdt-MDT0000: add target fs_mdt-OST0001_UUID&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 7783:0:(mds_lov.c:1004:mds_notify()) Skipped 1 previous similar message&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800069 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 5s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -5s&amp;#93;&lt;/span&gt;  req@ffff88030a0dc000 x1398844479800069/t0(0) o-1-&amp;gt;fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322550 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322545 2012 Apr 13 15:09:05 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800070 sent from fs_mdt-OST0000-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322545&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 5s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -5s&amp;#93;&lt;/span&gt; req@ffff88030a0f0000 x1398844479800070/t0(0) o-1-&amp;gt;fs_mdt-OST0000_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322550 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322570 2012 Apr 13 15:09:30 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 5s&lt;br/&gt;
1334322570 2012 Apr 13 15:09:30 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800072 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322570&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322570&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322570&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 10s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -10s&amp;#93;&lt;/span&gt;  req@ffff8803313d3400 x1398844479800072/t0(0) o-1-&amp;gt;fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322580 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322577 2012 Apr 13 15:09:37 perou3 kern warning kernel Lustre: 619:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800071 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322570&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322570&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322577&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 7s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay 0s&amp;#93;&lt;/span&gt;  req@ffff88031fdf4000 x1398844479800071/t0(0) o-1-&amp;gt;MGS@MGC10.5.1.3@o2ib_0:26/25 lens 192/192 e 0 to 1 dl 1334322577 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322577 2012 Apr 13 15:09:37 perou3 kern warning kernel Lustre: 619:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message&lt;br/&gt;
1334322577 2012 Apr 13 15:09:37 perou3 kern err kernel LustreError: 166-1: MGC10.5.1.3@o2ib: Connection to service MGS via nid 10.5.1.3@o2ib was lost; in progress operations using this service will fail.&lt;br/&gt;
1334322583 2012 Apr 13 15:09:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800074 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322577&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322577&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322583&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 6s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay 0s&amp;#93;&lt;/span&gt;  req@ffff88031fdf4000 x1398844479800074/t0(0) o-1-&amp;gt;MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322583 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 6s&lt;br/&gt;
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message&lt;br/&gt;
1334322602 2012 Apr 13 15:10:02 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800076 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322602&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322602&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322602&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 15s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -15s&amp;#93;&lt;/span&gt;  req@ffff88017cafa800 x1398844479800076/t0(0) o-1-&amp;gt;fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322617 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322613 2012 Apr 13 15:10:13 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800075 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322602&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322602&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322613&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 11s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay 0s&amp;#93;&lt;/span&gt;  req@ffff88030a074800 x1398844479800075/t0(0) o-1-&amp;gt;MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322613 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322613 2012 Apr 13 15:10:13 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message&lt;br/&gt;
1334322627 2012 Apr 13 15:10:27 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 11s&lt;br/&gt;
1334322627 2012 Apr 13 15:10:27 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages&lt;br/&gt;
1334322643 2012 Apr 13 15:10:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800078 sent from MGC10.5.1.3@o2ib to NID 10.5.1.3@o2ib has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322627&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322627&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322643&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 16s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay 0s&amp;#93;&lt;/span&gt;  req@ffff88032908c000 x1398844479800078/t0(0) o-1-&amp;gt;MGS@MGC10.5.1.3@o2ib_0:26/25 lens 368/512 e 0 to 1 dl 1334322643 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322643 2012 Apr 13 15:10:43 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 1 previous similar message&lt;br/&gt;
1334322652 2012 Apr 13 15:10:52 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 16s&lt;br/&gt;
1334322652 2012 Apr 13 15:10:52 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) MGC10.5.1.3@o2ib: tried all connections, increasing latency to 21s&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800084 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322677&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322677&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322677&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 30s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -30s&amp;#93;&lt;/span&gt;  req@ffff8801c35a6c00 x1398844479800084/t0(0) o-1-&amp;gt;fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322707 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 3 previous similar messages&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: 620:0:(import.c:852:ptlrpc_connect_interpret()) MGS@MGC10.5.1.3@o2ib_0 changed server handle from 0x555b88f8bfb49318 to 0x555b88f8bfb49373&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import&lt;br/&gt;
1334322677 2012 Apr 13 15:11:17 perou3 kern info kernel Lustre: MGC10.5.1.3@o2ib: Connection restored to service MGS using nid 10.5.1.3@o2ib.&lt;br/&gt;
1334322702 2012 Apr 13 15:11:42 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 30s&lt;br/&gt;
1334322702 2012 Apr 13 15:11:42 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 35s&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.3@o2ib. The obd_ping operation failed with -107&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern err kernel LustreError: 166-1: MGC10.5.1.3@o2ib: Connection to service MGS via nid 10.5.1.3@o2ib was lost; in progress operations using this service will fail.&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: 620:0:(import.c:852:ptlrpc_connect_interpret()) MGS@MGC10.5.1.3@o2ib_0 changed server handle from 0x555b88f8bfb49373 to 0x555b88f8bfb493dc&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import&lt;br/&gt;
1334322727 2012 Apr 13 15:12:07 perou3 kern info kernel Lustre: MGC10.5.1.3@o2ib: Connection restored to service MGS using nid 10.5.1.3@o2ib.&lt;br/&gt;
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 40s&lt;br/&gt;
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 1 previous similar message&lt;br/&gt;
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1398844479800103 sent from fs_mdt-OST0001-osc-MDT0000 to NID 10.5.1.6@o2ib has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1334322752&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;real_sent 1334322752&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;current 1334322752&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;deadline 45s&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;delay -45s&amp;#93;&lt;/span&gt;  req@ffff88030a0dc800 x1398844479800103/t0(0) o-1-&amp;gt;fs_mdt-OST0001_UUID@10.5.1.5@o2ib:28/4 lens 368/512 e 0 to 1 dl 1334322797 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
1334322752 2012 Apr 13 15:12:32 perou3 kern warning kernel Lustre: 620:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 4 previous similar messages&lt;br/&gt;
1334322802 2012 Apr 13 15:13:22 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 50s&lt;br/&gt;
1334322802 2012 Apr 13 15:13:22 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 2 previous similar messages&lt;br/&gt;
1334322877 2012 Apr 13 15:14:37 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) fs_mdt-OST0001-osc-MDT0000: tried all connections, increasing latency to 50s&lt;br/&gt;
1334322877 2012 Apr 13 15:14:37 perou3 kern warning kernel Lustre: 621:0:(import.c:526:import_select_connection()) Skipped 5 previous similar messages&lt;/p&gt;



&lt;p&gt;On the MDS, we can see:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@perou3 ~&amp;#93;&lt;/span&gt;# lctl dl&lt;br/&gt;
  0 UP mgc MGC10.5.1.3@o2ib f2d4e47f-96b1-e539-f0e1-e125d27e617f 5&lt;br/&gt;
  1 UP lov fs_mdt-MDT0000-mdtlov fs_mdt-MDT0000-mdtlov_UUID 4&lt;br/&gt;
  2 UP mdt fs_mdt-MDT0000 fs_mdt-MDT0000_UUID 3&lt;br/&gt;
  3 UP mds mdd_obd-fs_mdt-MDT0000 mdd_obd_uuid-fs_mdt-MDT0000 3&lt;br/&gt;
  4 UP osc fs_mdt-OST0001-osc-MDT0000 fs_mdt-MDT0000-mdtlov_UUID 5&lt;br/&gt;
  5 UP osc fs_mdt-OST0000-osc-MDT0000 fs_mdt-MDT0000-mdtlov_UUID 5&lt;/p&gt;


&lt;p&gt;7. Mount client&lt;/p&gt;

&lt;p&gt;In the client syslog, we have:&lt;/p&gt;

&lt;p&gt;1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: OBD class driver, &lt;a href=&quot;http://wiki.whamcloud.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://wiki.whamcloud.com/&lt;/a&gt;&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre:         Lustre Version: 2.1.0&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre:         Build Version: B-2_1_0_0-lustrebull-20120404161806-CHANGED-2.6.32-71.24.1.bl6.Bull.23.x86_64&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Lustre LU module (ffffffffa053c2c0).&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Added LNI 10.5.1.6@o2ib &amp;#91;8/64/0/180&amp;#93;&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Lustre OSC module (ffffffffa09780c0).&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Lustre LOV module (ffffffffa09e3e40).&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern info kernel Lustre: Lustre client module (ffffffffa0d392e0).&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 user info logger lustre-tune: 0 devices have been tuned.&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern warning kernel Lustre: 21809:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC10.5.1.3@o2ib-&amp;gt;MGC10.5.1.3@o2ib_0 netid 50000: select flavor null&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern err kernel LustreError: 152-6: Ignoring deprecated mount option &apos;acl&apos;.&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern warning kernel Lustre: MGC10.5.1.3@o2ib: Reactivating import&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern warning kernel Lustre: 21809:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import fs_mdt-MDT0000-mdc-ffff8802d12c9000-&amp;gt;10.5.1.4@o2ib netid 50000: select flavor null&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11&lt;br/&gt;
1336046993 2012 May  3 14:09:53 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11&lt;br/&gt;
1336047018 2012 May  3 14:10:18 perou7 kern err kernel LustreError: 11-0: an error occurred while communicating with 10.5.1.4@o2ib. The mds_connect operation failed with -11&lt;/p&gt;


&lt;p&gt;At the same time, in the MDS log we have:&lt;/p&gt;

&lt;p&gt;1336046968 2012 May  3 14:09:28 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: temporarily refusing client connection from 10.5.1.6@o2ib&lt;br/&gt;
1336046968 2012 May  3 14:09:28 perou3 kern err kernel LustreError: 26684:0:(ldlm_lib.c:2137:target_send_reply_msg()) @@@ processing error (-11)  req@ffff88032beafc00 x1400946785517577/t0(0) o-1-&amp;gt;&amp;lt;?&amp;gt;@&amp;lt;?&amp;gt;:0/0 lens 368/0 e 0 to 0 dl 1336047068 ref 1 fl Interpret:/ffffffff/ffffffff rc -11/-1&lt;br/&gt;
1336046993 2012 May  3 14:09:53 perou3 kern warning kernel Lustre: fs_mdt-MDT0000: temporarily refusing client connection from 10.5.1.6@o2ib&lt;br/&gt;
1336046993 2012 May  3 14:09:53 perou3 kern err kernel LustreError: 26684:0:(ldlm_lib.c:2137:target_send_reply_msg()) @@@ processing error (-11)  req@ffff88032bee3000 x1400946785517580/t0(0) o-1-&amp;gt;&amp;lt;?&amp;gt;@&amp;lt;?&amp;gt;:0/0 lens 368/0 e 0 to 0 dl 1336047093 ref 1 fl Interpret:/ffffffff/ffffffff rc -11/-1&lt;/p&gt;



&lt;p&gt;This issue looks like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-350&quot; title=&quot;port bug24050 to master(&amp;quot;lustre_start&amp;quot; caused client nodes failed to mount.)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-350&quot;&gt;&lt;del&gt;LU-350&lt;/del&gt;&lt;/a&gt;, but the fix from that ticket has already landed in Lustre 2.1.&lt;br/&gt;
The weird thing here is that when the MDT is started, it tries to reach the failover node of the OSTs (NID 10.5.1.6@o2ib) and apparently not their primary node.&lt;br/&gt;
Of course, when starting the MDT before the OSTs, the MDT connects directly to the OSTs with the right NID, i.e. the primary one.&lt;/p&gt;


&lt;p&gt;Regards,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="38082" author="pjones" created="Thu, 3 May 2012 08:49:49 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please analyze this situation?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="38375" author="laisiyao" created="Wed, 9 May 2012 04:01:37 +0000"  >&lt;p&gt;Hi Florent, what&apos;s the output of `llog_reader toto/CONFIGS/fs_mdt-MDT0000` after you start the MDS? The NID of the OST (that the MDS connects to) should be written in this config. Could you also print this config for the case where the MDS starts first?&lt;/p&gt;</comment>
                            <comment id="38390" author="sebastien.buisson" created="Wed, 9 May 2012 07:36:23 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;When we start OSTs first and then MDT (case when we hit the bug):&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;llog_reader toto/CONFIGS/fs_mdt-MDT0000&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Wed May  9 13:08:57 2012&lt;br/&gt;
Number of records: 32&lt;br/&gt;
Target uuid : config_uuid &lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   7 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#02 (136)attach    0:fs_mdt-MDT0000-mdtlov  1:lov  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#03 (176)lov_setup 0:fs_mdt-MDT0000-mdtlov  1:(struct lov_desc)&lt;br/&gt;
                uuid=fs_mdt-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1&lt;br/&gt;
#04 (224)marker   7 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#05 (224)marker   8 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#06 (120)attach    0:fs_mdt-MDT0000  1:mdt  2:fs_mdt-MDT0000_UUID  &lt;br/&gt;
#07 (112)mount_option 0:  1:fs_mdt-MDT0000  2:fs_mdt-MDT0000-mdtlov  &lt;br/&gt;
#08 (160)setup     0:fs_mdt-MDT0000  1:fs_mdt-MDT0000_UUID  2:0  3:fs_mdt-MDT0000-mdtlov  4:f  &lt;br/&gt;
#09 (224)marker   8 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#10 (224)marker   9 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000  &apos;add osc(copied)&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#11 (224)marker  10 (flags=0x01, v2.1.0.0) fs_mdt-OST0000  &apos;add osc&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#12 (080)add_uuid  nid=10.5.1.5@o2ib(0x500000a050105)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#13 (080)add_uuid  nid=10.5.1.6@o2ib(0x500000a050106)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#14 (144)attach    0:fs_mdt-OST0000-osc-MDT0000  1:osc  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#15 (144)setup     0:fs_mdt-OST0000-osc-MDT0000  1:fs_mdt-OST0000_UUID  2:10.5.1.5@o2ib  &lt;br/&gt;
#16 (136)lov_modify_tgts add 0:fs_mdt-MDT0000-mdtlov  1:fs_mdt-OST0000_UUID  2:0  3:1  &lt;br/&gt;
#17 (224)marker  10 (flags=0x02, v2.1.0.0) fs_mdt-OST0000  &apos;add osc&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#18 (224)marker  10 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000  &apos;add osc(copied)&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#19 (224)marker  11 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000  &apos;add osc(copied)&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#20 (224)marker  12 (flags=0x01, v2.1.0.0) fs_mdt-OST0001  &apos;add osc&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#21 (080)add_uuid  nid=10.5.1.5@o2ib(0x500000a050105)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#22 (080)add_uuid  nid=10.5.1.6@o2ib(0x500000a050106)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#23 (080)add_uuid  nid=10.5.1.5@o2ib(0x500000a050105)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#24 (080)add_uuid  nid=10.5.1.6@o2ib(0x500000a050106)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#25 (144)attach    0:fs_mdt-OST0001-osc-MDT0000  1:osc  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#26 (144)setup     0:fs_mdt-OST0001-osc-MDT0000  1:fs_mdt-OST0001_UUID  2:10.5.1.5@o2ib  &lt;br/&gt;
#27 (136)lov_modify_tgts add 0:fs_mdt-MDT0000-mdtlov  1:fs_mdt-OST0001_UUID  2:1  3:1  &lt;br/&gt;
#28 (224)marker  12 (flags=0x02, v2.1.0.0) fs_mdt-OST0001  &apos;add osc&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#29 (224)marker  12 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000  &apos;add osc(copied)&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#30 (224)marker  15 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:08:57 2012-&lt;br/&gt;
#31 (112)param 0:fs_mdt-MDT0000-mdtlov  1:lov.stripesize=1048576  &lt;br/&gt;
#32 (224)marker  15 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:08:57 2012-&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;When we start MDT first and then the OSTs (no bug):&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;right after MDT start:&lt;/li&gt;
&lt;/ul&gt;


&lt;ol&gt;
	&lt;li&gt;llog_reader toto/CONFIGS/fs_mdt-MDT0000&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Wed May  9 13:21:35 2012&lt;br/&gt;
Number of records: 12&lt;br/&gt;
Target uuid : config_uuid &lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   1 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#02 (136)attach    0:fs_mdt-MDT0000-mdtlov  1:lov  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#03 (176)lov_setup 0:fs_mdt-MDT0000-mdtlov  1:(struct lov_desc)&lt;br/&gt;
                uuid=fs_mdt-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1&lt;br/&gt;
#04 (224)marker   1 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#05 (224)marker   2 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#06 (120)attach    0:fs_mdt-MDT0000  1:mdt  2:fs_mdt-MDT0000_UUID  &lt;br/&gt;
#07 (112)mount_option 0:  1:fs_mdt-MDT0000  2:fs_mdt-MDT0000-mdtlov  &lt;br/&gt;
#08 (160)setup     0:fs_mdt-MDT0000  1:fs_mdt-MDT0000_UUID  2:0  3:fs_mdt-MDT0000-mdtlov  4:f  &lt;br/&gt;
#09 (224)marker   2 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#10 (224)marker   7 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#11 (112)param 0:fs_mdt-MDT0000-mdtlov  1:lov.stripesize=1048576  &lt;br/&gt;
#12 (224)marker   7 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:21:35 2012-&lt;/li&gt;
&lt;/ol&gt;



&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;then after OSTs start:&lt;/li&gt;
&lt;/ul&gt;


&lt;ol&gt;
	&lt;li&gt;llog_reader toto/CONFIGS/fs_mdt-MDT0000&lt;br/&gt;
Header size : 8192&lt;br/&gt;
Time : Wed May  9 13:21:35 2012&lt;br/&gt;
Number of records: 28&lt;br/&gt;
Target uuid : config_uuid &lt;br/&gt;
-----------------------&lt;br/&gt;
#01 (224)marker   1 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#02 (136)attach    0:fs_mdt-MDT0000-mdtlov  1:lov  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#03 (176)lov_setup 0:fs_mdt-MDT0000-mdtlov  1:(struct lov_desc)&lt;br/&gt;
                uuid=fs_mdt-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1&lt;br/&gt;
#04 (224)marker   1 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov setup&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#05 (224)marker   2 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#06 (120)attach    0:fs_mdt-MDT0000  1:mdt  2:fs_mdt-MDT0000_UUID  &lt;br/&gt;
#07 (112)mount_option 0:  1:fs_mdt-MDT0000  2:fs_mdt-MDT0000-mdtlov  &lt;br/&gt;
#08 (160)setup     0:fs_mdt-MDT0000  1:fs_mdt-MDT0000_UUID  2:0  3:fs_mdt-MDT0000-mdtlov  4:f  &lt;br/&gt;
#09 (224)marker   2 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000  &apos;add mdt&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#10 (224)marker   7 (flags=0x01, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#11 (112)param 0:fs_mdt-MDT0000-mdtlov  1:lov.stripesize=1048576  &lt;br/&gt;
#12 (224)marker   7 (flags=0x02, v2.1.0.0) fs_mdt-MDT0000-mdtlov &apos;lov.stripesize&apos; Wed May  9 13:21:35 2012-&lt;br/&gt;
#13 (224)marker  10 (flags=0x01, v2.1.0.0) fs_mdt-OST0001  &apos;add osc&apos; Wed May  9 13:22:26 2012-&lt;br/&gt;
#14 (080)add_uuid  nid=10.5.1.5@o2ib(0x500000a050105)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#15 (144)attach    0:fs_mdt-OST0001-osc-MDT0000  1:osc  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#16 (144)setup     0:fs_mdt-OST0001-osc-MDT0000  1:fs_mdt-OST0001_UUID  2:10.5.1.5@o2ib  &lt;br/&gt;
#17 (080)add_uuid  nid=10.5.1.6@o2ib(0x500000a050106)  0:  1:10.5.1.6@o2ib  &lt;br/&gt;
#18 (112)add_conn  0:fs_mdt-OST0001-osc-MDT0000  1:10.5.1.6@o2ib  &lt;br/&gt;
#19 (136)lov_modify_tgts add 0:fs_mdt-MDT0000-mdtlov  1:fs_mdt-OST0001_UUID  2:1  3:1  &lt;br/&gt;
#20 (224)marker  10 (flags=0x02, v2.1.0.0) fs_mdt-OST0001  &apos;add osc&apos; Wed May  9 13:22:26 2012-&lt;br/&gt;
#21 (224)marker  13 (flags=0x01, v2.1.0.0) fs_mdt-OST0000  &apos;add osc&apos; Wed May  9 13:22:27 2012-&lt;br/&gt;
#22 (080)add_uuid  nid=10.5.1.5@o2ib(0x500000a050105)  0:  1:10.5.1.5@o2ib  &lt;br/&gt;
#23 (144)attach    0:fs_mdt-OST0000-osc-MDT0000  1:osc  2:fs_mdt-MDT0000-mdtlov_UUID  &lt;br/&gt;
#24 (144)setup     0:fs_mdt-OST0000-osc-MDT0000  1:fs_mdt-OST0000_UUID  2:10.5.1.5@o2ib  &lt;br/&gt;
#25 (080)add_uuid  nid=10.5.1.6@o2ib(0x500000a050106)  0:  1:10.5.1.6@o2ib  &lt;br/&gt;
#26 (112)add_conn  0:fs_mdt-OST0000-osc-MDT0000  1:10.5.1.6@o2ib  &lt;br/&gt;
#27 (136)lov_modify_tgts add 0:fs_mdt-MDT0000-mdtlov  1:fs_mdt-OST0000_UUID  2:0  3:1  &lt;br/&gt;
#28 (224)marker  13 (flags=0x02, v2.1.0.0) fs_mdt-OST0000  &apos;add osc&apos; Wed May  9 13:22:27 2012-&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;For the record, the OSS hosting the 2 OSTs has NID 10.5.1.5@o2ib.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="38994" author="laisiyao" created="Thu, 17 May 2012 11:08:05 +0000"  >&lt;p&gt;The llog config looks fine, and I did the same test but can&apos;t reproduce it with master code. Could you run `lctl set_param debug=-1` on the MDS node and dump the debug log after the MDS is mounted? I&apos;ll test 2.1 later.&lt;/p&gt;</comment>
                            <comment id="39051" author="laisiyao" created="Fri, 18 May 2012 05:23:47 +0000"  >&lt;p&gt;The 2.1 test passes in my test environment too.&lt;/p&gt;</comment>
                            <comment id="39123" author="sebastien.buisson" created="Mon, 21 May 2012 08:54:36 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Here is the requested debug information:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;lustre_mds_ko.gz is the debug log when the OSTs are mounted first, which means we hit the error;&lt;/li&gt;
	&lt;li&gt;lustre_mds_ok.gz is the debug log when the MDT is mounted first, so we have no error.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;HTH&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="40416" author="laisiyao" created="Tue, 12 Jun 2012 05:17:24 +0000"  >&lt;p&gt;It looks like the config generated for the MDS (in the failure case) is wrong: both NID 10.5.1.5@o2ib and 10.5.1.6@o2ib are mapped to uuid 10.5.1.5@o2ib, and during the MDS osc connect it tends to use the last NID, which is 10.5.1.6@o2ib, so the connection always fails. I&apos;ll dig into the config code to see why this happens.&lt;/p&gt;</comment>
                            <comment id="40556" author="laisiyao" created="Thu, 14 Jun 2012 06:33:16 +0000"  >&lt;p&gt;When the OSTs start before the MDT, the MDT needs to steal the OSC config from the client config; however, both target NIDs and failover NIDs are written to the config as the same record type: LCFG_ADD_UUID. Previously all of these NIDs were treated as target NIDs, so a wrong config file was generated for the MDT. And ptlrpc tends to use the last NID of a target (if all NIDs are in the same subnet) to connect to the target, so the error happens.&lt;/p&gt;

&lt;p&gt;The patch for master is at: &lt;a href=&quot;http://review.whamcloud.com/#change,3107&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3107&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It should apply to the 2.1 branch as well. Sebastien, could you help verify that it works for you?&lt;/p&gt;</comment>
                            <comment id="40659" author="sebastien.buisson" created="Fri, 15 Jun 2012 10:26:41 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I gave the patch a try.&lt;br/&gt;
The good news is that it fixes the issue when we mount the OSTs before the MDT with the &apos;first_time&apos; ldd flag set (meaning it is the first time this Lustre file system is mounted).&lt;br/&gt;
The bad news is that it does not fix the issue when we explicitly set the &apos;writeconf&apos; flag (with the same starting order: OSTs then MDT).&lt;/p&gt;

&lt;p&gt;On the MGS, the logs are:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;when we start OSTs:&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;1339769487 2012 Jun 15 16:11:27 perou2 kern warning kernel Lustre: 956:0:(ldlm_lib.c:876:target_handle_connect()) MGS: connection from 527e8357-0c75-491a-738c-18026d6ec94c@10.5.1.5@o2ib t0 exp (null) cur 1339769487 last 0&lt;br/&gt;
1339769487 2012 Jun 15 16:11:27 perou2 kern warning kernel Lustre: 956:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS-&amp;gt;NET_0x500000a050105_UUID netid 50000: select flavor null&lt;br/&gt;
1339769487 2012 Jun 15 16:11:27 perou2 kern warning kernel Lustre: 955:0:(ldlm_lib.c:791:target_handle_connect()) MGS: exp ffff8802eb2ad800 already connecting&lt;br/&gt;
1339769487 2012 Jun 15 16:11:27 perou2 kern err kernel LustreError: 955:0:(mgs_handler.c:783:mgs_handle()) MGS handle cmd=250 rc=-114&lt;br/&gt;
1339769487 2012 Jun 15 16:11:27 perou2 kern err kernel LustreError: 955:0:(ldlm_lib.c:2137:target_send_reply_msg()) @@@ processing error (-114)  req@ffff8803179b0850 x1404849096753257/t0(0) o-1-&amp;gt;&amp;lt;?&amp;gt;@&amp;lt;?&amp;gt;:0/0 lens 368/264 e 0 to 0 dl 1339769587 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1&lt;br/&gt;
1339769512 2012 Jun 15 16:11:52 perou2 kern warning kernel Lustre: MGS: Regenerating fs_mdt-OST0001 log by user request.&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;when we then start the MDT:&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;1339769542 2012 Jun 15 16:12:22 perou2 kern warning kernel Lustre: 956:0:(ldlm_lib.c:876:target_handle_connect()) MGS: connection from 35c02d1f-5920-b87f-bc64-358fbcf00f05@10.5.1.4@o2ib t0 exp (null) cur 1339769542 last 0&lt;br/&gt;
1339769542 2012 Jun 15 16:12:22 perou2 kern warning kernel Lustre: 956:0:(ldlm_lib.c:876:target_handle_connect()) Skipped 1 previous similar message&lt;br/&gt;
1339769542 2012 Jun 15 16:12:22 perou2 kern warning kernel Lustre: MGS: Logs for fs fs_mdt were removed by user request.  All servers must be restarted in order to regenerate the logs.&lt;br/&gt;
1339769542 2012 Jun 15 16:12:22 perou2 kern info kernel Lustre: Setting parameter fs_mdt-MDT0000-mdtlov.lov.stripesize in log fs_mdt-MDT0000&lt;br/&gt;
1339769542 2012 Jun 15 16:12:22 perou2 kern info kernel Lustre: Skipped 1 previous similar message&lt;/p&gt;


&lt;p&gt;So half of the problem is solved ;-(&lt;/p&gt;

&lt;p&gt;HTH,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="40721" author="laisiyao" created="Sun, 17 Jun 2012 23:52:19 +0000"  >&lt;p&gt;Sebastien, could you tell me how you set &apos;writeconf&apos;? In my understanding, &apos;writeconf&apos; implies removing all existing config for the specified fs, so in your test, when the MDT started, it removed all configs for &apos;fs_mdt&apos; (including the OSTs). Could you explain why you set &apos;writeconf&apos;, and how you normally use it?&lt;/p&gt;</comment>
                            <comment id="40920" author="sebastien.buisson" created="Wed, 20 Jun 2012 09:33:16 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I use &apos;writeconf&apos; when I need to reformat a file system.&lt;/p&gt;

&lt;p&gt;For instance, consider that I have an &apos;fs1&apos; file system formatted and running, talking to my MGS. Now I stop this &apos;fs1&apos; file system and build a new file system (different OSTs and/or a different MDT) while keeping the same &apos;fs1&apos; name. If I then format this new &apos;fs1&apos; file system, there will be a problem on the MGS, because the configuration information for &apos;fs1&apos; will already be there. This is why I need to use the &apos;writeconf&apos; parameter.&lt;/p&gt;

&lt;p&gt;I always set the &apos;writeconf&apos; parameter on the OSTs as well as the MDT:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;mkfs.lustre --reformat --fsname=fs_mdt --mdt --index=0 --mgsnode=perou2-ib0@o2ib0 --param=lov.stripesize=1048576 --network=o2ib0 --writeconf --mkfsoptions=&quot;-j -J device=/dev/disk/by-id/scsi-2003013841aac000d -m 0&quot; /dev/disk/by-id/scsi-2003013841aac002d&lt;/li&gt;
	&lt;li&gt;mkfs.lustre --reformat --fsname=fs_mdt --ost --index=0 --mgsnode=perou2-ib0@o2ib0 --failnode=perou7-ib0@o2ib0 --network=o2ib0 --writeconf --mkfsoptions=&quot;-j -J device=/dev/disk/by-id/scsi-2003013841aac0017 -m 0&quot; /dev/disk/by-id/scsi-2003013841aac0037&lt;/li&gt;
	&lt;li&gt;mkfs.lustre --reformat --fsname=fs_mdt --ost --index=1 --mgsnode=perou2-ib0@o2ib0 --failnode=perou7-ib0@o2ib0 --network=o2ib0 --writeconf --mkfsoptions=&quot;-j -J device=/dev/disk/by-id/scsi-2003013841aac0015 -m 0&quot; /dev/disk/by-id/scsi-2003013841aac0035&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;If I set &apos;writeconf&apos; on all targets, I do not understand why the configuration should not be regenerated on the MGS, even if I start the OSTs first.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="40969" author="laisiyao" created="Thu, 21 Jun 2012 01:54:45 +0000"  >&lt;p&gt;Lustre Manual 14.4 Regenerating Lustre Configuration Logs:&lt;br/&gt;
Run the writeconf command on all servers (MDT first, then OSTs)&lt;br/&gt;
Start the file system in this order:&lt;br/&gt;
 MGS (or the combined MGS/MDT)&lt;br/&gt;
 MDT&lt;br/&gt;
 OSTs&lt;br/&gt;
 Lustre clients&lt;/p&gt;

&lt;p&gt;And the code also shows that writeconf on the MDT will erase all the fs logs (MDT, OSTs and clients), so there is a message like this in your log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1339769542 2012 Jun 15 16:12:22 perou2 kern warning kernel Lustre: MGS: Logs for fs fs_mdt were removed by user request. All servers must be restarted in order to regenerate the logs.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In my understanding this is a special case, and I&apos;m not clear on how &apos;writeconf&apos; was originally designed, but this does not look like an issue.&lt;/p&gt;</comment>
                            <comment id="40970" author="sebastien.buisson" created="Thu, 21 Jun 2012 02:51:00 +0000"  >&lt;p&gt;Hum, sorry. You are absolutely right: one is supposed to invert the target start order when the &apos;writeconf&apos; flag is set.&lt;/p&gt;

&lt;p&gt;So, in the end, I think I can say your patch is working great!&lt;br/&gt;
I have changed my review to &apos;+1&apos; in &lt;a href=&quot;http://review.whamcloud.com/3107&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3107&lt;/a&gt; &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="53076" author="laisiyao" created="Wed, 27 Feb 2013 00:31:14 +0000"  >&lt;p&gt;Patch landed.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11086" name="lu-1287.dmesg.tar" size="30720" author="pichong" created="Fri, 6 Apr 2012 04:33:07 +0000"/>
                            <attachment id="11414" name="lustre_mds_ko.gz" size="344849" author="sebastien.buisson" created="Mon, 21 May 2012 08:54:36 +0000"/>
                            <attachment id="11415" name="lustre_mds_ok.gz" size="314577" author="sebastien.buisson" created="Mon, 21 May 2012 08:54:36 +0000"/>
                            <attachment id="11084" name="mdt_failure.log" size="4914" author="pichong" created="Thu, 5 Apr 2012 11:38:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvqa7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8134</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>