<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:24:18 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2330] OST unable to complete registration with MGS</title>
                <link>https://jira.whamcloud.com/browse/LU-2330</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We created a new filesystem with 768 zfs-osd OSTs.  The OSTs were started out-of-index order, in parallel.  There was a failure in the initial registration of about 35 of the OSTs:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2012-11-14 12:30:07 Lustre: Lustre: Build Version: 2.3.54-6chaos-6chaos--PRISTINE-2.6.32-220.23.1.2chaos.ch5.x86_64
2012-11-14 12:30:19 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881011707000 x1418644692140034/t0(0) o253-&amp;gt;MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:19 Lustre: Error -5 communicating with the MGS, is the MGS running?
2012-11-14 12:30:25 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881011707000 x1418644692140035/t0(0) o101-&amp;gt;MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:31 LustreError: 41374:0:(client.c:1123:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881011707000 x1418644692140036/t0(0) o101-&amp;gt;MGC172.20.5.1@o2ib500@172.20.5.1@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
2012-11-14 12:30:31 LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log &apos;lsfull-OST0062&apos; failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -5
2012-11-14 12:30:31 Lustre: lsfull-OST0062: Unable to start target: -5
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can&apos;t end config log lsfull
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
2012-11-14 12:30:31 LustreError: 41374:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
2012-11-14 12:30:58 Lustre: server umount lsfull-OST0062 complete
2012-11-14 12:30:58 LustreError: 41374:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount  (-5)
2012-11-14 12:32:54 LustreError: 42061:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
2012-11-14 12:32:54 Lustre: lsfull-OST0062: Initializing new disk
2012-11-14 12:34:10 LustreError: 166-1: MGC172.20.5.1@o2ib500: Connection to MGS (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will fail
2012-11-14 12:50:50 Lustre: Evicted from MGS (at MGC172.20.5.1@o2ib500_0) after server handle changed from 0x2ca8c28d7be253b7 to 0xa10ed509d011108e
2012-11-14 12:50:50 Lustre: MGC172.20.5.1@o2ib500: Connection restored to MGS (at 172.20.5.1@o2ib500)
2012-11-14 12:56:27 LustreError: 137-5: UUID &apos;ls1-OST0062_UUID&apos; is not available for connect (no target)
2012-11-14 12:56:27 LustreError: Skipped 21 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
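Hedged aside: the negative codes threaded through these logs are ordinary Linux errno values surfaced by the Lustre stack. A tiny lookup helper (hypothetical, for reading the logs only, not part of any Lustre tool):

```shell
# Map the return codes seen in this ticket to their errno names.
# The values are standard Linux errno numbers; the helper itself is
# hypothetical and exists only to annotate the logs above and below.
errname() {
    case "$1" in
        -2)  echo "ENOENT (No such file or directory)" ;;
        -5)  echo "EIO (Input/output error)" ;;
        -98) echo "EADDRINUSE (Address already in use)" ;;
        *)   echo "errno $1 (not mapped here)" ;;
    esac
}

errname -5   # the initial registration failures
errname -2   # the missing config log on restart
errname -98  # the MGS "index already in use" rejection
```

The -98 case lines up with the later MGS message that index 98 is already in use, which is what points at a half-completed registration rather than a communication problem.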

&lt;p&gt;df shows the OST mounted, and the lustre:svname property value no longer has the &quot;:&quot;; however, the OST has no exports:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# grove98 /root &amp;gt; df -t lustre
Filesystem           1K-blocks      Used Available Use% Mounted on
grove98/lsfull-ost0  67554518656      1152 67554515456   1% /mnt/lustre/local/lsfull-OST0062
# grove98 /root &amp;gt; zfs get lustre:svname grove98/lsfull-ost0
NAME                 PROPERTY       VALUE           SOURCE
grove98/lsfull-ost0  lustre:svname  lsfull-OST0062  local
# grove98 /root &amp;gt; ls /proc/fs/lustre/obdfilter/lsfull-OST0062/exports/
clear
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
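A sketch of the naming convention this ticket hinges on (an assumption drawn from the behaviour described here, not from Lustre documentation): a lustre:svname of the form &quot;fsname:OSTxxxx&quot; marks a target whose first registration with the MGS has not completed, and it is rewritten to &quot;fsname-OSTxxxx&quot; once registration succeeds. A hypothetical helper expressing that check:

```shell
# Hypothetical helper: decide from a lustre:svname value whether the
# target still needs its first MGS registration. Per this ticket, the
# ":" separator is replaced by "-" after registration completes.
needs_first_registration() {
    case "$1" in
        *:*) return 0 ;;  # e.g. lsfull:OST0062 -> registration still pending
        *)   return 1 ;;  # e.g. lsfull-OST0062 -> MGS already registered it
    esac
}

needs_first_registration "lsfull:OST0062" && echo "pending"
needs_first_registration "lsfull-OST0062" || echo "registered"
```

This is why the property value shown above (lsfull-OST0062, no colon) is confusing: the OSS side believes registration finished, while the MGS later insists the target never registered.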

&lt;p&gt;If I try to restart the OST, the MGS complains it isn&apos;t registered:&lt;/p&gt;

&lt;p&gt;MGS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 142-7: The target lsfull-OST0062 has not registered yet. It must be started before failnids can be added.
LustreError: 527:0:(mgs_llog.c:2956:mgs_write_log_param()) err -2 on param &apos;failover.node=172.20.1.97@o2ib500&apos;
LustreError: 527:0:(mgs_handler.c:393:mgs_handle_target_reg()) Failed to write lsfull-OST0062 log (-2)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;OSS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2012-11-14 15:52:56 Lustre: Error -2 communicating with the MGS, is the MGS running?
2012-11-14 15:52:56 LustreError: 45074:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
2012-11-14 15:52:56 LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log &apos;lsfull-OST0062&apos; failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -2
2012-11-14 15:52:56 Lustre: lsfull-OST0062: Unable to start target: -2
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can&apos;t end config log lsfull
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
2012-11-14 15:52:56 LustreError: 45074:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
2012-11-14 15:52:57 Lustre: server umount lsfull-OST0062 complete
2012-11-14 15:52:57 LustreError: 45074:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount  (-2)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;Finally, if I put the &quot;:&quot; back in the lustre:svname property to try to force re-registration, the MGS complains that the index is already in use:&lt;/p&gt;

&lt;p&gt;OSS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# grove98 /root &amp;gt; zfs set lustre:svname=lsfull:OST0062 grove98/lsfull-ost0
# grove98 /root &amp;gt; zfs get lustre:svname grove98/lsfull-ost0
NAME                 PROPERTY       VALUE           SOURCE
grove98/lsfull-ost0  lustre:svname  lsfull:OST0062  local
# grove98 /root &amp;gt; /etc/init.d/lustre start
Mounting grove98/lsfull-ost0 on /mnt/lustre/local/lsfull-OST0062
mount.lustre: mount grove98/lsfull-ost0 at /mnt/lustre/local/lsfull-OST0062 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# grove98 /root &amp;gt; dmesg | tail
Lustre: Error -98 communicating with the MGS, is the MGS running?
LustreError: 45333:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
LustreError: 15c-8: MGC172.20.5.1@o2ib500: The configuration from log &apos;lsfull-OST0062&apos; failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 45333:0:(obd_mount.c:1851:server_start_targets()) failed to start server lsfull-OST0062: -2
Lustre: lsfull-OST0062: Unable to start target: -2
LustreError: 45333:0:(obd_mount.c:1350:lustre_disconnect_osp()) Can&apos;t end config log lsfull
LustreError: 45333:0:(obd_mount.c:2113:server_put_super()) lsfull-OST0062: failed to disconnect osp-on-ost (rc=-2)!
LustreError: 45333:0:(obd_mount.c:2143:server_put_super()) no obd lsfull-OST0062
LustreError: 45333:0:(obd_mount.c:1418:lustre_stop_osp()) Can not find osp-on-ost lsfull-MDT0000-osp-OST0062
LustreError: 45333:0:(obd_mount.c:2158:server_put_super()) lsfull-OST0062: Fail to stop osp-on-ost!
Lustre: server umount lsfull-OST0062 complete
LustreError: 45333:0:(obd_mount.c:2990:lustre_fill_super()) Unable to mount  (-2)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;MGS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 140-5: Server lsfull-OST0062 requested index 98, but that index is already in use. Use --writeconf to force
LustreError: 534:0:(mgs_llog.c:3005:mgs_write_log_target()) Can&apos;t get index (-98)
LustreError: 534:0:(mgs_handler.c:393:mgs_handle_target_reg()) Failed to write lsfull-OST0062 log (-98)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>lustre-orion-2.3.54-6chaos</environment>
        <key id="16683">LU-2330</key>
            <summary>OST unable to complete registration with MGS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="liwei">Li Wei</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 14 Nov 2012 19:11:49 +0000</created>
                <updated>Sat, 12 Jan 2019 03:48:14 +0000</updated>
                            <resolved>Sat, 12 Jan 2019 03:48:14 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="47850" author="pjones" created="Thu, 15 Nov 2012 11:05:56 +0000"  >&lt;p&gt;Alex will triage this one&lt;/p&gt;</comment>
                            <comment id="48061" author="bzzz" created="Tue, 20 Nov 2012 05:34:12 +0000"  >&lt;p&gt;I&apos;m going to try to reproduce this locally. It seems that, due to high load, the MGS was not able to respond in time, so the OST got a timeout and an error, but the MGS eventually processed the request and registered the target.&lt;/p&gt;</comment>
                            <comment id="48470" author="bzzz" created="Wed, 28 Nov 2012 01:08:13 +0000"  >&lt;p&gt;Ned, can you clarify a bit: after the very first attempt to mount OST0062, which did not succeed, you found OST0062 in the df output?&lt;/p&gt;

&lt;p&gt;The first log claims &quot;Lustre: server umount lsfull-OST0062 complete&quot;, which implies nothing should remain in the df/mount output.&lt;/p&gt;</comment>
                            <comment id="48499" author="nedbass" created="Wed, 28 Nov 2012 14:39:51 +0000"  >&lt;p&gt;Alex, I updated the OST log in the description to include timestamps to give a better idea of the timing.&lt;/p&gt;

&lt;p&gt;Yes, that was the confusing part of this bug. df definitely showed OST0062 mounted. It looks like a second attempt was made to start the OST at 12:32:54, after the initial attempt failed. The OST was then evicted by the MGS and reconnected. It was at this point that I logged in and ran df.&lt;/p&gt;

</comment>
                            <comment id="52664" author="sarah" created="Mon, 18 Feb 2013 21:02:10 +0000"  >&lt;p&gt;It seems this is another instance seen in zfs testing:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/07920122-7788-11e2-987d-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/07920122-7788-11e2-987d-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="239860" author="pjones" created="Sat, 12 Jan 2019 03:48:14 +0000"  >&lt;p&gt;Closing ancient ticket&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 30 Oct 2014 19:11:49 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvcaf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5562</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 14 Nov 2012 19:11:49 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>