<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:47:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4966] handle server registration errors gracefully</title>
                <link>https://jira.whamcloud.com/browse/LU-4966</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;If a server registers successfully on the MGS but receives an error in the registration reply (for example, because the MGS timed out revoking config locks, or because of other networking problems), the server will get an -EADDRINUSE error every time it tries to register again, because its server index was already marked as occupied on the MGS during the first registration.&lt;/p&gt;

&lt;p&gt;The current workaround for this situation is to use the writeconf option to force registration.&lt;/p&gt;

&lt;p&gt;We need to improve this so that the MGS can handle the situation gracefully.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="24409">LU-4966</key>
            <summary>handle server registration errors gracefully</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="niu">Niu Yawei</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Mon, 28 Apr 2014 08:11:58 +0000</created>
                <updated>Tue, 5 Dec 2023 00:37:29 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="82587" author="niu" created="Mon, 28 Apr 2014 08:25:42 +0000"  >&lt;p&gt;I think if the MGS saves the server UUID along with the server index, then it can tell whether a registration request for an occupied index comes from the same server.&lt;br/&gt;
It also looks like the MGS currently keeps the server index bitmap in memory only; it needs to be saved on disk as well.&lt;/p&gt;</comment>
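<!--
The idea in the comment above can be illustrated with a minimal sketch. This is a toy model, not MGS code: the class IndexRegistry, its methods, and the JSON persistence format are all hypothetical names chosen for illustration. It assumes the MGS keeps a table from target index to the UUID of the server that first claimed it, so that a retry from the same server succeeds while a request from a different server still gets -EADDRINUSE.

```python
import errno
import json


class IndexRegistry:
    """Toy model of an MGS-side table mapping target index to server UUID."""

    def __init__(self):
        # index (int) mapped to the UUID of the server that first claimed it
        self.owner_by_index = {}

    def register(self, index, server_uuid):
        """Return 0 on success or a negative errno, kernel style."""
        owner = self.owner_by_index.get(index)
        if owner is None:
            self.owner_by_index[index] = server_uuid
            return 0
        if owner == server_uuid:
            # Same server retrying after a lost or failed reply: accept.
            return 0
        # A different server is asking for an occupied index.
        return -errno.EADDRINUSE

    def dump(self):
        """Serialize the table so it could be written to disk."""
        return json.dumps(self.owner_by_index)

    @classmethod
    def load(cls, text):
        """Rebuild the table from dump() output (JSON keys are strings)."""
        registry = cls()
        registry.owner_by_index = {
            int(index): owner for index, owner in json.loads(text).items()
        }
        return registry


registry = IndexRegistry()
same_server = "a53bc5ba-687b-4091-fb0b-61489785f247"
other_server = "0f0e0d0c-0b0a-4908-8706-050403020100"  # hypothetical second server

first = registry.register(3, same_server)
retry = registry.register(3, same_server)
clash = registry.register(3, other_server)
print(first, retry, clash)

# Simulate an MGS restart: the table survives because it was serialized.
restored = IndexRegistry.load(registry.dump())
print(restored.register(3, same_server))
```

With only an in-memory bitmap, the retry cannot be told apart from the clash; storing the owner UUID alongside the index, and persisting the table, is what makes the distinction possible in this sketch.
-->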
                            <comment id="82588" author="niu" created="Mon, 28 Apr 2014 08:27:10 +0000"  >&lt;p&gt;Andreas/Alex, any suggestions? Thanks.&lt;/p&gt;</comment>
                            <comment id="82605" author="bzzz" created="Mon, 28 Apr 2014 14:30:52 +0000"  >&lt;p&gt;This issue was mentioned by Chris in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1257&quot; title=&quot;OST registration snafu&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1257&quot;&gt;&lt;del&gt;LU-1257&lt;/del&gt;&lt;/a&gt;, and at the moment it&apos;s not clear whether this specific issue is a major one for LLNL. It&apos;ll be hard to get this fixed in 2.6 due to the amount of changes?&lt;/p&gt;</comment>
                            <comment id="82639" author="adilger" created="Mon, 28 Apr 2014 17:15:38 +0000"  >&lt;p&gt;Alex, I think Niu was asking for ideas on how this might best be fixed.&lt;/p&gt;</comment>
                            <comment id="100010" author="liwei" created="Tue, 25 Nov 2014 06:54:13 +0000"  >&lt;p&gt;Niu&apos;s idea makes sense to me.  I spent some time experimenting with it.&lt;/p&gt;

&lt;p&gt;A real UUID (e.g., &quot;a53bc5ba-687b-4091-fb0b-61489785f247&quot;) could easily be generated by back-end-independent mkfs.lustre code and stored in a back-end-specific way (e.g., in ldiskfs &quot;mountdata&quot; or as a ZFS dataset property).  (ZFS has pool IDs, but those are pool properties and are only 64-bit.)&lt;/p&gt;

&lt;p&gt;Current master code always passes empty strings in mti_uuid.  It would be nice if real UUIDs could be packed into that field.  However, experiments showed:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;MDT OSPs for OSTs would send MDS_CONNECTs to OSTs, because client_obd_setup() depends on &quot;OST&quot; in UUIDs to determine whether an OSP is for an OST or an MDT.&lt;/li&gt;
	&lt;li&gt;Clients would not be able to connect to MDTs, because mgs would put fake UUIDs (e.g., &quot;lustre-MDT0000_UUID&quot;) into MDT logs but real UUIDs into the client log.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;A possible solution is:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Start generating real UUIDs for any newly formatted targets.&lt;/li&gt;
	&lt;li&gt;Send real UUIDs via mti_uuid.&lt;/li&gt;
	&lt;li&gt;MGT checks and stores real UUIDs somewhere, but keeps putting fake UUIDs into logs.&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Newly formatted targets must talk with new mgs code.&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
</comment>
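<!--
The first experimental finding in the comment above (type detection that depends on "OST" appearing in the UUID) can be illustrated with a small sketch. The helper names legacy_uuid and is_ost_by_name are hypothetical, and the name format "fsname-KINDxxxx_UUID" with a four-digit hexadecimal index is an assumption modelled on the "lustre-MDT0000_UUID" example quoted in the comment. The substring test only mimics the behaviour described for client_obd_setup(); it is not the actual kernel code.

```python
import uuid


def legacy_uuid(fsname, kind, index):
    """Build a name-style UUID such as 'lustre-MDT0000_UUID' (assumed format)."""
    return "%s-%s%04x_UUID" % (fsname, kind, index)


def is_ost_by_name(target_uuid):
    """Mimic a check that looks for 'OST' inside the UUID string."""
    return "OST" in target_uuid


# What mkfs could generate and store for a newly formatted target.
real = str(uuid.uuid4())
# What would still be written into the configuration logs.
fake = legacy_uuid("lustre", "OST", 1)

print(fake, is_ost_by_name(fake))
print(real, is_ost_by_name(real))
```

Because uuid.uuid4() renders as lowercase hexadecimal digits and hyphens, the substring test can never match a real UUID, which is consistent with the proposal to keep name-style UUIDs in the logs while the real UUID is only checked and stored on the MGS side.
-->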
                            <comment id="100012" author="adilger" created="Tue, 25 Nov 2014 08:17:57 +0000"  >&lt;p&gt;There is space in the last_rcvd file to store the UUID, but that has the potential problem that this file may be deleted if there are problems with recovery. &lt;/p&gt;

&lt;p&gt;As for the OST detection in the connection code, it would be possible to store the target type in the last byte of the UUID or similar (e.g. the ASCII &quot;O&quot; or &quot;M&quot;) and still make the rest of the UUID random.&lt;/p&gt;</comment>
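<!--
A minimal sketch of the suggestion above, assuming the type tag simply replaces the final character of the textual UUID. The names TYPE_TAGS, tagged_uuid, and kind_from_uuid are hypothetical, and the exact byte layout a real implementation would use is not specified in this thread.

```python
import uuid

# ASCII tags suggested in the comment: "O" for OST, "M" for MDT.
TYPE_TAGS = {"OST": "O", "MDT": "M"}


def tagged_uuid(kind):
    """Return a random UUID string whose final character is a type tag."""
    base = str(uuid.uuid4())
    return base[:-1] + TYPE_TAGS[kind]


def kind_from_uuid(tagged):
    """Recover the target type from the final character, or None if untagged."""
    for kind, tag in TYPE_TAGS.items():
        if tagged.endswith(tag):
            return kind
    return None


ost_uuid = tagged_uuid("OST")
mdt_uuid = tagged_uuid("MDT")
print(ost_uuid, kind_from_uuid(ost_uuid))
print(mdt_uuid, kind_from_uuid(mdt_uuid))
```

The uppercase tags cannot collide with the lowercase hexadecimal digits of an untagged UUID, so the check is unambiguous in this sketch. The trade-off is that the tagged string is no longer a valid hexadecimal UUID, and four bits of randomness are given up.
-->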
                            <comment id="172131" author="spiechurski" created="Thu, 3 Nov 2016 08:41:44 +0000"  >&lt;p&gt;Has this issue been addressed in recent versions?&lt;br/&gt;
We are still seeing this problem when installing configurations with a large number of OSTs (or when modifying them after a writeconf), and we have to work around it manually by starting the OSTs one after another.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="13698">LU-1257</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="47674">LU-9838</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="66684">LU-15112</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="65602">LU-14928</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="78648">LU-17240</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwl47:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13736</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>