<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:02:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14] live replacement of OST</title>
                <link>https://jira.whamcloud.com/browse/LU-14</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hot replace:&lt;br/&gt;
1 - Disable your OST on MDT (lctl deactivate)&lt;br/&gt;
2 - Empty your OST&lt;br/&gt;
3 - Backup the magic files (last_rcvd, LAST_ID, CONFIG/*)&lt;br/&gt;
4 - Deactivate the OST on all clients also.&lt;br/&gt;
5 - Unmount the OST&lt;br/&gt;
6 - Replace, reformat using same index&lt;br/&gt;
7 - Put back the backup magic files.&lt;br/&gt;
8 - Restart the OST.&lt;br/&gt;
9 - Activate the OST everywhere.&lt;/p&gt;

&lt;p&gt;It should be possible to have a new OST gracefully replace an old one, if that is what the administrator wants.  Some &quot;special&quot; action would need to be taken on the OST and/or MDT to confirm that this is the intent, rather than, e.g., accidentally inserting some other OST with the same index, which would corrupt the filesystem through duplicate object IDs or make existing objects on the &quot;real&quot; OST at that index inaccessible.&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;the new OST would be best off to start allocating objects at the LAST_ID&lt;br/&gt;
  of the old OST, so that there is no risk of confusion between objects&lt;/li&gt;
	&lt;li&gt;the MDT contains the old LAST_ID in its lov_objid file, and it sends this&lt;br/&gt;
  to the OST at connection time, so this is no problem&lt;/li&gt;
	&lt;li&gt;currently the new OST will refuse to allow the MDT to connect, because it&lt;br/&gt;
  detects that the old LAST_ID value from the MDT is inconsistent with its&lt;br/&gt;
  own value&lt;/li&gt;
	&lt;li&gt;it would be relatively straightforward to have the OST detect if the local&lt;br/&gt;
  LAST_ID value was &quot;new&quot; and use the MDT value instead&lt;/li&gt;
	&lt;li&gt;the danger is if the LAST_ID file was lost for some reason (e.g. corruption&lt;br/&gt;
  causes e2fsck to erase it).  In that case, the OST startup code should be&lt;br/&gt;
  smart enough to regenerate LAST_ID based on walking the object directories,&lt;br/&gt;
  which would also avoid the need to do this in e2fsck/lfsck (which can only&lt;br/&gt;
  run offline)&lt;/li&gt;
	&lt;li&gt;in cases where the on-disk LAST_ID is much lower than the MDT-supplied&lt;br/&gt;
  value, the OST should skip precreation of all the intermediate objects&lt;br/&gt;
  and start using the new MDT value&lt;/li&gt;
	&lt;li&gt;the only other thing is to avoid the case where a &quot;new&quot; OST is accidentally&lt;br/&gt;
  assigned the same index, when that isn&apos;t what is wanted.  There needs to be&lt;br/&gt;
  some way to &quot;prime&quot; the new OST (that is NOT the default for a newly&lt;br/&gt;
  formatted OST), or conversely tell the MDT that it should signal the new&lt;br/&gt;
  OST to take the place of the old one, so that there are not any mistakes&lt;/li&gt;
&lt;/ul&gt;
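
&lt;p&gt;The LAST_ID handling described in the list above can be sketched as a small decision function. This is only an illustrative Python model, not Lustre code: the function name, the &quot;primed for replacement&quot; flag, the gap threshold, and the treatment of an on-disk value of 0 as &quot;new&quot; are all assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: how far the MDT value may run ahead of the
# on-disk value before the OST stops precreating intermediate objects.
MAX_PRECREATE_GAP = 10000


@dataclass
class StartupDecision:
    next_oid: int          # object ID the OST starts allocating from
    precreate_count: int   # intermediate objects to precreate
    reason: str


def choose_next_oid(disk_last_id: Optional[int],
                    mdt_last_id: int,
                    primed_for_replace: bool) -> StartupDecision:
    """Pick the OST's starting object ID at MDT connection time.

    disk_last_id is None when LAST_ID is missing or unreadable.
    Assumption: a value of None or 0 marks a newly formatted OST.
    """
    is_new = disk_last_id is None or disk_last_id == 0
    if is_new:
        if not primed_for_replace:
            # Guard against an unrelated OST reusing an index by accident.
            raise PermissionError("index in use and OST not marked as replacement")
        return StartupDecision(mdt_last_id, 0, "replacement: adopt MDT value")

    if disk_last_id >= mdt_last_id:
        # Assumption: the ticket does not cover this case; keep the local value.
        return StartupDecision(disk_last_id, 0, "on-disk value is current")

    gap = mdt_last_id - disk_last_id
    if gap > MAX_PRECREATE_GAP:
        return StartupDecision(mdt_last_id, 0, "large gap: skip precreation")
    return StartupDecision(mdt_last_id, gap, "small gap: precreate objects")
```

&lt;p&gt;For example, choose_next_oid(0, 5000, True) adopts the MDT value without precreating anything, while the same call with primed_for_replace=False raises an error instead of silently reusing the index.&lt;/p&gt;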
</description>
                <environment></environment>
        <key id="10110">LU-14</key>
            <summary>live replacement of OST</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="laisiyao">Lai Siyao</reporter>
                        <labels>
                    </labels>
                <created>Tue, 16 Nov 2010 18:29:29 +0000</created>
                <updated>Thu, 24 Sep 2015 09:46:13 +0000</updated>
                            <resolved>Mon, 23 Dec 2013 14:00:58 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.4.2</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="10196" author="laisiyao" created="Fri, 19 Nov 2010 06:09:57 +0000"  >&lt;p&gt;Ran some tests; 30% of the code is finished.&lt;/p&gt;</comment>
                            <comment id="10206" author="bschubert" created="Fri, 19 Nov 2010 15:53:48 +0000"  >&lt;p&gt;I just noticed this here, while it is still easy to browse through all the open issues &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Just for your information, the offline approach: &lt;a href=&quot;https://bugzilla.lustre.org/show_bug.cgi?id=22734&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/show_bug.cgi?id=22734&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="10210" author="laisiyao" created="Fri, 19 Nov 2010 19:30:07 +0000"  >&lt;p&gt;Thanks for pointing this out, which explains a lot of details on LAST_ID recovery!&lt;/p&gt;</comment>
                            <comment id="10269" author="laisiyao" created="Sun, 5 Dec 2010 05:36:16 +0000"  >&lt;p&gt;Code is ready and under inspection.&lt;/p&gt;</comment>
                            <comment id="46289" author="adilger" created="Tue, 9 Oct 2012 19:02:55 +0000"  >&lt;p&gt;It probably makes sense for Fan Yong to implement this as part of the LFSCK project, so that an OST can recover from some common forms of corruption.&lt;/p&gt;

&lt;p&gt;The existing patch is at &lt;a href=&quot;http://review.whamcloud.com/141&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/141&lt;/a&gt;, but needs to be refreshed.&lt;/p&gt;</comment>
                            <comment id="49797" author="yong.fan" created="Sat, 29 Dec 2012 21:53:47 +0000"  >&lt;p&gt;It will be considered in LFSCK phase II.&lt;/p&gt;</comment>
                            <comment id="57865" author="adilger" created="Tue, 7 May 2013 21:51:37 +0000"  >&lt;p&gt;In discussions during &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2886&quot; title=&quot;create local files using local_storage library&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2886&quot;&gt;&lt;del&gt;LU-2886&lt;/del&gt;&lt;/a&gt; patch &lt;a href=&quot;http://review.whamcloud.com/6199&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6199&lt;/a&gt; inspection, it was proposed to improve the on-disk format of the LAST_ID file:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;struct last_id_ondisk {
        __u64 lio_next_oid;
        __u32 lio_magic;
        __u32 lio_cksum;
};
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and ofd_seq_load() (maybe rename this to ofd_seq_last_oid_read()?), ofd_seq_last_oid_write() and ll_recover_lost_found_objs.c should be updated to handle both the old 8-byte LAST_ID file and this new 16-byte format.  If the on-disk LAST_ID file is corrupted (bad lio_magic, bad lio_cksum, lio_next_oid &amp;gt; OBIF_MAX_OID for seq != 0, lio_next_oid &amp;gt; IDIF_MAX_OID for fid_seq == 0), it would be treated the same as if it were missing, and the LAST_ID recovery code should traverse the object directories for that group and rebuild the LAST_ID file.&lt;/p&gt;
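
&lt;p&gt;A minimal Python sketch of how a reader might accept both layouts and reject a corrupted file is given here. The magic value, the use of CRC32 as the checksum, the little-endian byte order, and the function names are assumptions for illustration; the proposal does not specify them. The two limits are assumed to be 2^32 and 2^48, modelled on the Lustre constants of the same names.&lt;/p&gt;

```python
import zlib
from typing import Optional

LIO_MAGIC = 0x4C494F31          # hypothetical magic value
OBIF_MAX_OID = 2 ** 32          # assumed limit for seq != 0
IDIF_MAX_OID = 2 ** 48          # assumed limit for seq == 0


def pack_last_id(next_oid: int) -> bytes:
    """Build the proposed 16-byte record: oid (8), magic (4), cksum (4)."""
    body = next_oid.to_bytes(8, "little") + LIO_MAGIC.to_bytes(4, "little")
    return body + zlib.crc32(body).to_bytes(4, "little")


def parse_last_id(data: bytes, seq: int) -> Optional[int]:
    """Return the next OID, or None if the file must be rebuilt."""
    if len(data) == 8:
        # Old format: a bare 64-bit object ID with no integrity check.
        next_oid = int.from_bytes(data, "little")
    elif len(data) == 16:
        next_oid = int.from_bytes(data[0:8], "little")
        magic = int.from_bytes(data[8:12], "little")
        cksum = int.from_bytes(data[12:16], "little")
        if magic != LIO_MAGIC or cksum != zlib.crc32(data[0:12]):
            return None
    else:
        return None
    limit = IDIF_MAX_OID if seq == 0 else OBIF_MAX_OID
    if next_oid > limit:
        return None
    return next_oid


def rebuild_last_id(object_names) -> int:
    """Recover LAST_ID as the highest numeric object name seen."""
    oids = [int(name) for name in object_names if name.isdigit()]
    return max(oids, default=0)
```

&lt;p&gt;A caller would try parse_last_id() first and fall back to rebuild_last_id() over the names found in the object directories whenever it returns None, so a missing file and a corrupted file take the same recovery path.&lt;/p&gt;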

&lt;p&gt;This would avoid the case where the LAST_ID file has some random garbage in it and causes an inconsistency between the MDT&apos;s and OST&apos;s understanding of what the next valid OID is.&lt;/p&gt;</comment>
                            <comment id="60173" author="adilger" created="Fri, 7 Jun 2013 15:26:49 +0000"  >&lt;p&gt;The one missing part of this process is being able to use a newly formatted OST in place of an old one with the same index when the last_rcvd and mountdata files are not accessible. The last_rcvd file will be recreated at mount time with default parameters (which should normally be OK), but mkfs.lustre always creates the mountdata file with the LDD_F_VIRGIN flag set. It should be possible to add a --replace option to mkfs.lustre so that the MGS doesn&apos;t refuse to let the OST connect because the index is already in use.&lt;/p&gt;</comment>
                            <comment id="65018" author="adilger" created="Fri, 23 Aug 2013 23:02:34 +0000"  >&lt;p&gt;I&apos;ve pushed &lt;a href=&quot;http://review.whamcloud.com/7443&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7443&lt;/a&gt; for &quot;&lt;tt&gt;mkfs.lustre --replace&lt;/tt&gt;&quot;, and the OST precreating only recent objects if the MDT lov_objid is much larger than the OST LAST_ID.  This replaces the old patch in &lt;a href=&quot;http://review.whamcloud.com/141&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/141&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="67774" author="pjones" created="Thu, 26 Sep 2013 22:18:49 +0000"  >&lt;p&gt;So is there still further work to complete for this ticket or does the recent landing mean that this ticket can be closed?&lt;/p&gt;</comment>
                            <comment id="68484" author="yong.fan" created="Mon, 7 Oct 2013 07:08:50 +0000"  >&lt;p&gt;We still need the patch for rebuilding LAST_ID file:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/6997&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6997&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="70617" author="bogl" created="Mon, 4 Nov 2013 15:55:19 +0000"  >&lt;p&gt;backport to b2_4: &lt;a href=&quot;http://review.whamcloud.com/8159&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8159&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="72105" author="yujian" created="Fri, 22 Nov 2013 08:00:23 +0000"  >&lt;p&gt;Patch landed on Lustre b2_4 branch.&lt;/p&gt;</comment>
                            <comment id="72282" author="yujian" created="Tue, 26 Nov 2013 04:10:31 +0000"  >&lt;blockquote&gt;&lt;p&gt;backport to b2_4: &lt;a href=&quot;http://review.whamcloud.com/8159&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8159&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The newly added conf-sanity test 69 introduced a regression failure in interop testing:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/4c4bf322-54af-11e3-9029-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/4c4bf322-54af-11e3-9029-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The patch also introduced regression failures in conf-sanity tests 72 and 73:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/4c4bf322-54af-11e3-9029-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/4c4bf322-54af-11e3-9029-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/1ee70c90-4d17-11e3-9c23-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/1ee70c90-4d17-11e3-9c23-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/acaa49a4-4c5c-11e3-826a-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/acaa49a4-4c5c-11e3-826a-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/e3329b42-473a-11e3-89d8-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/e3329b42-473a-11e3-89d8-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/61470bfa-45f4-11e3-810a-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/61470bfa-45f4-11e3-810a-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before Lustre b2_4 build #57 (which contains the patch), conf-sanity tests 72 and 73 always passed on the Lustre b2_4 branch.&lt;/p&gt;</comment>
                            <comment id="72288" author="yujian" created="Tue, 26 Nov 2013 07:43:32 +0000"  >&lt;p&gt;The newly added conf-sanity test 69 also introduced a regression failure in ZFS testing:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/57656c1e-5605-11e3-8e94-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/57656c1e-5605-11e3-8e94-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="72385" author="yujian" created="Wed, 27 Nov 2013 13:08:19 +0000"  >&lt;p&gt;Patches adding Lustre version checks to conf-sanity test 69:&lt;br/&gt;
master branch: &lt;a href=&quot;http://review.whamcloud.com/8411&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8411&lt;/a&gt;&lt;br/&gt;
b2_5 branch: &lt;a href=&quot;http://review.whamcloud.com/8413&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8413&lt;/a&gt;&lt;br/&gt;
b2_4 branch: &lt;a href=&quot;http://review.whamcloud.com/8412&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8412&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="73523" author="adilger" created="Fri, 13 Dec 2013 22:36:42 +0000"  >&lt;p&gt;Patch &lt;a href=&quot;http://review.whamcloud.com/6997&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6997&lt;/a&gt; implements LAST_ID rebuild after corruption, and also handles the case where the MDT and OST are out of sync about the LAST_ID value.&lt;/p&gt;</comment>
                            <comment id="74025" author="pjones" created="Mon, 23 Dec 2013 14:00:58 +0000"  >&lt;p&gt;Closing as remaining work is tracked under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1267&quot; title=&quot;LFSCK II: MDT-OST consistency check/repair&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1267&quot;&gt;&lt;del&gt;LU-1267&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="21992">LU-4246</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="13763">LU-1267</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="19377">LU-3458</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="16091">LU-2018</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="20092">LU-3668</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19767">LU-3575</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="26937">LU-5722</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="21832">LU-4204</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10694">LU-266</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>24128.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvnwn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7701</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>