<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:51:27 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12310] MDT Device-level Replication/Mirroring</title>
                <link>https://jira.whamcloud.com/browse/LU-12310</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During a discussion at lunch today at LUG, we were talking about the work being done on DOM/DNE/PFL/FLR. We were also talking about Lustre becoming more than just a scratch file system, and it occurred to me that one thing that really hampers that concept is the vulnerability of metadata in its present state.&#160;&lt;/p&gt;

&lt;p&gt;I don&apos;t recall metadata replication ever being mentioned, but I think that it would be a valuable feature to explore.&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="55661">LU-12310</key>
            <summary>MDT Device-level Replication/Mirroring</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="jamervi">Joe Mervini</reporter>
                        <labels>
                            <label>LMR</label>
                    </labels>
                <created>Thu, 16 May 2019 19:22:06 +0000</created>
                <updated>Sat, 15 Apr 2023 22:40:35 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="247298" author="adilger" created="Thu, 16 May 2019 21:02:13 +0000"  >&lt;p&gt;We&apos;ve discussed this internally and worked up an initial design, but ended up deciding against that implementation. We have not had resources to work up a new design and work on the implementation.&lt;/p&gt;

&lt;p&gt;In the meantime, my recommendation would be to use MD-RAID/dm-mirror (or a hardware-based mirror) for ldiskfs, or a ZFS VDEV mirror, to replicate the MDT storage across nodes (assuming non-shared storage is the goal), and continue to use failover to manage which MDS is exporting the MDT. The MD-RAID or ZFS VDEV would use 2x or 3x devices per mirror (each one in a separate node), and a network storage transport like NVMe-over-fabrics, SRP, or iSCSI. With modern storage transports, local storage and remote storage are equally fast, and the upper-layer RAID handles the replication and recovery of the storage devices in the same manner as local disks.&lt;/p&gt;

&lt;p&gt;This approach is described for ldiskfs at &lt;a href=&quot;http://wiki.lustre.org/images/e/e7/LUG-2012-SRP_Mirroring-UFL.pdf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;High Availability Lustre Using SRP-mirrored LUNs&lt;/a&gt;, and similar approaches have been discussed for ZFS, but I don&apos;t have a link handy.&lt;/p&gt;</comment>
                            <comment id="320899" author="adilger" created="Wed, 15 Dec 2021 01:00:54 +0000"  >&lt;p&gt;I think that parts of this work could be split into some smaller features/implementation tasks in order to reduce the amount of effort needed to get something usable out of the development:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;improvements to performance of distributed transactions (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7426&quot; title=&quot;DNE3: Current llog format for remote update llog&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7426&quot;&gt;LU-7426&lt;/a&gt;) so that synchronous/ordered disk transactions are not needed. This would be very useful independent of MDT mirroring to improve creation of remote and striped directories, cross-MDT rename/link, etc.&lt;/li&gt;
	&lt;li&gt;improve handling of distributed recovery when an MDT is offline (e.g. save transaction logs, don&apos;t block filesystem access for unrelated MDTs) (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9206&quot; title=&quot;DNE - allow partial access to striped dir if one of the MDTs is unavailable&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9206&quot;&gt;&lt;del&gt;LU-9206&lt;/del&gt;&lt;/a&gt; ++)&lt;/li&gt;
	&lt;li&gt;fault-tolerance for services that run on MDT0000, such as the quota master, FLDB, MGT, etc.&lt;/li&gt;
	&lt;li&gt;scalability of &lt;tt&gt;REMOTE_PARENT_DIR&lt;/tt&gt; to allow handling more disconnected filesystem objects (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10329&quot; title=&quot;DNE3: REMOTE_PARENT_DIR scalability&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10329&quot;&gt;LU-10329&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;mirroring of top-level directories in the filesystem (initially &lt;tt&gt;ROOT/&lt;/tt&gt;, and then the first level of subdirectories below it, etc.) so that the filesystem is &quot;more&quot; available if MDT0000 or other MDTs in a top-level striped directory are unavailable. This would &lt;em&gt;not&lt;/em&gt; include mirroring of the regular inodes for files, only the directories themselves. Since the top-level directories are changed less often than lower-level subdirectories, some extra overhead creating directories at this level is worthwhile for higher availability.
	&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
		&lt;li&gt;mirrored directories would be similar to striped directories, but each directory entry name could be looked up in at least two different directory shards (e.g. &lt;tt&gt;lmv_locate_tgt_by_name()&lt;/tt&gt;, ...+1, ...+2), depending on replication level, allowing the target to be found even if one MDT is offline (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9206&quot; title=&quot;DNE - allow partial access to striped dir if one of the MDTs is unavailable&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9206&quot;&gt;&lt;del&gt;LU-9206&lt;/del&gt;&lt;/a&gt;)&lt;/li&gt;
		&lt;li&gt;each mirrored directory entry would contain two or more different FIDs referencing inodes on separate MDTs (for subdirectories), or the same FID (for regular files), similar to how ZFS Block Pointers can be referenced by and directly reference up to 3x different DVAs (block numbers) that have copies of the same data&lt;/li&gt;
		&lt;li&gt;each mirrored directory inode would have the full layout of all shards in the directory, so the client can determine which shard to use for lookup&lt;/li&gt;
		&lt;li&gt;updates to the mirrored directory would always need distributed transactions that inserted or removed the redundant dirents together&lt;/li&gt;
		&lt;li&gt;normal DNE distributed transaction recovery would apply to recover incomplete transactions if an MDT is offline during an update&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;
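&lt;p&gt;The mirrored-directory lookup described above can be sketched roughly as follows. This is only an illustration of the &quot;primary shard, then +1, +2&quot; idea, not Lustre code: the function names &lt;tt&gt;shard_candidates()&lt;/tt&gt; and &lt;tt&gt;locate_online_shard()&lt;/tt&gt; are hypothetical, CRC32 is a stand-in for whatever hash the directory layout actually uses, and the set of offline shards is assumed to be known to the client.&lt;/p&gt;

```python
import zlib


def shard_candidates(name, stripe_count, replicas):
    """Return the shard indices that could hold the dirent for `name`.

    Assumption: the primary shard is chosen by hashing the name, and each
    extra replica lives on the next shard (+1, +2, ...), wrapping around.
    CRC32 is a placeholder hash, not the one Lustre uses.
    """
    primary = zlib.crc32(name.encode("utf-8")) % stripe_count
    copies = min(replicas, stripe_count)  # cannot have more copies than shards
    return [(primary + k) % stripe_count for k in range(copies)]


def locate_online_shard(name, stripe_count, replicas, offline):
    """Return the first candidate shard that is not offline, or None."""
    for shard in shard_candidates(name, stripe_count, replicas):
        if shard not in offline:
            return shard
    return None


if __name__ == "__main__":
    # Hypothetical 4-shard directory with 2 copies of every entry.
    candidates = shard_candidates("projects", 4, 2)
    print("candidate shards:", candidates)
    # Pretend the primary shard is on an MDT that is currently offline.
    print("shard used:", locate_online_shard("projects", 4, 2, {candidates[0]}))
```

&lt;p&gt;With a replication level of 2, a name stays resolvable as long as at least one of its two candidate shards is online. The lookup returns nothing only when every shard holding a copy of the entry is unavailable.&lt;/p&gt;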
</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="49558">LU-10329</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33134">LU-7426</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="21877">LU-4215</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32729">LU-7319</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33135">LU-7427</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="44700">LU-9206</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="75588">LU-16742</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="43434" name="Lustre_Metadata_Redundancy-202112.pptx" size="407483" author="adilger" created="Wed, 27 Apr 2022 20:41:43 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00gf3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>