<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:10:02 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14470] striped directory layout mismatch after failover</title>
                <link>https://jira.whamcloud.com/browse/LU-14470</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[15965.280047] LustreError: 23882:0:(llite_lib.c:1442:ll_update_lsm_md()) lustre: [0x200008107:0x10653:0x0] dir layout mismatch:
[15965.283219] LustreError: 23882:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.287312] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.289569] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.291807] LustreError: 23882:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.295841] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.298063] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.310206] LustreError: 23884:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.314355] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.316652] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.318881] LustreError: 23884:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.322888] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.325121] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.340329] LustreError: 23886:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.344411] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.346655] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.348866] LustreError: 23886:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool 
[15965.352827] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.355133] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.357439] LustreError: 23886:0:(llite_lib.c:2471:ll_prep_inode()) new_inode -fatal: rc -22
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The create request is replayed, but the MDS creates the striped directory shards with new FIDs, so the client fails the layout check.&lt;br/&gt;
It can be reproduced by recovery-mds-scale or a custom test case; I&apos;ll attach it later.&lt;/p&gt;

&lt;p&gt;This looks like a design flaw to me. The client should replay the create request with the previously allocated FIDs, and the MDS should recreate the directory shards using the FIDs supplied by the client.&lt;/p&gt;
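
&lt;p&gt;To make the mismatch above concrete, below is a minimal user-space sketch of the layout check. It assumes a simplified layout structure modeled on the lsm_md_dump() output (not the real Lustre wire format), compares the layout cached on the client against the one recreated during replay, and returns -22 (-EINVAL, the ll_prep_inode() error above) on the first difference:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Simplified model of a striped directory layout, patterned after the
 * fields printed by lsm_md_dump() above. This is NOT the real Lustre
 * LMV structure, just enough to illustrate the check. */
struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

struct dir_layout {
        uint32_t magic;         /* 0x0cd20cd0 in the log */
        uint32_t stripe_count;
        uint32_t hash_type;
        struct fid stripe[2];
};

/* Return 0 if the replayed layout matches the cached one, or -22
 * (-EINVAL, as in the ll_prep_inode() line above) on the first mismatch. */
static int layout_check(struct dir_layout old, struct dir_layout new)
{
        uint32_t i;

        if (old.stripe_count != new.stripe_count || old.hash_type != new.hash_type)
                return -22;

        for (i = 0; i != old.stripe_count; i++)
                if (old.stripe[i].seq != new.stripe[i].seq ||
                    old.stripe[i].oid != new.stripe[i].oid) {
                        printf(&quot;stripe[%u]: [0x%llx:0x%x:0x0] vs [0x%llx:0x%x:0x0]\n&quot;,
                               i, (unsigned long long)old.stripe[i].seq, old.stripe[i].oid,
                               (unsigned long long)new.stripe[i].seq, new.stripe[i].oid);
                        return -22;
                }
        return 0;
}

int main(void)
{
        /* The two layouts from the log: cached on the client vs. recreated
         * by the MDS with freshly allocated FIDs during replay. */
        struct dir_layout cached = { 0x0cd20cd0, 2, 0x2,
                { { 0x20000dec0ULL, 0x7, 0 }, { 0x24000e690ULL, 0x7, 0 } } };
        struct dir_layout replayed = { 0x0cd20cd0, 2, 0x2,
                { { 0x20000e690ULL, 0x3, 0 }, { 0x24000ee60ULL, 0x1, 0 } } };

        return layout_check(cached, replayed) == 0 ? 0 : 1;
}&lt;/pre&gt;</description>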
                <environment></environment>
        <key id="63015">LU-14470</key>
            <summary>striped directory layout mismatch after failover</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="askulysh">Andriy Skulysh</reporter>
                        <labels>
                    </labels>
                <created>Tue, 23 Feb 2021 20:35:03 +0000</created>
                <updated>Thu, 28 Sep 2023 07:59:38 +0000</updated>
                                                            <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="292801" author="gerrit" created="Tue, 23 Feb 2021 20:36:10 +0000"  >&lt;p&gt;Andriy Skulysh (c17819@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/41731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/41731&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14470&quot; title=&quot;striped directory layout mismatch after failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14470&quot;&gt;LU-14470&lt;/a&gt; test: striped dir layout mismatch after failover&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 48d6ff02041dd6d8124d364f784e31ec6517bf7c&lt;/p&gt;</comment>
                            <comment id="294998" author="laisiyao" created="Mon, 15 Mar 2021 13:02:09 +0000"  >&lt;p&gt;Dual or multiple MDT failure recovery is hard to support. &lt;/p&gt;

&lt;p&gt;For striped directory creation, the stripe FIDs are allocated by the MDTs, and these FIDs may not be usable in recovery (the metadata sequence may not have been allocated yet), or they may already be in use by other objects (FIDs allocated by other operations during recovery).&lt;/p&gt;

&lt;p&gt;But if we don&apos;t reuse the FIDs, there are other problems: even if we update the directory layout after replay (instead of reporting an error, as we do now), a subsequent touch under this striped directory can&apos;t be replayed because its parent has changed.&lt;/p&gt;</comment>
                            <comment id="295053" author="laisiyao" created="Tue, 16 Mar 2021 01:51:37 +0000"  >&lt;p&gt;Some distributed transaction replays need all the information stored in update logs, because these transactions allocated FID on MDTs, e.g. striped directory creation and directory migration (to striped directory). Such operation replay can&apos;t be done from client side.&lt;/p&gt;

&lt;p&gt;One way to improve this is to store update logs on more MDTs than those involved, e.g., if a striped directory is created on MDT0 and MDT1, it also stores update logs on MDT2 (this can be configured, and may have more backups), so upon recovery, all the information can be obtained from MDT2 and successfully replayed.&lt;/p&gt;</comment>
                            <comment id="319260" author="adilger" created="Fri, 26 Nov 2021 22:12:20 +0000"  >&lt;p&gt;Since the FIDs for the MDT directory stripes are allocated by the MDS, I think there are two options here:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;have the client replay the mkdir with the original MDT FIDs in the LMV xattr, the same way that client replay of regular files includes the OST object FIDs. That ensures the recreated directory is exactly the same as the original and avoids failures in later replay operations. I think this would be my preferred solution (see the sketch after this list), but it potentially exposes the internal filesystem structure to inconsistency if the client specifies wrong FIDs during replay (though see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15250&quot; title=&quot;RPC Replay Signature&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15250&quot;&gt;LU-15250&lt;/a&gt;).&lt;/li&gt;
	&lt;li&gt;the client should use the FID of the &lt;b&gt;master&lt;/b&gt; directory for the parent FID (not the shard FID) when replaying the file create, and then look up the shard FID from the directory during replay. The client doing the mkdir originally selected the master directory FID and will use it for replay, so it will not change. That works around the problem of the shard FIDs changing during replay, but may still cause other problems later if the shard FIDs are also used for other operations (though those might be fixed similarly to use the master FID). I think this adds complexity to the client replay process, but has the benefit that the client doesn&apos;t know as much detail about the shard layout, which &lt;em&gt;might&lt;/em&gt; be helpful in the future.&lt;/li&gt;
&lt;/ul&gt;
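
&lt;p&gt;For illustration, a minimal user-space sketch of the first option: the client keeps the stripe FIDs returned by the original mkdir and packs them into an LMV-style buffer that the replayed request carries. The names (lmv_replay_md, pack_replay_lmv) and the layout are hypothetical, only loosely modeled on the lsm_md_dump() fields in the log, and deliberately not the real lmv_mds_md_v1 wire format:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Hypothetical sketch: pack the stripe FIDs from the original mkdir into
 * a buffer the replayed request carries, so the MDS can recreate the
 * shards with the same FIDs -- analogous to file replay carrying the OST
 * object FIDs. Illustrative only, not the real Lustre definitions. */
#define REPLAY_MAX_STRIPES 16

struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

struct lmv_replay_md {
        uint32_t magic;                 /* 0x0cd20cd0, as seen in the log */
        uint32_t stripe_count;
        uint32_t master_mdt;
        uint32_t hash_type;
        struct fid stripe_fids[REPLAY_MAX_STRIPES];
};

static struct lmv_replay_md pack_replay_lmv(const struct fid *fids, uint32_t count)
{
        struct lmv_replay_md md = { 0x0cd20cd0, 0, 0, 0x2, { { 0, 0, 0 } } };

        if (count &amp;gt; REPLAY_MAX_STRIPES)
                count = REPLAY_MAX_STRIPES;
        md.stripe_count = count;
        memcpy(md.stripe_fids, fids, count * sizeof(fids[0]));
        return md;
}

int main(void)
{
        /* Stripe FIDs from the original, now-lost mkdir (values from the log). */
        struct fid fids[2] = { { 0x20000dec0ULL, 0x7, 0 },
                               { 0x24000e690ULL, 0x7, 0 } };
        struct lmv_replay_md md = pack_replay_lmv(fids, 2);

        printf(&quot;replay LMV: %u stripes, stripe[0] [0x%llx:0x%x:0x0]\n&quot;,
               md.stripe_count,
               (unsigned long long)md.stripe_fids[0].seq, md.stripe_fids[0].oid);
        return 0;
}&lt;/pre&gt;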


&lt;p&gt;I do not think that storing the update logs on other MDTs is a good solution to this problem for several reasons:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;this makes distributed transactions even slower than they currently are, since it involves at least one extra MDT in each mkdir&lt;/li&gt;
	&lt;li&gt;there may not even &lt;b&gt;be&lt;/b&gt; an additional MDT to store the update log (e.g. in a two-MDT system)&lt;/li&gt;
	&lt;li&gt;the additional MDT used to store the update log may &lt;b&gt;also&lt;/b&gt; fail at the same time (e.g. power loss of a server rack), so this does not really solve the problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is better to have a solution that more closely follows the existing recovery mechanisms that Lustre uses (e.g. replay with the LMV xattr) instead of adding a different mechanism.&lt;/p&gt;
</comment>
                            <comment id="320910" author="laisiyao" created="Wed, 15 Dec 2021 07:52:28 +0000"  >&lt;p&gt;After some tests, I found this should be a test script issue. The sync-on-lock-cancel mechanism has eliminated dependency between striped directory creation and sub file creation:&lt;br/&gt;
1. striped directory creation will hold UPDATE locks of all stripes after creation, the 1st stripe UPDATE lock is a local lock, while 2nd to the last stripe UPDATE locks are remote locks.&lt;br/&gt;
2. sub file creation will getattr on parent directory first, which will revoke UPDATE locks held held in step 1.&lt;br/&gt;
3. since the 1st stripe UPDATE lock is a local lock, it will be silently dropped, while for 2nd to last stripe UPDATE locks, they are remote locks, this will trigger commit-on-sharing, which guarantees 2nd to last stripes creation and update logs are committed to disk, after this the parent directory can always be recovered from update logs.&lt;br/&gt;
4. if all involved MDTs are rebooted at this moment, the striped directory will be recovered from update logs.&lt;br/&gt;
5. if sub file is located on the same MDT of the parent directory (also the 1st stripe), its creation will be replayed after parent directory replay, it&apos;s okay; otherwise the sub file creation replay may be before parent directory replay because they are replayed on different MDTs, however since 2nd to last stripe has been committed to disk already, even if the parent directory is not recovered from update logs yet, the according stripe is there already, and sub file creation replay will succeed.&lt;/p&gt;

&lt;p&gt;In summary, if two operation replays have a dependency but may be replayed in arbitrary order (because they run on different MDTs), that dependency must be eliminated, either by commit-on-sharing or by sync-on-lock-cancel. In this case there is no need to implement client-side replay of striped directory creation: if no subsequent operation depends on the directory, it can simply be a fresh creation, and if a dependency exists, it is eliminated by sync-on-lock-cancel.&lt;/p&gt;
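
&lt;p&gt;As a toy model of that ordering argument (purely illustrative, not Lustre code), the condition a dependent replay effectively relies on looks like this:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdbool.h&amp;gt;

/* Toy model of the ordering argument above: a sub-file creation replay
 * depends on its parent stripe existing. If commit-on-sharing forced the
 * stripe to disk before the failure, the replay succeeds even when it
 * runs before the parent directory has been recovered from update logs. */
struct stripe_state {
        bool committed_to_disk;         /* forced out by commit-on-sharing */
        bool replayed;                  /* recreated by update-log replay */
};

static bool replay_subfile_create(struct stripe_state parent_stripe)
{
        /* The shard must exist on disk, either because its creation was
         * committed before the crash or because its replay already ran. */
        return parent_stripe.committed_to_disk || parent_stripe.replayed;
}

int main(void)
{
        /* Sub-file replay runs first and the parent has not been replayed
         * yet, but commit-on-sharing already committed the stripe: OK. */
        struct stripe_state cos = { true, false };
        /* Without commit-on-sharing the dependency is violated. */
        struct stripe_state no_cos = { false, false };

        printf(&quot;with commit-on-sharing: %s\n&quot;,
               replay_subfile_create(cos) ? &quot;replay ok&quot; : &quot;replay fails&quot;);
        printf(&quot;without commit-on-sharing: %s\n&quot;,
               replay_subfile_create(no_cos) ? &quot;replay ok&quot; : &quot;replay fails&quot;);
        return 0;
}&lt;/pre&gt;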

&lt;p&gt;BTW, I&apos;m afraid this can&apos;t be tested with replay_barrier(), which makes step 2 fail, so the test failed in the end. This should probably be tested on real machines.&lt;/p&gt;

&lt;p&gt;I will add a test to verify that sync-on-lock-cancel is triggered for sub-file creation under a striped directory.&lt;/p&gt;</comment>
                            <comment id="321023" author="gerrit" created="Thu, 16 Dec 2021 09:26:23 +0000"  >&lt;p&gt;&quot;Lai Siyao &amp;lt;lai.siyao@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45870&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45870&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14470&quot; title=&quot;striped directory layout mismatch after failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14470&quot;&gt;LU-14470&lt;/a&gt; test: add striped directory creation test&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 9d88d35f954ff6cf93d8f53894a54e05fe39efad&lt;/p&gt;</comment>
                            <comment id="335165" author="gerrit" created="Wed, 18 May 2022 08:22:29 +0000"  >&lt;p&gt;&quot;Lai Siyao &amp;lt;lai.siyao@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47385&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47385&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14470&quot; title=&quot;striped directory layout mismatch after failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14470&quot;&gt;LU-14470&lt;/a&gt; dne: striped mkdir replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f7d4bb673b3cabce4e70029e2ff50f56e94a40ca&lt;/p&gt;</comment>
                            <comment id="387495" author="gerrit" created="Thu, 28 Sep 2023 07:59:38 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/47385/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/47385/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14470&quot; title=&quot;striped directory layout mismatch after failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14470&quot;&gt;LU-14470&lt;/a&gt; dne: striped mkdir replay by client request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: a2e997f0bed0ae4cfdcf6d73f8a79e3d23d28a2f&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="73365">LU-16336</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01ngf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>