<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:16:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8309]  Checksum/erasure code of EAs to improve recovery of Lustre files</title>
                <link>https://jira.whamcloud.com/browse/LU-8309</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;To store private information about the files/objects belonging to a&lt;br/&gt;
Lustre file, Lustre saves a series of extended attributes (EAs) on inodes of the&lt;br/&gt;
lower file system. All of these EAs are important for the correct behavior of&lt;br/&gt;
Lustre functions and features. Some of the EAs are so critical that if they&lt;br/&gt;
are lost or corrupted, all the data/metadata of the Lustre file is no longer&lt;br/&gt;
available.  For example, if the &quot;trusted.lov&quot; EA has an incorrect value, the&lt;br/&gt;
Lustre file might point to a nonexistent object or, even worse, to&lt;br/&gt;
another file&apos;s data.&lt;/p&gt;

&lt;p&gt;Unfortunately, this can happen when a Lustre server or its storage crashes.&lt;br/&gt;
To make matters worse, it is sometimes hard to determine which component is&lt;br/&gt;
the root cause of the inconsistency when recovering the system. For example,&lt;br/&gt;
a &quot;trusted.lov&quot; EA pointing to a nonexistent object could mean either 1) the&lt;br/&gt;
value of the EA is corrupted, or 2) the object on the OST has been removed&lt;br/&gt;
although it should not have been. When this happens, the LFSCK mechanism, which&lt;br/&gt;
is supposed to fix inconsistencies in the Lustre file system online, might try&lt;br/&gt;
to fix the problem based on wrong EA values. Such an attempt obviously&lt;br/&gt;
won&apos;t help.&lt;/p&gt;

&lt;p&gt;For these reasons, I am wondering whether a checksum/erasure code of the&lt;br/&gt;
Lustre EAs could be introduced to improve the situation. The idea is as follows:&lt;/p&gt;

&lt;p&gt;1) A checksum/erasure code of the Lustre EAs (e.g. trusted.lov + trusted.lma&lt;br/&gt;
+ ...) will be calculated and saved as a new EA (e.g. &quot;trusted.mdt_checksum&quot;&lt;br/&gt;
and &quot;trusted.ost_checksum&quot;) when the Lustre file is created. Since most (or&lt;br/&gt;
all) of the Lustre EAs are not updated by normal file system operations on&lt;br/&gt;
the file, the EAs are almost immutable, which means almost no performance&lt;br/&gt;
regression will be introduced (except possibly at file creation).&lt;/p&gt;

&lt;p&gt;2) When the OST/MDT objects of a Lustre file are accessed/repaired, the&lt;br/&gt;
checksum/erasure code could be used to check (and, if an erasure code is used, fix)&lt;br/&gt;
the EAs.&lt;/p&gt;

&lt;p&gt;3) When the Lustre EAs are updated, the checksum/erasure code will be updated as well.&lt;br/&gt;
As noted above, this won&apos;t happen frequently. If some Lustre EAs change&lt;br/&gt;
too frequently (e.g. trusted.hsm when HSM is under heavy use), we could&lt;br/&gt;
exclude those EAs from the checksum. To that end, filter flags could be specified to&lt;br/&gt;
include only a subset of the Lustre EAs.&lt;/p&gt;

&lt;p&gt;4) The checksum/erasure code of the MDT EAs (i.e. &quot;trusted.mdt_checksum&quot;) will&lt;br/&gt;
also be saved on the OST objects that belong to the same Lustre file. In this way,&lt;br/&gt;
LFSCK could use the checksum to check the consistency of the file between the OSTs&lt;br/&gt;
and the MDT. If the checksum/erasure code of the MDT EAs is inconsistent between the MDT and&lt;br/&gt;
the OSTs, LFSCK needs to either smartly determine which one is broken or&lt;br/&gt;
leave it to a manual decision. Ideally, the file should become&lt;br/&gt;
read-only to prevent any further corruption.&lt;/p&gt;

&lt;p&gt;5) A series of utilities should be provided to better recover&lt;br/&gt;
Lustre files, including the checksum/erasure code of their EAs. Given that&lt;br/&gt;
Lustre is so complex and is still evolving rapidly, it would be ideal, but is not&lt;br/&gt;
currently true, that LFSCK can fix all problems online without&lt;br/&gt;
any manual intervention. It is not rare for a Lustre file&lt;br/&gt;
system to need offline recovery directly on the lower file system (i.e.&lt;br/&gt;
ldiskfs/zfs). The checksum/erasure code of EAs would make it harder to fix&lt;br/&gt;
a broken file offline, since the changed values of the EAs need to be kept&lt;br/&gt;
consistent with the checksum/erasure code. Tools and scripts should therefore&lt;br/&gt;
be provided for this purpose even if LFSCK works well because, as experience has&lt;br/&gt;
shown, userspace tools are much more flexible than online mechanisms when&lt;br/&gt;
recovering data. Also, for online recovery, LFSCK should provide interfaces&lt;br/&gt;
that let administrators make manual decisions about the recovery of the file&lt;br/&gt;
system.&lt;/p&gt;
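
&lt;p&gt;As a rough illustration of steps 1) to 3), here is a minimal Python sketch of computing and verifying such a checksum over a filtered set of EAs. It is not Lustre code: the function names ea_checksum and verify_eas, the use of SHA-256, the length-prefixed encoding, and the in-memory dict standing in for the on-disk EAs are all assumptions made for illustration. It covers only detection with a checksum, not repair with an erasure code.&lt;/p&gt;

```python
import hashlib

# Assumption: EAs that change too often are filtered out of the checksum.
EXCLUDED_EAS = frozenset(["trusted.hsm"])
# EA name taken from the proposal; storing a hex digest in it is an assumption.
CHECKSUM_EA = "trusted.mdt_checksum"


def ea_checksum(eas, excluded=EXCLUDED_EAS):
    """Return a SHA-256 hex digest over the selected EAs, in sorted name order."""
    digest = hashlib.sha256()
    for name in sorted(eas):
        # Skip filtered EAs and the checksum EA itself.
        if name in excluded or name == CHECKSUM_EA:
            continue
        name_bytes = name.encode("utf-8")
        value = eas[name]
        # Length-prefix each field so that field boundaries are unambiguous.
        digest.update(len(name_bytes).to_bytes(4, "big"))
        digest.update(name_bytes)
        digest.update(len(value).to_bytes(4, "big"))
        digest.update(value)
    return digest.hexdigest()


def verify_eas(eas, excluded=EXCLUDED_EAS):
    """Return True if the stored checksum EA matches a freshly computed one."""
    stored = eas.get(CHECKSUM_EA)
    if stored is None:
        return False
    return stored.decode("ascii") == ea_checksum(eas, excluded)


# Hypothetical EA values; real EAs hold binary Lustre structures.
eas = {"trusted.lov": b"\x01\x02", "trusted.lma": b"\x03", "trusted.hsm": b"x"}
eas[CHECKSUM_EA] = ea_checksum(eas).encode("ascii")
```

&lt;p&gt;Because trusted.hsm is excluded in this sketch, changing it leaves the checksum valid, while any change to trusted.lov or trusted.lma makes verify_eas return False.&lt;/p&gt;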

&lt;p&gt;We could instead use a similar mechanism from the lower file system, for example, the&lt;br/&gt;
metadata checksum of ext4. However, a Lustre-level checksum of EAs still has&lt;br/&gt;
some advantages. First, the selected Lustre EAs are almost constant,&lt;br/&gt;
which means the performance regression is likely to be minimal. Second, this&lt;br/&gt;
design doesn&apos;t depend on any internal feature of the lower file system, so it&lt;br/&gt;
can be used on both ZFS and ldiskfs.&lt;/p&gt;</description>
                <environment></environment>
        <key id="37690">LU-8309</key>
            <summary> Checksum/erasure code of EAs to improve recovery of Lustre files</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="10200">Won&apos;t Do</resolution>
                                        <assignee username="lixi_wc">Li Xi</assignee>
                                    <reporter username="lixi">Li Xi</reporter>
                        <labels>
                    </labels>
                <created>Tue, 21 Jun 2016 03:58:51 +0000</created>
                <updated>Fri, 17 Aug 2018 00:49:16 +0000</updated>
                            <resolved>Fri, 17 Aug 2018 00:49:16 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="163272" author="pjones" created="Fri, 26 Aug 2016 16:40:39 +0000"  >&lt;p&gt;Thanks for the suggestion Li Xi&lt;/p&gt;</comment>
                            <comment id="216531" author="adilger" created="Sun, 17 Dec 2017 09:07:45 +0000"  >&lt;p&gt;ZFS already does metadata checksums, and the &lt;tt&gt;metadata_csum&lt;/tt&gt; feature is now available for ext4.  It would probably make sense to fix up the issues with &lt;tt&gt;metadata_csum&lt;/tt&gt; and &lt;tt&gt;dirdata&lt;/tt&gt;, and enable &lt;tt&gt;metadata_csum&lt;/tt&gt; for ldiskfs once kernels support it.&lt;/p&gt;</comment>
                            <comment id="232099" author="lixi_wc" created="Fri, 17 Aug 2018 00:49:01 +0000"  >&lt;p&gt;Using the checksums of ZFS/ext4 might be a better idea, so closing this ticket.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyf4v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>