<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:12 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9670] Advise on e2fsck fixing for OST backend</title>
                <link>https://jira.whamcloud.com/browse/LU-9670</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Recently our main production file system experienced an outage which has&lt;br/&gt;
required a run of e2fsck on the back-end OSTs. During the run we encountered&lt;br/&gt;
some issues which might lead to data loss. We would like to ask Intel engineers&lt;br/&gt;
who have a better understanding of the ext4 filesystem to look at the logs&lt;br/&gt;
and report back what will be lost and how safe it is to repair. There are&lt;br/&gt;
two logs attached to this ticket. One is the raw data and the other, bad_luns.out,&lt;br/&gt;
is the one we are most concerned about.&lt;/p&gt;</description>
                <environment>Lustre 2.8 servers using ldiskfs in a RHEL 6.9 environment.</environment>
        <key id="46728">LU-9670</key>
            <summary>Advise on e2fsck fixing for OST backend</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                    </labels>
                <created>Thu, 15 Jun 2017 15:14:45 +0000</created>
                <updated>Fri, 25 Aug 2017 20:38:10 +0000</updated>
                            <resolved>Wed, 23 Aug 2017 15:52:56 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="199359" author="pjones" created="Thu, 15 Jun 2017 17:12:20 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="199514" author="adilger" created="Fri, 16 Jun 2017 21:10:34 +0000"  >&lt;p&gt;James, there isn&apos;t necessarily anything in the e2fsck output that looks very unusual, if one assumes that the RAID controllers lost their cache. There may be more or fewer things for e2fsck to fix once the journal is replayed (assuming these e2fsck runs were done on a live filesystem).&lt;/p&gt;</comment>
                            <comment id="199527" author="yong.fan" created="Sat, 17 Jun 2017 01:51:41 +0000"  >&lt;p&gt;I have checked the hundreds of logs. Although there are a lot of inconsistencies, most of them are quota accounting inconsistencies, which are not fatal and will not cause data loss. Furthermore, you can force a quota recheck by disabling and then re-enabling quota after Lustre is mounted:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl conf_param $FSNAME.quota.{mdt,ost}=none
lctl conf_param $FSNAME.quota.{mdt,ost}=ug
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If you are afraid of losing any data, you can make some checks in advance, before e2fsck repairs anything. For example:&lt;/p&gt;

&lt;p&gt;1. Deleted inode XXX has zero dtime.&lt;br/&gt;
This means inode XXX has a zero nlink count, but its dtime is not set. You can get the FID-in-LMA via:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs -c -R &quot;stat &amp;lt;XXX&amp;gt;&quot; $OST_dev
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the FID, we can calculate its original namepath on the OST. It is expected that the original namepath has already been removed (because the target OST object was destroyed):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs -c -R &quot;stat $namepath&quot; $OST_dev
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;But if, unfortunately, the original namepath is still there, then we can use &quot;debugfs sif &amp;lt;XXX&amp;gt; inodes_count 1&quot;, and e2fsck will link the inode back into the original OST namespace. In fact, it is NOT recommended to fix this kind of inconsistency manually: if the original inode was really destroyed and its blocks have been reassigned to other inodes, then recovering it by force may create doubly-referenced blocks and cause data corruption or loss. So it should only be used in rare, unfortunate corner cases.&lt;/p&gt;
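&lt;p&gt;As a sketch of the namepath calculation mentioned above: on an ldiskfs OST, objects are commonly stored under O/&amp;lt;seq&amp;gt;/d(&amp;lt;oid&amp;gt; mod 32)/&amp;lt;oid&amp;gt;. The helper below is a hypothetical illustration of that layout, not a Lustre tool; verify the layout against your own OST before relying on it:&lt;/p&gt;

```shell
# Hypothetical helper: map an OST object's sequence and object id (both
# taken from the FID-in-LMA) to the on-disk namepath used by ldiskfs OSTs,
# assuming the common layout O/<seq>/d<oid % 32>/<oid>.
fid_to_ost_path() {
  local seq=$1 oid=$2
  # objects are hashed into 32 subdirectories by object id modulo 32
  echo "O/${seq}/d$((oid % 32))/${oid}"
}

# Example: object id 15237 in sequence 0 lands in subdirectory d5
fid_to_ost_path 0 15237
```

&lt;p&gt;The resulting path can then be passed to debugfs as the $namepath in the command below.&lt;/p&gt;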

&lt;p&gt;2. Inode bitmap differences:  -NNN&lt;br/&gt;
This means the inode is not in use, but is marked in the inode bitmap. If inode NNN was in use before the crash, then it is lost.&lt;/p&gt;

&lt;p&gt;3. Block bitmap differences:  -(NNN-MMM)&lt;br/&gt;
This means the blocks are not in use, but are marked in the block bitmap. Since no inode references these blocks, just let e2fsck fix the bitmap.&lt;/p&gt;

&lt;p&gt;4. Inode XXX, i_blocks is NNN, should be MMM.&lt;br/&gt;
This means i_blocks does not match the real block usage. In this case we have to trust the real block usage; just let e2fsck fix i_blocks.&lt;/p&gt;

&lt;p&gt;5. Inode XXX, i_size is NNN, should be MMM.&lt;br/&gt;
This means i_size is smaller than the real block usage. Trust the real block usage and let e2fsck fix i_size. If the inconsistency was caused by a truncate (shrink) operation interrupted by the crash, then some stale data may reappear at the tail of the file.&lt;/p&gt;
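&lt;p&gt;A hypothetical sanity check for case 5, assuming a non-sparse file: since i_blocks counts 512-byte sectors, the reported i_size should not exceed roughly i_blocks * 512 (the example numbers below are made up, not taken from the attached logs):&lt;/p&gt;

```shell
# Hypothetical check: compare an inode's reported size (bytes) against the
# space its blocks account for. i_blocks is in 512-byte units, so for a
# non-sparse file, i_size > i_blocks * 512 suggests a corrupted size field.
i_size=1048576      # assumed NNN from an e2fsck message
i_blocks=2048       # assumed block count from debugfs "stat" output
allocated=$((i_blocks * 512))
if [ "$i_size" -gt "$allocated" ]; then
  echo "i_size exceeds allocated space: size field looks corrupted"
else
  echo "i_size within allocated space"
fi
```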

&lt;p&gt;...&lt;/p&gt;

&lt;p&gt;Anyway, as Andreas commented, most of the inconsistencies may disappear after the journal is replayed.&lt;/p&gt;</comment>
                            <comment id="204613" author="yong.fan" created="Mon, 7 Aug 2017 05:50:01 +0000"  >&lt;p&gt;Is there anything I can do for this ticket?&lt;/p&gt;</comment>
                            <comment id="205811" author="bhoagland" created="Fri, 18 Aug 2017 23:26:44 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=simmonsja&quot; class=&quot;user-hover&quot; rel=&quot;simmonsja&quot;&gt;simmonsja&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Is there anything more you&apos;d like us to do for this ticket?&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;

&lt;p&gt;Brad&lt;/p&gt;</comment>
                            <comment id="206143" author="hanleyja" created="Wed, 23 Aug 2017 15:49:12 +0000"  >&lt;p&gt;Brad,&lt;/p&gt;

&lt;p&gt;We can close this out.  Thanks for the help.&lt;/p&gt;</comment>
                            <comment id="206146" author="bhoagland" created="Wed, 23 Aug 2017 15:52:56 +0000"  >&lt;p&gt;Thanks, Jesse&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="27041" name="atlas-oss_e2fsck_output.tgz" size="3058990" author="simmonsja" created="Thu, 15 Jun 2017 15:16:11 +0000"/>
                            <attachment id="27040" name="bad_luns.out" size="44782" author="simmonsja" created="Thu, 15 Jun 2017 15:15:58 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzf73:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>