<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:34:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3542] deleted/unused inodes not actually cleared by e2fsck</title>
                <link>https://jira.whamcloud.com/browse/LU-3542</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;e2fsck doesn&apos;t actually clear deleted/unused inodes, though it claims to. I&apos;ve attached a log showing what we are seeing. The customer is CalTech. &lt;/p&gt;</description>
                <environment>Centos5, e2fsprogs-1.42.7.wc1-0redhat</environment>
        <key id="19646">LU-3542</key>
            <summary>deleted/unused inodes not actually cleared by e2fsck</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="kitwestneat">Kit Westneat</reporter>
                        <labels>
                    </labels>
                <created>Mon, 1 Jul 2013 14:20:16 +0000</created>
                <updated>Fri, 13 Dec 2013 20:53:20 +0000</updated>
                            <resolved>Fri, 13 Dec 2013 20:53:20 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="61582" author="pjones" created="Mon, 1 Jul 2013 14:33:35 +0000"  >&lt;p&gt;Nathaniel&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="61617" author="kitwestneat" created="Tue, 2 Jul 2013 03:50:09 +0000"  >&lt;p&gt;I need to increase the priority on this one. The OSTs are stopping with &quot;ldiskfs_lookup: deleted inode referenced.&quot; Do you have any ideas for how to fix it? I assume this means that the dentries are corrupted, but it seems weird that the files don&apos;t show up when I try to ls them. Is it possible that it&apos;s something with the HTREE? There were some HTREE messages in the original e2fsck.&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="61636" author="bfaccini" created="Tue, 2 Jul 2013 14:00:00 +0000"  >&lt;p&gt;Raised priority to blocker and severity to 1 after Kit&apos;s last update on this problem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
We are running into a problem trying to restart Lustre after a Sev1. We have run e2fsck several times on a filesystem hit by catastrophic disk failure and cleaned up most of the corruption. However, there are a bunch of referenced deleted/cleared inodes that are not getting cleaned up. e2fsck claims to clear them, but when you rerun it, they are still there.

When the OSTs hit these inodes in production, the OSTs go read-only, bringing Lustre down. So due to this, we are in Sev1 until the fs is 100% clean.

I put the most recent e2fsck logs in LU-3542. Anything else I should get?

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="61644" author="kitwestneat" created="Tue, 2 Jul 2013 15:24:11 +0000"  >&lt;p&gt;ok I think I figured out how to work around the problem. If I use debugfs, I can unlink all the troublesome files and it works ok. Is there anything I should get to try to debug the e2fsprogs issue before I unlink everything?&lt;/p&gt;</comment>
                            <comment id="61646" author="bfaccini" created="Tue, 2 Jul 2013 15:31:34 +0000"  >&lt;p&gt;Yes, I confirm that. I just tested it too and it seems to work fine! What puzzles me is that e2fsck does not propose/do it ...&lt;/p&gt;

&lt;p&gt;It would also be interesting if you could provide the first e2fsck log, if it is still available.&lt;/p&gt;</comment>
                            <comment id="61648" author="adilger" created="Tue, 2 Jul 2013 15:33:30 +0000"  >&lt;p&gt;Kit, can you please run:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs -c -R &quot;htree_dump O/0/d10&quot; /dev/mapper/ost_global_7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="61650" author="kitwestneat" created="Tue, 2 Jul 2013 15:35:30 +0000"  >&lt;p&gt;Also if I do a clri &amp;lt;inode&amp;gt; with debugfs to simulate the problem, e2fsck seems to do the right thing, so I&apos;m not sure what is weird about this filesystem.&lt;/p&gt;

&lt;p&gt;Very first:&lt;br/&gt;
e2fsck -v -f -p /dev/mapper/ost_global_7&lt;br/&gt;
MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...&lt;br/&gt;
global-OST0007: recovering journal&lt;br/&gt;
global-OST0007: Entry &apos;29584587&apos; in /O/0/d11 (148471827) has deleted/unused inode 153961358.  CLEARED.&lt;br/&gt;
global-OST0007: Entry &apos;29584575&apos; in /O/0/d31 (148471847) has deleted/unused inode 153961348.  CLEARED.&lt;br/&gt;
global-OST0007: Entry &apos;29584573&apos; in /O/0/d29 (148471845) has deleted/unused inode 153961346.  CLEARED.&lt;br/&gt;
global-OST0007: Entry &apos;29584407&apos; in /O/0/d23 (148471839) has deleted/unused inode 153961200.  CLEARED.&lt;br/&gt;
global-OST0007: Entry &apos;29584406&apos; in /O/0/d22 (148471838) has deleted/unused inode 153961199.  CLEARED.&lt;br/&gt;
global-OST0007: Entry &apos;29584589&apos; in /O/0/d13 (148471829) has deleted/unused inode 153961359.  CLEARED.&lt;br/&gt;
global-OST0007: Directory inode 148471835, block #360, offset 0: directory corrupted&lt;/p&gt;

&lt;p&gt;I will try to find the e2fsck -y log. &lt;/p&gt;</comment>
                            <comment id="61660" author="kitwestneat" created="Tue, 2 Jul 2013 16:11:32 +0000"  >&lt;p&gt;hey Andreas, somehow I missed your comment till now, here is the htree dump&lt;/p&gt;</comment>
                            <comment id="61661" author="kitwestneat" created="Tue, 2 Jul 2013 16:12:42 +0000"  >&lt;p&gt;here are the e2fsck -p outputs from the first and second runs on ost_3 (also exhibiting the same behavior) &lt;/p&gt;</comment>
                            <comment id="61681" author="adilger" created="Tue, 2 Jul 2013 18:38:12 +0000"  >&lt;p&gt;Kit, it isn&apos;t clear from your comment whether your use of &lt;tt&gt;clri &amp;lt;inode&amp;gt;&lt;/tt&gt; is intended as a workaround (i.e. this allows e2fsck to correctly clean up the inode), or if you are trying (unsuccessfully) to reproduce the problem on a test filesystem to allow debugging e2fsck?&lt;/p&gt;

&lt;p&gt;It definitely seems possible to use debugfs to mark the affected inodes as deleted and remove the name entries, e.g. &quot;&lt;tt&gt;clri &amp;lt;153961357&amp;gt;&lt;/tt&gt;&quot; and &quot;&lt;tt&gt;unlink /O/0/d10/29584586&lt;/tt&gt;&quot;.  In theory &quot;&lt;tt&gt;rm /O/0/d10/29584586&lt;/tt&gt;&quot; should do both, but there may be some problem with this, and it may be safer to do them separately.  I&apos;d try this first on the ost_global_7 target, since it only has a few such objects, and then run &quot;e2fsck -fy&quot; to see if this fixed the problem.&lt;/p&gt;

&lt;p&gt;You could also try running &quot;e2fsck -fD&quot; on ost_global_7, which should rebuild the htree directory structure on the OST, since it seems there may be a problem with this as well.  This isn&apos;t a requirement if it is working fine after the first e2fsck, and may be better left to a scheduled downtime in the future.&lt;/p&gt;</comment>
                            <comment id="61682" author="kitwestneat" created="Tue, 2 Jul 2013 18:48:08 +0000"  >&lt;p&gt;Hi Andreas, I was trying to use clri to simulate the failure. I tested the unlink/rm through debugfs on a snapshot and it seemed to work well. I just saw all the htree corruption and got worried about running it on the real device. &lt;/p&gt;

&lt;p&gt;I&apos;ll try running e2fsck -fD on the snapshot to see how it does. I have been wary of it since there used to be bugs, but it looks like all those have been fixed in this version. &lt;/p&gt;

&lt;p&gt;Thanks. &lt;/p&gt;</comment>
                            <comment id="61689" author="adilger" created="Tue, 2 Jul 2013 20:02:59 +0000"  >&lt;p&gt;Kit, another option, which &lt;em&gt;might&lt;/em&gt; allow you to get the system back up and running, if e2fsck isn&apos;t fixing the problem, is to mount the OST with &quot;-o errors=continue&quot; which would at least prevent the OST from going read-only when it hits this error.&lt;/p&gt;

&lt;p&gt;Unfortunately, it seems that the &quot;-o errors=continue&quot; option in 2.4 is placed &lt;em&gt;before&lt;/em&gt; &quot;errors=remount-ro&quot; in the mount options line, so it is overridden (which is itself a bug).  I&apos;m not sure if this is handled correctly in 2.1, but worthwhile to try (I don&apos;t have a 2.1 system handy to test this right now).&lt;/p&gt;</comment>
                            <comment id="61690" author="adilger" created="Tue, 2 Jul 2013 20:05:19 +0000"  >&lt;p&gt;The previous &quot;e2fsck -fD&quot; problem was only seen on MDT devices, not on OST devices.  That said, it is my understanding that those problems were fixed in the version of e2fsck-1.42.7.wc1 that you are running, but I would have been leery to suggest it at this point if the issue was on an MDT device.&lt;/p&gt;

&lt;p&gt;If you have a snapshot, that is excellent, as it allows some margin for error if e2fsck behaves in a (more) unexpected manner.&lt;/p&gt;</comment>
                            <comment id="61691" author="kitwestneat" created="Tue, 2 Jul 2013 20:15:49 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;After running the e2fsck -fD, I am getting this on e2fsck -fvy:&lt;br/&gt;
Interior extent node level 0 of inode 148471837:&lt;br/&gt;
Logical start 980 does not match logical start 981 at next level.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471837, i_size is 2097152, should be 4022272.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471837, i_blocks is 4112, should be 2856.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471838, i_size is 2097152, should be 4005888.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471838, i_blocks is 4112, should be 2816.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471839, i_size is 2084864, should be 3952640.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471839, i_blocks is 4088, should be 2832.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471840, i_size is 2093056, should be 3948544.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471840, i_blocks is 4104, should be 2880.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471841, i_size is 2093056, should be 4001792.  Fix? yes&lt;/p&gt;

&lt;p&gt;Inode 148471841, i_blocks is 4104, should be 2800.  Fix? yes&lt;/p&gt;


&lt;p&gt;In your opinion, is this corruption created by the -fD or is it corruption uncovered by it?&lt;/p&gt;</comment>
                            <comment id="61694" author="adilger" created="Tue, 2 Jul 2013 20:31:45 +0000"  >&lt;p&gt;It looks like a bit of both.  The &quot;-fD&quot; option re-sorts and compacts the htree directories to ensure all of the leaf blocks are valid.  Normally this makes the directory smaller, which is the cause of the reduction in &quot;i_blocks&quot; values.  Conversely, the i_size value is based on the i_blocks count, but it is fixing this before it checks the i_blocks value.  That seems to be a separate bug in e2fsck.&lt;/p&gt;

&lt;p&gt;I don&apos;t think it will be harmful to allow these problems to be fixed, but I suspect a second e2fsck run is needed to re-fix the i_size values after i_blocks has been updated, and that should resolve the problems finally.&lt;/p&gt;</comment>
                            <comment id="61695" author="kitwestneat" created="Tue, 2 Jul 2013 20:33:52 +0000"  >&lt;p&gt;These all appear to be directory inodes. Towards the end of the run, it is non-stop &quot;Unattached inode ...&quot;.&lt;/p&gt;

&lt;p&gt;My snapshot ran out of space, and I was overconfident and ran it live. I&apos;m glad the ll_recover script exists!&lt;/p&gt;</comment>
                            <comment id="61697" author="kitwestneat" created="Tue, 2 Jul 2013 20:38:13 +0000"  >&lt;p&gt;ah I didn&apos;t see your response before posting. I am running this second e2fsck on a snapshot (the one with all the unattached inodes). Do you think there is any way to avoid all the unattached inodes, or is it a necessary step at this point? For example, would some combination of y/n answers to those i_size/i_blocks questions prevent them from being moved to lost+found?&lt;/p&gt;</comment>
                            <comment id="61698" author="adilger" created="Tue, 2 Jul 2013 21:08:56 +0000"  >&lt;p&gt;No, I think the unattached inodes are a consequence of the directory blocks being corrupted, and it is dumping all of the inodes from the corrupt leaf blocks into lost+found.  You&apos;ll need to run ll_recover_lost_found_objs to fix them.  In the not too distant future, online LFSCK in 2.5 (patch &lt;a href=&quot;http://review.whamcloud.com/6857&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6857&lt;/a&gt;) will be able to do this automatically at mount time, but until then it needs to be run by hand.&lt;/p&gt;</comment>
                            <comment id="61702" author="kitwestneat" created="Tue, 2 Jul 2013 21:43:00 +0000"  >&lt;p&gt;Ah ok, hmm. a few questions:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Should I run the -fD on all the OSTs then?&lt;/li&gt;
	&lt;li&gt;I would have expected e2fsck to uncover any corrupted directories without -fD. Should I file a new bug on that?&lt;/li&gt;
	&lt;li&gt;I was doing a read-only lfsck when the OSTs started going read-only. Is there a possibility that some of the files in the corrupt leaf nodes didn&apos;t get added to the ost DBs due to the directory corruption? Should I rerun the e2fsck object db creation step?&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks for all your help!&lt;/p&gt;</comment>
                            <comment id="61718" author="adilger" created="Tue, 2 Jul 2013 23:18:28 +0000"  >&lt;p&gt;I&apos;m not sure why the original e2fsck didn&apos;t show problems with the directory blocks, but the later ones do.  Typically, e2fsck is very robust about fixing problems on the first pass, or restarting automatically in the rare cases it cannot.&lt;/p&gt;

&lt;p&gt;If the other OSTs are behaving properly, I would avoid e2fsck -fD for now.  While this fixes up the htree directory structure, it also means that the directory will need to allocate new blocks as soon as new files are being allocated there (i.e. immediately for any OST).&lt;/p&gt;</comment>
                            <comment id="61778" author="adilger" created="Wed, 3 Jul 2013 17:58:15 +0000"  >&lt;p&gt;Kit, what is the status of this bug?  Can we lower it from Sev 1?&lt;/p&gt;</comment>
                            <comment id="61780" author="kitwestneat" created="Wed, 3 Jul 2013 18:49:24 +0000"  >&lt;p&gt;Hi Andreas, we are doing the final lfsck to get the list of damaged files, but we can lower the severity of this ticket. There are two e2fsck behaviors that we saw during this that seem like bugs to me:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;first is e2fsck not unlinking the deleted/cleared inodes&lt;/li&gt;
	&lt;li&gt;second is the movement of most files to lost+found by e2fsck -D&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;It might not be worth the effort to explore these at any high priority, but I think we should leave the ticket open for documentation at least.&lt;/p&gt;

&lt;p&gt;Thanks again for all your help and advice.&lt;/p&gt;</comment>
                            <comment id="69301" author="pjones" created="Fri, 18 Oct 2013 16:36:22 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Can you please see what work remains on this ticket?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;


&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="69384" author="niu" created="Mon, 21 Oct 2013 07:58:05 +0000"  >&lt;p&gt;Peter, the two questions Kit asked are probably e2fsck bugs. The remaining work is: &lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Search to find out whether the same problem has been reported in the Linux community before, and whether there is already a patch. (I did an initial search, but had no luck so far.)&lt;/li&gt;
	&lt;li&gt;Try to reproduce the problem and trace into the e2fsck code to see if it&apos;s really a bug that needs to be fixed. (That requires an e2fsprogs expert and could be time-consuming.)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I agree with Kit that it&apos;s not a high-priority job.&lt;/p&gt;</comment>
                            <comment id="69396" author="kitwestneat" created="Mon, 21 Oct 2013 14:18:06 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;This has become a higher priority for us. The problem is that if deleted inodes are not cleared, the filesystem will go read-only when it encounters the inode. This can lead to a state where the filesystem goes read-only at a random time and only manual intervention with debugfs can bring it back to a healthy state. It has happened to us a couple of times now, so I think we need to explore problem #1 a little more closely. &lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="69491" author="niu" created="Tue, 22 Oct 2013 03:06:27 +0000"  >&lt;p&gt;Kit, I didn&apos;t know they often run into the &quot;deleted/unused inode&quot; problem. Which Lustre version do they use, and do you know what kind of operation could have caused the problem? If possible, could you collect the log on the OST from before the problem happens? I think it might help us figure out how this happened.&lt;/p&gt;

&lt;p&gt;I&apos;ll look into the e2fsck problem at the same time. Thank you.&lt;/p&gt;</comment>
                            <comment id="69514" author="kitwestneat" created="Tue, 22 Oct 2013 13:00:22 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;The first customer had a problem with the RAID storage which caused the ldiskfs corruption. The second customer had a power outage that we think corrupted the journal and journal replay (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4102&quot; title=&quot;lots of multiply-claimed blocks in e2fsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4102&quot;&gt;&lt;del&gt;LU-4102&lt;/del&gt;&lt;/a&gt;). Basically when there is some kind of ldiskfs corruption, there is the possibility of getting these deleted/unused inode messages, and it seems if the htrees are also corrupt, e2fsck is unable to clear them.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Kit&lt;/p&gt;</comment>
                            <comment id="69612" author="adilger" created="Wed, 23 Oct 2013 05:40:55 +0000"  >&lt;p&gt;I looked through the relevant code in pass2.c::check_dir_block():&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                /* 
                 * Offer to clear unused inodes; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; we are going to be
                 * restarting the scan due to bg_itable_unused being
                 * wrong, then don&apos;t clear any inodes to avoid zapping
                 * inodes that were skipped during pass1 due to an
                 * incorrect bg_itable_unused; we&apos;ll get any real
                 * problems after we restart.
                 */
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!(ctx-&amp;gt;flags &amp;amp; E2F_FLAG_RESTART_LATER) &amp;amp;&amp;amp;
                    !(ext2fs_test_inode_bitmap2(ctx-&amp;gt;inode_used_map,
                                                dirent-&amp;gt;inode)))
                        problem = PR_2_UNUSED_INODE;

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (problem) {
                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (fix_problem(ctx, problem, &amp;amp;cd-&amp;gt;pctx)) {
                                dirent-&amp;gt;inode = 0;
                                dir_modified++;
                                &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; next;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is easy to trigger the &lt;tt&gt;PR_2_UNUSED_INODE&lt;/tt&gt; problem by setting nlink = 0 in the inode(s) via debugfs.  However, when I run e2fsck against such a filesystem (whether with small directories or large htree directories) e2fsck fixes the problem by clearing the dirent (setting inode = 0 above, and later writing out the directory block) and a second check shows it is fixed.&lt;/p&gt;

&lt;p&gt;To capture a filesystem that has a persistent case of this problem (after &quot;e2fsck -fy&quot; didn&apos;t fix it) so that it can be debugged and fixed, please use e2image to dump the filesystem metadata.  The dense image format can be efficiently compressed and transported, unlike the sparse variant of e2image:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;e2image -Q /dev/OSTnnnn OSTnnnn.qcow
bzip2 -9 OSTnnnn.qcow
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hopefully the OSTnnnn.qcow.bz2 image size is small enough for transport.  It is possible to reconstitute the (uncompressed) qcow file into a raw ext4 image file that can be tested with e2fsck, debugfs, or mounted via loopback.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;e2image -r OSTnnnn.qcow OSTnnnn.raw
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="69662" author="kitwestneat" created="Wed, 23 Oct 2013 18:45:32 +0000"  >&lt;p&gt;I don&apos;t think any of the OSTs described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4102&quot; title=&quot;lots of multiply-claimed blocks in e2fsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4102&quot;&gt;&lt;del&gt;LU-4102&lt;/del&gt;&lt;/a&gt; currently has the deleted/unused inodes issue. All the ones that reported it on the r/o e2fsck had previously been clean, so I think that it&apos;s just a matter of them being in use. That being said I could get an image of the OST (ost_45) that had the error before.. Do you think that might be useful? I have the e2fsck output as well. &lt;/p&gt;</comment>
                            <comment id="69808" author="adilger" created="Thu, 24 Oct 2013 17:07:14 +0000"  >&lt;p&gt;Even if there isn&apos;t a 100% chance that OST has the problem, it is still worthwhile to make an image of the OST. This will first give us an idea of how long it takes to generate the image, how large it is (uncompressed and compressed), and it can also be used to test the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4102&quot; title=&quot;lots of multiply-claimed blocks in e2fsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4102&quot;&gt;&lt;del&gt;LU-4102&lt;/del&gt;&lt;/a&gt; code. &lt;/p&gt;</comment>
                            <comment id="70269" author="kitwestneat" created="Wed, 30 Oct 2013 16:21:10 +0000"  >&lt;p&gt;I got a qcow image with a file exhibiting the corruption, it&apos;s available here:&lt;br/&gt;
&lt;a href=&quot;http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2&lt;/a&gt; &amp;#91;295M&amp;#93;&lt;/p&gt;

&lt;p&gt;# e2fsck -fp /dev/mapper/ost_lfs2_36&lt;br/&gt;
lfs2-OST0024: Entry &apos;62977970&apos; in /O/0/d18 (88080410) has deleted/unused inode 1051496.  CLEARED.&lt;br/&gt;
lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks&lt;/p&gt;

&lt;p&gt;# e2fsck -fp /dev/mapper/ost_lfs2_36&lt;br/&gt;
lfs2-OST0024: Entry &apos;62977970&apos; in /O/0/d18 (88080410) has deleted/unused inode 1051496.  CLEARED.&lt;br/&gt;
lfs2-OST0024: 1546929/89620480 files (9.5% non-contiguous), 2367418676/5735710720 blocks&lt;/p&gt;
</comment>
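<!--
Editorial sketch (not part of the original thread): the two identical runs above show the symptom in miniature. One way to confirm it across longer logs is to compare the "CLEARED" entries of consecutive e2fsck runs. The Python below is a minimal sketch; the function names are hypothetical, and the regular expression assumes the message format quoted in this ticket.

```python
import re

# Matches lines such as:
# lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) has deleted/unused inode 1051496.  CLEARED.
ENTRY_RE = re.compile(
    r"Entry '([^']+)' in (\S+) \((\d+)\) has deleted/unused inode (\d+)\.\s+CLEARED\."
)


def cleared_entries(log_text):
    """Return the set of (directory, name, dir_inode, inode) reported as CLEARED."""
    found = set()
    for line in log_text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            name, directory, dir_inode, inode = match.groups()
            found.add((directory, name, int(dir_inode), int(inode)))
    return found


def persistent_entries(first_log, second_log):
    """Entries reported as CLEARED in both runs, i.e. not actually fixed."""
    both = cleared_entries(first_log).intersection(cleared_entries(second_log))
    return sorted(both)


if __name__ == "__main__":
    run = (
        "lfs2-OST0024: Entry '62977970' in /O/0/d18 (88080410) "
        "has deleted/unused inode 1051496.  CLEARED.\n"
    )
    print(persistent_entries(run, run))
```

Any entry returned by `persistent_entries` was reported as cleared in the first run and then reported again in the second.
-->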
                            <comment id="71049" author="dvicker" created="Thu, 7 Nov 2013 22:56:54 +0000"  >&lt;p&gt;We ran into this problem as well.  I&apos;ll attach the fsck output to this JIRA.  Email me if you&apos;d like me to send you the qcow image.  &lt;/p&gt;</comment>
                            <comment id="71345" author="dvicker" created="Tue, 12 Nov 2013 17:50:28 +0000"  >&lt;p&gt;I just uploaded my qcow image to ftp.whamcloud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3542&quot; title=&quot;deleted/unused inodes not actually cleared by e2fsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3542&quot;&gt;&lt;del&gt;LU-3542&lt;/del&gt;&lt;/a&gt;/ost000b.qcow.bz2&lt;/p&gt;</comment>
                            <comment id="72099" author="niu" created="Fri, 22 Nov 2013 06:41:09 +0000"  >&lt;p&gt;The raw device of ftp.whamcloud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3542&quot; title=&quot;deleted/unused inodes not actually cleared by e2fsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3542&quot;&gt;&lt;del&gt;LU-3542&lt;/del&gt;&lt;/a&gt;/ost000b.qcow.bz2 is 16TB? It&apos;s hard for me to find a machine with that big drive to reproduce the problem, is there any smaller OST which has the same problem?&lt;/p&gt;</comment>
                            <comment id="72155" author="kitwestneat" created="Fri, 22 Nov 2013 18:06:51 +0000"  >&lt;p&gt;Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:&lt;br/&gt;
&amp;#91;root@oxonia-mds1 rcvy&amp;#93;# ls -lh&lt;br/&gt;
total 3.6G&lt;br/&gt;
-rw------- 1 root root 28T Nov 22 10:04 ost000b.raw&lt;/p&gt;
</comment>
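<!--
Editorial sketch (not part of the original thread): the gap between "total 3.6G" and the 28T size in the listing above is the usual signature of a sparse file, where the apparent size far exceeds the blocks actually allocated. The Python below is a minimal sketch with hypothetical helper names; it assumes a platform where os.stat exposes st_blocks in 512-byte units, as Linux does.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size in bytes, allocated bytes) for a file.

    st_blocks is counted in 512-byte units on Linux; it is not
    available on every platform.
    """
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as handle:
        handle.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        demo = os.path.join(workdir, "demo.raw")
        make_sparse_file(demo, 2 ** 30)  # 1 GiB apparent size
        apparent, allocated = apparent_and_allocated(demo)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem that supports sparse files, the allocated figure should be much smaller than the apparent one, which mirrors what ls -lh reports for ost000b.raw.
-->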
                            <comment id="72204" author="niu" created="Mon, 25 Nov 2013 02:54:49 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Hi Niu, I was able to convert the qcow image to a raw (sparse) image on an XFS filesystem. It uses 3.6GB, though it reports a size of 28TB:&lt;br/&gt;
&amp;#91;root@oxonia-mds1 rcvy&amp;#93;# ls -lh&lt;br/&gt;
total 3.6G&lt;br/&gt;
-rw------- 1 root root 28T Nov 22 10:04 ost000b.raw&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Oh, I didn&apos;t notice it&apos;s a sparse file. Then I think it can be converted on ext4 as well; however, I got the following error while trying to convert it on ext4 (actual size 1.6G, shown as 16T):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;e2image: Invalid argument &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; trying to convert qcow2 image (ost000b.qcow) into raw image
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the 3.6G file you mentioned is &lt;a href=&quot;http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2013-10-30-lustre-ost_lfs2_36.qcow2.bz2&lt;/a&gt;, could you upload it to the Whamcloud FTP server? I have no permission to access the DDN FTP server.&lt;/p&gt;</comment>
                            <comment id="72235" author="kitwestneat" created="Mon, 25 Nov 2013 15:55:21 +0000"  >&lt;p&gt;Hi Niu, &lt;/p&gt;

&lt;p&gt;I don&apos;t think ext4 supports files greater than 16TB, so you&apos;d need to use XFS or ZFS. &lt;/p&gt;

&lt;p&gt;Yeah, the files on the DDN server are temporary.. I&apos;ll upload it to the Intel FTP server.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Kit&lt;/p&gt;</comment>
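<!--
Editorial sketch (not part of the original thread): Kit's 16TB figure can be checked with simple arithmetic. The Python below is a minimal sketch that assumes ext4 addresses file blocks with 32-bit logical block numbers and a 4 KiB block size; the thread does not state the block size on Niu's machine, so that value is an assumption, and the function names are hypothetical.

```python
TIB = 2 ** 40


def ext4_max_file_size(block_size=4096, logical_block_bits=32):
    """Approximate ext4 per-file size limit in bytes under the stated assumptions."""
    return (2 ** logical_block_bits) * block_size


def fits_on_ext4(apparent_size, block_size=4096):
    """True if a file of this apparent size stays within the approximate limit."""
    return ext4_max_file_size(block_size) >= apparent_size


if __name__ == "__main__":
    print("limit in TiB:", ext4_max_file_size() // TIB)
    print("28T raw image fits:", fits_on_ext4(28 * TIB))
```

Under these assumptions the limit works out to 16 TiB, so a raw image with a 28T apparent size cannot be created on ext4 even though it only occupies a few gigabytes on disk.
-->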
                            <comment id="72263" author="kitwestneat" created="Mon, 25 Nov 2013 21:34:13 +0000"  >&lt;p&gt;It seems like this doesn&apos;t actually produce a valid raw image:&lt;br/&gt;
e2image -r OSTnnnn.qcow OSTnnnn.raw&lt;/p&gt;

&lt;p&gt;I had to do:&lt;br/&gt;
qemu-img convert -p  -O raw /scratch/ost000b.qcow ost000b.raw&lt;/p&gt;

&lt;p&gt;to get something that worked.&lt;/p&gt;</comment>
                            <comment id="72322" author="kitwestneat" created="Tue, 26 Nov 2013 16:48:03 +0000"  >&lt;p&gt;It looks like the block number is wrapping around during the io_channel write:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Entry &lt;span class=&quot;code-quote&quot;&gt;&apos;3102500&apos;&lt;/span&gt; in /O/0/d4 (19398664) has deleted/unused inode 26072855.  Clear? yes
Breakpoint 1, check_dir_block (fs=&amp;lt;value optimized out&amp;gt;, db=0x7ffff7f14340, priv_data=0x7fffffffe180) at pass2.c:1219
1219                    cd-&amp;gt;pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
(gdb) p block_nr
$30 = 4966058525

Breakpoint 2, raw_write_blk (channel=0x647570, data=0x648670, block=671091229, count=1, bufv=0x64f060) at unix_io.c:233

(gdb) p (unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;)4966058525
$33 = 671091229
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I thought maybe it was the cache node, but that appears to use an unsigned long long to store the block.&lt;/p&gt;

&lt;p&gt;I&apos;ll keep looking but I thought I&apos;d pass that info along in case it helps.&lt;/p&gt;</comment>
                            <comment id="72326" author="kitwestneat" created="Tue, 26 Nov 2013 17:01:34 +0000"  >&lt;p&gt;oh I think it is the definition of ext2fs_write_dir_block:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;errcode_t ext2fs_write_dir_block(ext2_filsys fs, blk_t block,                       
                 void *inbuf)                                                       

typedef __u32       blk_t;                                                          
typedef __u64       blk64_t;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems like that should be blk64_t? It looks like ext2fs_write_dir_block3 uses blk64_t, but the call to ext2fs_write_dir_block already casts the block number down to blk_t, so it is truncated before it reaches ext2fs_write_dir_block3:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Breakpoint 1, check_dir_block (fs=&amp;lt;value optimized out&amp;gt;, db=0x7ffff7f14a00, priv_data=0x7fffffffe180) at pass2.c:1219
1219                    cd-&amp;gt;pctx.errcode = ext2fs_write_dir_block(fs, block_nr, buf);
(gdb) p block_nr
$35 = 4966058603
(gdb) cont
Continuing.

Breakpoint 3, ext2fs_write_dir_block3 (fs=0x647420, block=671091307, inbuf=0x667270, flags=0) at dirblock.c:146
146             &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; io_channel_write_blk64(fs-&amp;gt;io, block, 1, (&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *) inbuf);
(gdb) p (blk_t)4966058603
$36 = 671091307
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="72369" author="niu" created="Wed, 27 Nov 2013 03:57:22 +0000"  >&lt;p&gt;Yes, I think that&apos;s probably the reason the entries are not fixed. check_dir_block() should call ext2fs_write_dir_block3() directly.&lt;/p&gt;</comment>
                            <comment id="72399" author="kitwestneat" created="Wed, 27 Nov 2013 15:50:16 +0000"  >&lt;p&gt;OK, I can get a patch for that.&lt;/p&gt;

&lt;p&gt;I ran gcc with -Wconversion on the source code and there are a few other cases where it converts to blk_t from blk64_t. I guess it would be good to go through them all at some point... I am not sure I know enough about ext4 to judge if the conversion is valid or not. For example, pass2.c also has a conversion on line 890:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;struct dx_dirblock_info {                                                           
    &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;     type;                                                                   
    blk_t       phys;                                                               
    &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;     flags;                                                                  
    blk_t       parent;                                                             
    ext2_dirhash_t  min_hash;                                                       
    ext2_dirhash_t  max_hash;                                                       
    ext2_dirhash_t  node_min_hash;                                                  
    ext2_dirhash_t  node_max_hash;                                                  
};                                                                                  
                                                                                    
...

        dx_db = &amp;amp;dx_dir-&amp;gt;dx_block[db-&amp;gt;blockcnt];                                    
        dx_db-&amp;gt;type = DX_DIRBLOCK_LEAF;                                             
890&amp;gt;&amp;gt;   dx_db-&amp;gt;phys = block_nr;                                                     
        dx_db-&amp;gt;min_hash = ~0;                                                       
        dx_db-&amp;gt;max_hash = 0;                                                        
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Should those be 64-bit? It seems like it, but I don&apos;t know. There are 103 cases of conversion to blk_t from blk64_t. The real number of narrowing conversions is probably higher, since there are also some like:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;fileio.c:164: warning: conversion to &#8216;blk_t&#8217; from &#8216;__u64&#8217; may alter its value
res_gdt.c:140: warning: conversion to &#8216;blk_t&#8217; from &#8216;&lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt; unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;&#8217; may alter its value
pass2.c:687: warning: conversion to &#8216;blk_t&#8217; from &#8216;e2_blkcnt_t&#8217; may alter its value
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="72407" author="kitwestneat" created="Wed, 27 Nov 2013 16:35:34 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/8416/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8416/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="73510" author="pjones" created="Fri, 13 Dec 2013 20:53:20 +0000"  >&lt;p&gt;This fix has landed for the next e2fsprogs release.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13099" name="e2fsck.log" size="8712" author="kitwestneat" created="Mon, 1 Jul 2013 14:20:16 +0000"/>
                            <attachment id="13102" name="e2fsck_safe_repair_ost_3.log-1" size="20752" author="kitwestneat" created="Tue, 2 Jul 2013 16:12:42 +0000"/>
                            <attachment id="13101" name="e2fsck_safe_repair_ost_3.log-2" size="17071" author="kitwestneat" created="Tue, 2 Jul 2013 16:12:42 +0000"/>
                            <attachment id="13804" name="fsck.hpfs2-eg3-oss11.ost0.2013_11_07.out1" size="190850" author="dvicker" created="Thu, 7 Nov 2013 22:57:34 +0000"/>
                            <attachment id="13100" name="htree.dump" size="124972" author="kitwestneat" created="Tue, 2 Jul 2013 16:11:32 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvugv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8914</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>