<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4557] Negative used block number of OST after OSS crashes and reboots</title>
                <link>https://jira.whamcloud.com/browse/LU-4557</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During active I/O on the OSS (e.g. IOR from a client), if the OSS is reset (not unmounted, but something like a forced reset), then after the OSS comes back up and mounts all OSTs, it shows strange OST sizes like below.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@noss01 mount&amp;#93;&lt;/span&gt;# df -h -t lustre&lt;br/&gt;
Filesystem            Size  Used Avail Use% Mounted on&lt;br/&gt;
/dev/mapper/OST00      22T  -17G   22T   0% /mnt/lustre/OST00&lt;br/&gt;
/dev/mapper/OST01      22T  -19G   22T   0% /mnt/lustre/OST01&lt;br/&gt;
/dev/mapper/OST02      22T  -17G   22T   0% /mnt/lustre/OST02&lt;br/&gt;
/dev/mapper/OST03      22T  -19G   22T   0% /mnt/lustre/OST03&lt;br/&gt;
/dev/mapper/OST04      22T  -17G   22T   0% /mnt/lustre/OST04&lt;/p&gt;
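
&lt;p&gt;As a minimal sketch (illustrative only, not Lustre code) of the arithmetic behind the negative &quot;Used&quot; column: df reports Used as total blocks minus free blocks, so a free-block count larger than the total goes negative.&lt;/p&gt;

```python
# Minimal sketch (illustrative only, not Lustre code) of how df
# derives the "Used" column from statfs-style block counts.
def df_used_blocks(total_blocks, free_blocks):
    # df reports Used as total minus free; nothing clamps it at zero,
    # so an inflated free-block count turns Used negative.
    return total_blocks - free_blocks

# Hypothetical numbers: a 22T OST whose free-block counter ended up
# about 17G too high after the crash (4 KiB blocks assumed).
total = 22 * 1024**4 // 4096
free = total + 17 * 1024**3 // 4096
assert df_used_blocks(total, free) == -(17 * 1024**3 // 4096)
```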

&lt;p&gt;The problem is easy to reproduce. The attached script &quot;run.sh&quot; reproduces it on a server named &quot;server1&quot; and a virtual machine named &quot;vm1&quot;.&lt;/p&gt;

&lt;p&gt;After some investigation, we found a few facts about this problem. First, after the problem happens, the OST file system is corrupted. The following is the fsck result.&lt;br/&gt;
===============================================================================&lt;/p&gt;
&lt;p&gt;# fsck -y /dev/sdb3&lt;br/&gt;
fsck from util-linux-ng 2.17.2&lt;br/&gt;
e2fsck 1.42.7.wc1 (12-Apr-2013)&lt;br/&gt;
server1-OST0002 contains a file system with errors, check forced.&lt;br/&gt;
Pass 1: Checking inodes, blocks, and sizes&lt;br/&gt;
Pass 2: Checking directory structure&lt;br/&gt;
Pass 3: Checking directory connectivity&lt;br/&gt;
Pass 4: Checking reference counts&lt;br/&gt;
Pass 5: Checking group summary information&lt;br/&gt;
Free blocks count wrong (560315, counted=490939).&lt;br/&gt;
Fix? yes&lt;/p&gt;


&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;QUOTA WARNING&amp;#93;&lt;/span&gt; Usage inconsistent for ID 0:actual (1220608, 253) != expected (0, 32)&lt;br/&gt;
Update quota info for quota type 0? yes&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;QUOTA WARNING&amp;#93;&lt;/span&gt; Usage inconsistent for ID 0:actual (1220608, 253) != expected (0, 32)&lt;br/&gt;
Update quota info for quota type 1? yes&lt;/p&gt;

&lt;p&gt;server1-OST0002: ***** FILE SYSTEM WAS MODIFIED *****&lt;br/&gt;
server1-OST0002: 262/131648 files (0.4% non-contiguous), 35189/526128 blocks&lt;br/&gt;
===============================================================================&lt;br/&gt;
Second, after the OSS crashes and before the OST is mounted again, fsck shows that the free inode/block counts in the superblock are wrong. That is not a big problem in itself, since fsck can fix it easily. Somehow Lustre makes the problem bigger if this small inconsistency is not fixed first.&lt;br/&gt;
===============================================================================&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@vm1 ~&amp;#93;&lt;/span&gt;# fsck -n /dev/sdb3 &lt;br/&gt;
fsck from util-linux-ng 2.17.2&lt;br/&gt;
e2fsck 1.42.7.wc1 (12-Apr-2013)&lt;br/&gt;
Warning: skipping journal recovery because doing a read-only filesystem check.&lt;br/&gt;
server1-OST0002: clean, 13/131648 files, 34900/526128 blocks&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@vm1 ~&amp;#93;&lt;/span&gt;# fsck /dev/sdb3 &lt;br/&gt;
fsck from util-linux-ng 2.17.2&lt;br/&gt;
e2fsck 1.42.7.wc1 (12-Apr-2013)&lt;br/&gt;
server1-OST0002: recovering journal&lt;br/&gt;
Setting free inodes count to 131387 (was 131635)&lt;br/&gt;
Setting free blocks count to 420283 (was 491228)&lt;br/&gt;
server1-OST0002: clean, 261/131648 files, 105845/526128 blocks&lt;br/&gt;
===============================================================================&lt;br/&gt;
What&apos;s more, after the OSS crashes and before the OST is mounted again, we have two ways to prevent the problem from happening: run fsck on that OST, or mount/umount that OST as ldiskfs.&lt;br/&gt;
We also found that this problem is not reproducible on Lustre versions before commit 6a6561972406043efe41ae43b64fd278f360a4b9, simply because versions before that commit do a pre-mount/umount before starting the OST service.&lt;/p&gt;</description>
                <environment></environment>
        <key id="22913">LU-4557</key>
            <summary>Negative used block number of OST after OSS crashes and reboots</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="lixi">Li Xi</reporter>
                        <labels>
                            <label>ldiskfs</label>
                            <label>patch</label>
                    </labels>
                <created>Wed, 29 Jan 2014 04:31:57 +0000</created>
                <updated>Tue, 12 Aug 2014 19:34:20 +0000</updated>
                            <resolved>Tue, 29 Apr 2014 01:34:15 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.3</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="75835" author="lixi" created="Wed, 29 Jan 2014 04:37:09 +0000"  >&lt;p&gt;Here is a patch which fixes the problem by pre-mounting/unmounting ldiskfs before the OSS starts.&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/9044&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9044&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Though that patch helps to fix the problem, I don&apos;t think that is a perfect solution. There might be better ways to fix this problem. Any ideas? Thanks!&lt;/p&gt;</comment>
                            <comment id="75836" author="pjones" created="Wed, 29 Jan 2014 05:02:17 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please comment on this patch?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="75876" author="adilger" created="Wed, 29 Jan 2014 18:20:52 +0000"  >&lt;p&gt;I think that the mount/unmount is not a proper fix for the problem.  We need to understand what is actually going wrong and fix that.&lt;/p&gt;

&lt;p&gt;The free blocks/inodes values stored in the superblock should never be used directly by the kernel code, since they are not kept up-to-date.  Instead, there are percpu counters that are loaded from the bitmaps at mount time and kept updated when blocks are allocated or freed.&lt;/p&gt;</comment>
                            <comment id="75971" author="ihara" created="Fri, 31 Jan 2014 02:59:28 +0000"  >&lt;p&gt;BTW, we hit this problem in a real situation at a customer site.&lt;br/&gt;
For example, when the stonith process kills an OSS via IPMI and another OSS then mounts all of the OSTs for failover, all of the OSTs&apos; sizes are negative numbers, which is even more critical.&lt;/p&gt;

&lt;p&gt;In order to reproduce this problem in our lab, we used VMs running the reproducer script that Li Xi posted.&lt;/p&gt;</comment>
                            <comment id="76329" author="lixi" created="Thu, 6 Feb 2014 02:24:45 +0000"  >&lt;p&gt;Yeah, Andreas, I agree. I am wondering why the inconsistent free block/inode counts in the superblock cause further problems on the Lustre OSS. That is strange to me, because those numbers are not used directly.&lt;/p&gt;</comment>
                            <comment id="76579" author="hongchao.zhang" created="Mon, 10 Feb 2014 11:02:13 +0000"  >&lt;p&gt;I have tested with two different kernels, 2.6.32-279.2.1 and 2.6.32-358.23.2, and both have the problem.&lt;br/&gt;
I also tested plain ext4, and it shows the same problem with a negative &quot;Used&quot; block count.&lt;/p&gt;

&lt;p&gt;In ext4/ldiskfs, if the system is reset during active I/O (say, dd) and some files are deleted/truncated after rebooting,&lt;br/&gt;
the free block count can become larger than the total disk block count, which causes the negative &quot;Used&quot; value.&lt;/p&gt;

&lt;p&gt;More work is needed to look into ext4/ldiskfs more deeply to see where the problem is.&lt;/p&gt;
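
&lt;p&gt;A toy model (my sketch, not ext4 internals) of that sequence: the in-memory free-block counter starts over-stated after the crash, and each delete/truncate adds the freed blocks on top, pushing it past the total.&lt;/p&gt;

```python
# Toy model (a sketch, not ext4 internals) of the failure sequence:
# after the crash the in-memory free-block counter is already
# over-stated, and each delete/truncate adds the freed blocks on top,
# pushing it past the total block count.
class Counters:
    def __init__(self, total_blocks, free_blocks):
        self.total = total_blocks
        self.free = free_blocks    # seeded from stale on-disk data

    def truncate(self, freed_blocks):
        # freeing blocks increments the counter unconditionally
        self.free += freed_blocks
        return self.free

c = Counters(total_blocks=1000, free_blocks=990)  # really only 900 free
c.truncate(50)
assert c.total - c.free == -40    # "Used" has gone negative
```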
</comment>
                            <comment id="76592" author="hongchao.zhang" created="Mon, 10 Feb 2014 15:04:25 +0000"  >&lt;p&gt;By printing more debug info while mounting the ldiskfs device, it appears that the fix made by fsck is incomplete.&lt;br/&gt;
The output of fsck:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;fsck from util-linux-ng 2.17.2
e2fsck 1.42.6.wc2 (10-Dec-2012)
lustre-OST0002: recovering journal
Setting free blocks count to 2318165 (was 2153045)
lustre-OST0002: clean, 192/184320 files, 827563/3145728 blocks
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the debug output:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;different free blocks(1): stored = 3176 (counted 2918)
different free blocks(3): stored = 766 (counted 3072)
different free blocks(4): stored = 2048 (counted 0)
different free blocks(5): stored = 1024 (counted 3072)
different free blocks(6): stored = 2048 (counted 0)
different free blocks(9): stored = 1024 (counted 3072)
different free blocks(15): stored = 0 (counted 2048)
different free blocks(19): stored = 0 (counted 32768)
different free blocks(20): stored = 2048 (counted 32768)
different free blocks(21): stored = 2048 (counted 32768)
different free blocks(22): stored = 0 (counted 32768)
different free blocks(23): stored = 0 (counted 32768)
different free blocks(24): stored = 0 (counted 32768)
different free blocks(25): stored = 3072 (counted 31744)
different free blocks(26): stored = 2048 (counted 32768)
different free blocks(27): stored = 3072 (counted 31744)
different free blocks(28): stored = 10496 (counted 32768)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The free blocks counts have become inconsistent in various block groups.&lt;/p&gt;</comment>
                            <comment id="76951" author="hongchao.zhang" created="Thu, 13 Feb 2014 11:05:50 +0000"  >&lt;p&gt;This problem is in ext4, which initializes ext4_sb_info-&amp;gt;s_freeblocks_counter and ext4_sb_info-&amp;gt;s_freeinodes_counter before loading the journal;&lt;br/&gt;
that is why mounting twice, as in patch &quot;http://review.whamcloud.com/9044&quot;, works around it.&lt;/p&gt;

&lt;p&gt;By moving the following code to after the journal is loaded, the issue in ext4 is fixed:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        err = percpu_counter_init(&amp;amp;sbi-&amp;gt;s_freeblocks_counter,
                        ext4_count_free_blocks(sb));
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!err) {
                err = percpu_counter_init(&amp;amp;sbi-&amp;gt;s_freeinodes_counter,
                                ext4_count_free_inodes(sb));
        }
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!err) {
                err = percpu_counter_init(&amp;amp;sbi-&amp;gt;s_dirs_counter,
                                ext4_count_dirs(sb));
        }
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!err) {
                err = percpu_counter_init(&amp;amp;sbi-&amp;gt;s_dirtyblocks_counter, 0);
        }
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (err) {
                ext4_msg(sb, KERN_ERR, &lt;span class=&quot;code-quote&quot;&gt;&quot;insufficient memory&quot;&lt;/span&gt;);
                &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; failed_mount4;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
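
&lt;p&gt;As a toy model (my illustration, not the ext4 code) of why the ordering matters: journal replay changes the block bitmaps, so a counter initialized from the bitmaps before replay is stale.&lt;/p&gt;

```python
# Toy model (an illustration, not ext4 code) of why the percpu
# free-counter must be initialized after journal replay: replay
# changes the block bitmaps, so a counter snapshotted earlier is stale.
def count_free(bitmap):
    # a block bitmap as a list of 0/1 flags; 0 means free
    return sum(1 for b in bitmap if b == 0)

def mount(bitmap, journal, init_counters_before_replay):
    if init_counters_before_replay:
        free = count_free(bitmap)     # buggy order: snapshot first
    for block, state in journal:      # replay redoes logged updates
        bitmap[block] = state
    if not init_counters_before_replay:
        free = count_free(bitmap)     # fixed order: snapshot after
    return free

on_disk = [1, 0, 0, 0]                # crash left the bitmap stale
journal = [(1, 1), (2, 1)]            # replay marks blocks 1,2 as used

assert mount(list(on_disk), journal, True) == 3   # stale: 3 "free"
assert mount(list(on_disk), journal, False) == 1  # correct: 1 free
```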

&lt;p&gt;This also fixes the problem in Lustre.&lt;/p&gt;</comment>
                            <comment id="76969" author="lixi" created="Thu, 13 Feb 2014 15:07:30 +0000"  >&lt;p&gt;Hi Hongchao,&lt;/p&gt;

&lt;p&gt;Thank you very much for investigating this! Would you please share the Lustre patch which fixes this problem? I&apos;d like to check the result too.&lt;/p&gt;</comment>
                            <comment id="77070" author="hongchao.zhang" created="Fri, 14 Feb 2014 10:00:28 +0000"  >&lt;p&gt;The patch is under test; I will push it to Gerrit soon.&lt;/p&gt;</comment>
                            <comment id="77081" author="hongchao.zhang" created="Fri, 14 Feb 2014 14:07:00 +0000"  >&lt;p&gt;the initial patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/9277/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9277/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="77087" author="lixi" created="Fri, 14 Feb 2014 15:22:45 +0000"  >&lt;p&gt;Hi Hongchao,&lt;/p&gt;

&lt;p&gt;I&apos;ve checked that your patch works perfectly to fix this problem. Thanks!&lt;/p&gt;</comment>
                            <comment id="77097" author="simmonsja" created="Fri, 14 Feb 2014 17:16:53 +0000"  >&lt;p&gt;Is this a problem for SLES11 SP3 as well?&lt;/p&gt;</comment>
                            <comment id="77709" author="hongchao.zhang" created="Mon, 24 Feb 2014 12:17:41 +0000"  >&lt;p&gt;SLES11 SP3 uses ext3 by default, and ext4 can only be used in read-only mode.&lt;br/&gt;
This problem does exist according to the ext4 code (kernel version: 3.0.76-0.11.1).&lt;/p&gt;</comment>
                            <comment id="77798" author="bogl" created="Tue, 25 Feb 2014 13:06:34 +0000"  >&lt;p&gt;The problem of ext4 being readonly in SLES has been fixed in our builds for months.  See &lt;a href=&quot;http://review.whamcloud.com/8335&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8335&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4276&quot; title=&quot;make ldiskfs configured for read/write access by default&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4276&quot;&gt;&lt;del&gt;LU-4276&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="78650" author="adilger" created="Fri, 7 Mar 2014 00:09:45 +0000"  >&lt;p&gt;Looking at &lt;a href=&quot;http://review.whamcloud.com/9277&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9277&lt;/a&gt; more closely, along with the upstream kernel, it seems that this patch is NOT needed for the SLES11SP2, even though it appears the code is the same as RHEL6.  There were two patches applied to the upstream kernel - v2.6.34-rc7-16-g84061e0 was almost the same as 9277, and v2.6.37-rc1-3-gce7e010 that mostly reverted it and loaded the percpu counters both before and after journal replay.  It isn&apos;t yet clear why the ce7e010 patch was landed, but the net result is that we should delete the sles11sp2/ext4-init-statfs-after-journal.patch and remove it from the sles11sp2 and sles11sp3 series files.&lt;/p&gt;

&lt;p&gt;There is also a subtle defect in the 9277 patch, since if ldiskfs is ever mounted with &quot;-o nojournal&quot; the initialization of the percpu counters will be skipped.  We don&apos;t ever run Lustre in that mode, so it isn&apos;t seen during our testing.  The correct approach would probably be to replace the current rhel6.3/ext4-init-statfs-after-journal.patch with copies of the upstream commits 84061e0 and ce7e010, so that when RHEL6 backports these fixes it will be clear that our patch is no longer needed.  Otherwise, our patch does not conflict when both of those patches are applied.&lt;/p&gt;</comment>
                            <comment id="81159" author="hongchao.zhang" created="Tue, 8 Apr 2014 08:07:28 +0000"  >&lt;p&gt;RHEL6 has backported commits 84061e0 and ce7e010 in 2.6.32-431.5.1, and our patch (&lt;a href=&quot;http://review.whamcloud.com/#/c/9277/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9277/&lt;/a&gt;) had already landed on master,&lt;br/&gt;
so that patch needs to be reverted.&lt;/p&gt;

&lt;p&gt;The reverting patch is at &lt;a href=&quot;http://review.whamcloud.com/#/c/9908/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9908/&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="81411" author="hongchao.zhang" created="Fri, 11 Apr 2014 10:49:09 +0000"  >&lt;p&gt;the patch is updated&lt;/p&gt;</comment>
                            <comment id="81429" author="ihara" created="Fri, 11 Apr 2014 15:40:55 +0000"  >&lt;p&gt;backport patch for b2_5 &lt;a href=&quot;http://review.whamcloud.com/9933&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9933&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="82606" author="pjones" created="Mon, 28 Apr 2014 14:31:27 +0000"  >&lt;p&gt;The latest patch landed to 2.6. Do I understand correctly that, due to the recent kernel updates, no changes are needed to any maintenance release branches and so this ticket can now be marked as resolved?&lt;/p&gt;</comment>
                            <comment id="82708" author="hongchao.zhang" created="Tue, 29 Apr 2014 01:00:35 +0000"  >&lt;p&gt;Yes, it can be closed now.&lt;/p&gt;</comment>
                            <comment id="82713" author="pjones" created="Tue, 29 Apr 2014 01:34:15 +0000"  >&lt;p&gt;Thanks Hongchao&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="14029" name="run.sh" size="4365" author="lixi" created="Wed, 29 Jan 2014 04:31:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwdtb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12446</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>