<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4523] Need explanation for FS corruption -  ldiskfs_mb_free_metadata: Double free of blocks</title>
                <link>https://jira.whamcloud.com/browse/LU-4523</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The customer, Yale, encountered file system corruption on one of their OST devices, dm-20, which is &quot;scratch-OST0028&quot;. The customer ran e2fsck on that device, which fixed the corruption, but they would now like an RCA to prevent it from happening again.&lt;/p&gt;

&lt;p&gt;The corruption was first reported on Jan-11, but there were no irregular events on the storage side that would have caused it. This could indicate that the corruption happened some time earlier and was only reported on the 11th.&lt;/p&gt;

&lt;p&gt;Jan 11 11:54:33 oss7 kernel: Lustre: 2916:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Conn stale 10.191.133.6@o2ib &lt;span class=&quot;error&quot;&gt;&amp;#91;old ver: 12, new ver: 12&amp;#93;&lt;/span&gt;&lt;br/&gt;
Jan 11 11:54:33 oss7 kernel: Lustre: 2916:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Skipped 101 previous similar messages&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20): ldiskfs_mb_free_metadata: Double free of blocks 30208 (30208 148)&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: Aborting journal on device dm-20-8.&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs (dm-20): Remounting filesystem read-only&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_ext_remove_space: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_orphan_del: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_ext_truncate: Journal has aborted&lt;br/&gt;
Jan 11 11:59:52 oss7 kernel: LustreError: 22850:0:(fsfilt-ldiskfs.c:369:fsfilt_ldiskfs_start()) error starting handle for op 8 (106 credits): rc -30&lt;/p&gt;


&lt;p&gt;FSCK output:&lt;br/&gt;
e2fsck 1.42.3.wc3 (15-Aug-2012)&lt;br/&gt;
device /dev/mapper/ost_scratch_40 mounted by lustre per /proc/fs/lustre/obdfilter/scratch-OST0028/mntdev&lt;br/&gt;
Warning!  /dev/mapper/ost_scratch_40 is mounted.&lt;br/&gt;
MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...&lt;br/&gt;
Warning: skipping journal recovery because doing a read-only filesystem check.&lt;br/&gt;
scratch-OST0028 contains a file system with errors, check forced.&lt;br/&gt;
Pass 1: Checking inodes, blocks, and sizes&lt;/p&gt;

&lt;p&gt;Running additional passes to resolve blocks claimed by more than one inode...&lt;br/&gt;
Pass 1B: Rescanning for multiply-claimed blocks&lt;/p&gt;

&lt;p&gt;What other debugging or data can be pulled to explain the problem?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Oz&lt;/p&gt;
</description>
                <environment></environment>
        <key id="22831">LU-4523</key>
            <summary>Need explanation for FS corruption -  ldiskfs_mb_free_metadata: Double free of blocks</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="orentas">Oz Rentas</reporter>
                        <labels>
                    </labels>
                <created>Tue, 21 Jan 2014 19:32:19 +0000</created>
                <updated>Tue, 11 Feb 2014 15:54:28 +0000</updated>
                            <resolved>Tue, 11 Feb 2014 15:54:28 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="75378" author="pjones" created="Tue, 21 Jan 2014 20:53:38 +0000"  >&lt;p&gt;Oz&lt;/p&gt;

&lt;p&gt;We&apos;ll certainly need details about which Lustre version is in use and some logs - dmesg from this node (or syslog) as a start, say 24 hours into the past.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="75393" author="orentas" created="Tue, 21 Jan 2014 22:23:38 +0000"  >&lt;p&gt;Ah, yes, of course.  Sorry about that.  I&apos;ve attached the missing files.&lt;/p&gt;

&lt;p&gt;Lustre: 1.8.9&lt;br/&gt;
Kernel: 2.6.18-348.1.1.el5&lt;/p&gt;</comment>
                            <comment id="75431" author="pjones" created="Wed, 22 Jan 2014 15:00:42 +0000"  >&lt;p&gt;Hongchao has been looking at this information&lt;/p&gt;</comment>
                            <comment id="75588" author="pjones" created="Fri, 24 Jan 2014 20:43:49 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;As per our recent discussion on this topic, I understand that you believe this issue to be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-482&quot; title=&quot;Test failure on test suite replay-dual, subtest test_0a&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-482&quot;&gt;&lt;del&gt;LU-482&lt;/del&gt;&lt;/a&gt;, which periodically affected our internal testing on older Lustre releases but has not been seen on 2.4.x and newer releases. I also understand that you believe it is due to an issue in the underlying ext4 code that has been addressed in newer kernel versions.&lt;/p&gt;

&lt;p&gt;Do I have this right? Is there anything to add/correct?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="75650" author="hongchao.zhang" created="Sun, 26 Jan 2014 14:26:48 +0000"  >&lt;p&gt;Hi Peter,&lt;/p&gt;

&lt;p&gt;Yes, it could be a problem related to ext4 (lightly patched by Lustre and renamed to ldiskfs). There are some similar tickets (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-482&quot; title=&quot;Test failure on test suite replay-dual, subtest test_0a&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-482&quot;&gt;&lt;del&gt;LU-482&lt;/del&gt;&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-699&quot; title=&quot;replay-dual test_1 fails to remount mdt&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-699&quot;&gt;&lt;del&gt;LU-699&lt;/del&gt;&lt;/a&gt;, etc.).&lt;/p&gt;

&lt;p&gt;btw, a similar issue is reported on Red Hat:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://access.redhat.com/site/solutions/157393&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/site/solutions/157393&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="76732" author="orentas" created="Tue, 11 Feb 2014 15:51:24 +0000"  >&lt;p&gt;Thank you for this very useful information.  It has been passed on to the customer.&lt;br/&gt;
We have since upgraded the OS / Lustre build on the servers. This ticket can be closed.&lt;/p&gt;</comment>
                            <comment id="76733" author="pjones" created="Tue, 11 Feb 2014 15:54:28 +0000"  >&lt;p&gt;ok thanks Oz&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="14002" name="kern.log.1" size="168644" author="orentas" created="Tue, 21 Jan 2014 22:21:17 +0000"/>
                            <attachment id="14003" name="uname_r" size="31" author="orentas" created="Tue, 21 Jan 2014 22:21:17 +0000"/>
                            <attachment id="14004" name="version" size="114" author="orentas" created="Tue, 21 Jan 2014 22:21:17 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwdcf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12367</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>