<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:52:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12386] ldiskfs-fs error: ldiskfs_iget:4374: inode #x: comm ll_ostx_y: bad extra_isize (36832 != 512)</title>
                <link>https://jira.whamcloud.com/browse/LU-12386</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Similar error occurred on 2 osts, 2 different nodes using 2 different DDN raid controllers. The ost aborted journal and was remounted RO.&lt;/p&gt;

&lt;p&gt;Subsequent e2fsck successfully cleared the problem inodes and the targets re-mounted.&#160;&lt;/p&gt;

&lt;p&gt;We don&apos;t have extra_isize showing as a file system feature on these OST devs, or at least it doesn&apos;t show up in dumpe2fs output.&lt;/p&gt;

&lt;p&gt;The OSTs have been up and running ok since last September or so.&#160;&#160;&lt;/p&gt;</description>
                <environment>lustre 2.10.5.2.chaos-1.ch6_1&lt;br/&gt;
e2fsprogs 1.42.13.wc6-7.el7.x86_64&lt;br/&gt;
kernel 3.10.0-862.14.4.1chaos.ch6.x86_64&lt;br/&gt;
client side is running lustre 2.10.6_2.chaos</environment>
        <key id="55848">LU-12386</key>
            <summary>ldiskfs-fs error: ldiskfs_iget:4374: inode #x: comm ll_ostx_y: bad extra_isize (36832 != 512)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="ruth.klundt@gmail.com">Ruth Klundt</reporter>
                        <labels>
                    </labels>
                <created>Tue, 4 Jun 2019 20:27:48 +0000</created>
                <updated>Tue, 11 Jun 2019 13:11:13 +0000</updated>
                            <resolved>Tue, 11 Jun 2019 13:11:09 +0000</resolved>
                                    <version>Lustre 2.10.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                <comments>
                            <comment id="248405" author="adilger" created="Tue, 4 Jun 2019 22:08:28 +0000"  >&lt;p&gt;Ruth, the &quot;&lt;tt&gt;inode #x&lt;/tt&gt;&quot; part of the message may be relevant if this filesystem was formatted a &lt;b&gt;long&lt;/b&gt; time ago.  There was a bug in very old &lt;tt&gt;mke2fs&lt;/tt&gt; that didn&apos;t zero out the extra inode space in very low-numbered inodes (e.g. inodes 2-15 or so).&lt;/p&gt;

&lt;p&gt;Otherwise, it appears that this is inode corruption with some random garbage.  Do you have the e2fsck output to see if those inodes had other corruption, or was only the &lt;tt&gt;i_extra_isize&lt;/tt&gt; bad?  When was the previous time that e2fsck was run?  Was there anything run recently that would cause very old files to be accessed for the first time in a long time, or is this corruption on a recently-created file?&lt;/p&gt;

&lt;p&gt;Note that the &lt;tt&gt;extra_isize&lt;/tt&gt; feature is not needed to use the large inode space; that is enabled by default when the filesystem is formatted with inodes larger than 256 bytes (as with all Lustre filesystems) and enough space is reserved for the current kernel&apos;s fixed inode fields (32 bytes currently).  The &lt;tt&gt;extra_isize&lt;/tt&gt; feature is only needed for the case where &lt;b&gt;additional&lt;/b&gt; space is reserved beyond what is needed for the fixed inode fields.&lt;/p&gt;</comment>
                            <comment id="248465" author="ruth.klundt@gmail.com" created="Wed, 5 Jun 2019 17:14:32 +0000"  >&lt;p&gt;Thanks for the info about extra_isize, I&apos;m less confused &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;The filesystem was created on new gear in September 2018, with the software stack as listed above. One of the OSTs had a sequential group of 5 inodes with the problem, and they all had other corruption such as huge i_size, too many blocks, and dtime set; bitmap fixes were also necessary. Also, the fsck had to use a backup superblock because of &apos;bad block for block bitmap&apos;.&lt;/p&gt;

&lt;p&gt;Not sure how to determine whether these inodes are new or old. The inode numbers were in the 36M range, with each ost having ~72M total inodes. Currently the number of inodes in use is ~3M on all the osts.&lt;/p&gt;

&lt;p&gt;On the other OST, 8 consecutive inode numbers (in the 33M range) were showing other problems in addition to extra_isize. No bad superblock though.&#160;&lt;/p&gt;

&lt;p&gt;The OSTs are relatively large compared to what we&apos;ve had before on ldiskfs, 74TB. They are 46% full.&lt;/p&gt;

&lt;p&gt;Not sure about recent changes in user activity, I&apos;ll be looking around for that.&lt;/p&gt;</comment>
                            <comment id="248466" author="adilger" created="Wed, 5 Jun 2019 17:21:23 +0000"  >&lt;p&gt;Ok, that rules out the old mke2fs bug. &lt;/p&gt;

&lt;p&gt;Do you have the actual e2fsck output? Sometimes it is possible to see, based on what corrupt values are printed, what might have been overwriting a block. It definitely seems like a block-level corruption, since we have 8 512-byte OST inodes per 4KB block. &lt;/p&gt;

&lt;p&gt;Any errors on the controllers? Any other errors in the filesystems?&lt;/p&gt;</comment>
                            <comment id="248474" author="jamervi" created="Wed, 5 Jun 2019 19:33:22 +0000"  >&lt;p&gt;I checked out the storage subsystem, and from the storage side of things (this is an SFA12K 10 stack) only 1 drive in the system is reporting a physical error; otherwise there are no other reported errors. However, I checked the IO channels (IB), and one of the channels not associated with the OST servers is reporting symbol errors that appear to be a bad cable. This started getting reported at ~16:30 on 6/1 in the controller log. No other messages, with the exception of the &apos;keep alive&apos; messages, were reported.&lt;/p&gt;</comment>
                            <comment id="248491" author="ruth.klundt@gmail.com" created="Wed, 5 Jun 2019 22:12:43 +0000"  >&lt;p&gt;I&apos;m working on getting e2fsck output cleared to post.&lt;/p&gt;

&lt;p&gt;Other errors on the filesystem are things that are more or less usual, like high order page allocation failures, and grant complaints. ( &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9704&quot; title=&quot;ofd_grant_check() claims GRANT, real grant 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9704&quot;&gt;&lt;del&gt;LU-9704&lt;/del&gt;&lt;/a&gt; )&lt;/p&gt;</comment>
                            <comment id="248559" author="ruth.klundt@gmail.com" created="Thu, 6 Jun 2019 16:36:46 +0000"  >&lt;p&gt;ost002e output was not captured from the beginning.&lt;/p&gt;

&lt;p&gt;For ost0008, I removed all of the &apos;fix? yes&apos; lines and the lines describing &apos;count wrong for group&apos;, since they applied to the whole fs - someone has to actually look at these in order to approve release. Let me know if those omissions are of interest. Basically every group had a default value for blocks and inodes and was updated.&lt;/p&gt;</comment>
                            <comment id="248731" author="adilger" created="Fri, 7 Jun 2019 18:23:05 +0000"  >&lt;p&gt;Looking through the logs I don&apos;t see any kind of pattern with the broken inodes. They appear to be just random corruption in the inode block, with random flags set and bogus file sizes. &lt;/p&gt;

&lt;p&gt;It looks like the problem is limited to one block in the inode table (8 inodes), and the superblock, which could be recovered from a backup.  The inodes were cleared by e2fsck, since they no longer contained useful information, so there isn&apos;t anything that can be done to recover the data there. It doesn&apos;t look like there are any other problems with the filesystem. &lt;/p&gt;

&lt;p&gt;At this point it isn&apos;t clear if anything can be done to diagnose the source of this problem. I don&apos;t know the hardware well enough to say whether the drive or cable that Joe reported could be causing this or not. &lt;/p&gt;</comment>
                            <comment id="248980" author="ruth.klundt@gmail.com" created="Tue, 11 Jun 2019 13:05:43 +0000"  >&lt;p&gt;Thanks for looking, I think we would mostly be concerned with whether there is a need to upgrade anything in order to avoid a repeat occurrence. If not then we can close for now.&lt;/p&gt;</comment>
                            <comment id="248982" author="pjones" created="Tue, 11 Jun 2019 13:11:09 +0000"  >&lt;p&gt;ok - thanks Ruth&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="32736" name="fsck-ost0008.txt" size="6395708" author="ruth.klundt@gmail.com" created="Thu, 6 Jun 2019 16:33:25 +0000"/>
                            <attachment id="32737" name="fsck-ost002e.txt" size="6084" author="ruth.klundt@gmail.com" created="Thu, 6 Jun 2019 16:33:08 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00hkv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>