<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:16:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15171] corrupted in-inode xattr</title>
                <link>https://jira.whamcloud.com/browse/LU-15171</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
LDISKFS-fs error (device md6): ldiskfs_get_inode_usage:818: inode #310: comm ll_ost_io00_570: corrupted in-inode xattr
Aborting journal on device md6-8.
Kernel panic - not syncing: LDISKFS-fs (device md6): panic forced after error

LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
LDISKFS-fs error (device md6): ldiskfs_journal_check_start:61: Detected aborted journal
LDISKFS-fs (md6): Remounting filesystem read-only
CPU: 9 PID: 55386 Comm: ll_ost_io00_570 4.18.0-305.10.2.x6.0.29.x86_64 #1
Hardware name: Seagate Laguna Seca/Laguna Seca, BIOS v02.0043 09/24/2019
Call Trace:
  dump_stack+0x5c/0x80
  panic+0xe7/0x2a9
  ldiskfs_handle_error.cold.139+0x13/0x13 [ldiskfs]
  __ldiskfs_error_inode+0xaf/0x130 [ldiskfs]
  __xattr_check_inode+0x4a/0x70 [ldiskfs]
  ldiskfs_get_inode_usage+0x195/0x290 [ldiskfs]
  __dquot_transfer+0x8a/0x5d0
  dquot_transfer+0x8e/0x130
  osd_quota_transfer+0x188/0x310 [osd_ldiskfs]
  osd_attr_set+0xd4/0x740 [osd_ldiskfs]
  ofd_write_attr_set+0x7bf/0x1070 [ofd]
  ofd_commitrw_write+0x222/0x1a50 [ofd]
  ofd_commitrw+0x458/0xa60 [ofd]
  tgt_brw_write+0x18de/0x2390 [ptlrpc]
  tgt_request_handle+0xc93/0x1a00 [ptlrpc]
  ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
  ptlrpc_main+0xc06/0x1550 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="66875">LU-15171</key>
            <summary>corrupted in-inode xattr</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="panda">Andrew Perepechko</assignee>
                                    <reporter username="stancheff">Shaun Tancheff</reporter>
                        <labels>
                            <label>rhel8.3</label>
                    </labels>
                <created>Thu, 28 Oct 2021 00:56:33 +0000</created>
                <updated>Wed, 10 Aug 2022 03:23:21 +0000</updated>
                            <resolved>Sat, 20 Nov 2021 14:36:43 +0000</resolved>
                                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                    <comments>
                            <comment id="316806" author="stancheff" created="Thu, 28 Oct 2021 02:36:09 +0000"  >&lt;p&gt;Last seen with HPE: v2_14_55-54-g0a74a33f87 based on WC: v2_14_55-41-g14d07b6237&lt;/p&gt;</comment>
                            <comment id="317067" author="adilger" created="Fri, 29 Oct 2021 22:47:10 +0000"  >&lt;p&gt;Shaun, in this case it would be useful to use debugfs and/or &quot;dd&quot; to dump the inode and its xattrs and attach them here, so we can see how they are corrupted.  Unfortunately, the &lt;tt&gt;ldiskfs_get_inode_usage()&amp;#45;&amp;gt;__xattr_check_inode()&amp;#45;&amp;gt;ldiskfs_xattr_check_entries()&lt;/tt&gt; code does not say &lt;b&gt;how&lt;/b&gt; the xattr was corrupted, which is important for deciding how to handle this error better.&lt;/p&gt;

&lt;p&gt;I hit a similar issue on my home system running 2.14.0 on a RHEL8.2 kernel (though I don&apos;t mount with &lt;tt&gt;errors=panic&lt;/tt&gt;, so it just prevents access to one file instead of rebooting the server):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ldiskfs_xattr_inode_get:497: inode #2405396: comm mdt00_002: EA inode hash validation failed
(mdt_handler.c:1429:mdt_getattr_internal()) myth-MDT0000: getattr error for [0x20002da99:0x3cc:0x0]: rc = -117
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In my case, it looks like it is caused by a slight incompatibility between how the RHEL7-patched &lt;tt&gt;ea_inode&lt;/tt&gt; code stores xattrs (without storing a checksum in the xattr) and how this functionality was implemented when it landed in the upstream kernel used by RHEL8 and later.  In your case, I can&apos;t tell whether the xattr is totally corrupted or has a similar minor error.&lt;/p&gt;

&lt;p&gt;It would be useful to fix this on several fronts:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;correct how ldiskfs in RHEL7 is storing those xattrs, to be forward compatible with RHEL8&lt;/li&gt;
	&lt;li&gt;patch ldiskfs in RHEL8 to be more forgiving about minor inconsistencies (e.g. checksum=0, but not a random checksum error)&lt;/li&gt;
	&lt;li&gt;update e2fsck to correct such errors (at least e2fsck from e2fsprogs-1.46.2.wc3 does &lt;b&gt;not&lt;/b&gt; correct this xattr, even though the kernel rejects it).&lt;/li&gt;
&lt;/ul&gt;
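
&lt;p&gt;As a rough illustration of the second point, here is a hypothetical C sketch of the more forgiving check. It is not ldiskfs code: &lt;tt&gt;ea_hash_check()&lt;/tt&gt; and &lt;tt&gt;enum ea_hash_verdict&lt;/tt&gt; are made-up names, and it assumes the caller has already recomputed the EA inode hash. It only shows the intended policy: a stored hash of 0 is accepted as a legacy xattr, while a nonzero mismatch is still rejected.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/* Hypothetical sketch, not ldiskfs code. A stored hash of 0 (nothing
 * stored, as with the RHEL7-patched ea_inode code) is tolerated, while a
 * nonzero hash that disagrees with the recomputed one is still treated as
 * corruption. Computing the hash is out of scope: the caller passes it in. */
enum ea_hash_verdict {
	EA_HASH_OK,		/* stored hash matches the recomputed one */
	EA_HASH_MISSING,	/* stored hash is 0: accept, leave for e2fsck */
	EA_HASH_BAD		/* real mismatch: keep rejecting the xattr */
};

static enum ea_hash_verdict ea_hash_check(unsigned int stored,
					  unsigned int computed)
{
	if (stored == computed)
		return EA_HASH_OK;
	if (stored == 0)
		return EA_HASH_MISSING;
	return EA_HASH_BAD;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Keeping the zero-hash case separate from a real match lets the caller log or count legacy xattrs (and leave them for e2fsck to repair) instead of silently treating them as valid.&lt;/p&gt;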
</comment>
                            <comment id="317104" author="panda" created="Sun, 31 Oct 2021 20:07:47 +0000"  >&lt;p&gt;It seems that this issue is not really a corruption: at least, I haven&apos;t found any corruption in the buffer heads from the crash dump.&lt;/p&gt;

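&lt;p&gt;Commit 7a9ca53ae (~v4.13) added the requirement to hold xattr_sem (for reading) around dquot_transfer() and __dquot_transfer() calls, because they call back into ext4_get_inode_usage(), which counts xattr inode references:&lt;/p&gt;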
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
commit 7a9ca53aea10ad4677a0f347ad7639c304b80194
Author: Tahsin Erdogan &amp;lt;tahsin@google.com&amp;gt;
Date:   Thu Jun 22 11:46:48 2017 -0400

    quota: add get_inode_usage callback to transfer multi-inode charges

...

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 962f28a0e176..d9733aa955e9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5295,7 +5295,14 @@ &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ext4_setattr(struct dentry *dentry, struct iattr *attr)
                        error = PTR_ERR(handle);
                        &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; err_out;
                }
+
+               /* dquot_transfer() calls back ext4_get_inode_usage() which
+                * counts xattr inode references.
+                */
+               down_read(&amp;amp;EXT4_I(inode)-&amp;gt;xattr_sem);
                error = dquot_transfer(inode, attr);
+               up_read(&amp;amp;EXT4_I(inode)-&amp;gt;xattr_sem);
+
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (error) {
                        ext4_journal_stop(handle);
                        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; error;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index dde8deb11e59..42b3a73143cf 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -373,7 +373,13 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ext4_ioctl_setproject(struct file *filp, __u32 projid)
 
        transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!IS_ERR(transfer_to[PRJQUOTA])) {
+
+               /* __dquot_transfer() calls back ext4_get_inode_usage() which
+                * counts xattr inode references.
+                */
+               down_read(&amp;amp;EXT4_I(inode)-&amp;gt;xattr_sem);
                err = __dquot_transfer(inode, transfer_to);
+               up_read(&amp;amp;EXT4_I(inode)-&amp;gt;xattr_sem);
                dqput(transfer_to[PRJQUOTA]);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (err)
                        &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; out_dirty;
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In Lustre, we do not take this lock. It seems that a race is possible on the first write: one thread modifies the inode xattrs while another thread runs dquot_transfer(), which checks xattr consistency through ldiskfs_get_inode_usage() and eventually fails the check if it looks at the xattrs while they are being modified.&lt;/p&gt;

&lt;p&gt;It seems that we can simply wrap the *dquot_transfer() calls with xattr_sem locking for the newer kernels. Performance should be acceptable on the OST side and, hopefully, on the MDT side as well. A proof-of-concept patch will be uploaded for review.&lt;/p&gt;</comment>
                            <comment id="317105" author="gerrit" created="Sun, 31 Oct 2021 20:08:00 +0000"  >&lt;p&gt;&quot;Andrew Perepechko &amp;lt;andrew.perepechko@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45424&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45424&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15171&quot; title=&quot;corrupted in-inode xattr&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15171&quot;&gt;&lt;del&gt;LU-15171&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: xattr_sem locking is missing for dquot_transfer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: fb9f97b3481214017be8de89555076262ecaa6e1&lt;/p&gt;</comment>
                            <comment id="318730" author="gerrit" created="Sat, 20 Nov 2021 06:24:45 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/45424/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45424/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15171&quot; title=&quot;corrupted in-inode xattr&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15171&quot;&gt;&lt;del&gt;LU-15171&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: xattr_sem locking is missing for dquot_transfer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e6c7fcdaf40b130c39af2e3ee8b108c6e31a8ca8&lt;/p&gt;</comment>
                            <comment id="318772" author="pjones" created="Sat, 20 Nov 2021 14:36:43 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="71785">LU-16082</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i028mn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>