<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:53:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12593] update_log corruption</title>
                <link>https://jira.whamcloud.com/browse/LU-12593</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During an upgrade from an older version, we saw the following errors:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun 13 18:16:48 282 kernel: LustreError: 24685:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0005-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
Jun 13 18:17:01 280 kernel: LustreError: 26388:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0003-mdtlov: invalid FID [0x0:0x2050100:0x0]
Jun 13 18:20:30 279 kernel: LustreError: 26741:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0002-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
Jun 13 18:26:48 282 kernel: LustreError: 24685:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0005-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
Jun 13 18:27:01 280 kernel: LustreError: 26388:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0003-mdtlov: invalid FID [0x0:0x2050100:0x0]
Jun 13 18:30:30 279 kernel: LustreError: 26741:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0002-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
Jun 13 19:03:39 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
Jun 13 19:03:41 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
Jun 13 19:03:43 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The Lustre debug log shows the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00080000:12.0:1560446017.274394:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
00000040:01000000:12.0:1560446017.274395:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
00000004:00020000:12.0:1560446017.274467:0:17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
00000004:00080000:12.0:1560446017.274468:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
00000040:01000000:12.0:1560446017.274469:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
00000004:00020000:12.0:1560446017.274548:0:17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
00000004:00080000:12.0:1560446017.274549:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
00000040:01000000:12.0:1560446017.274549:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000040:00000001:1.0:1560447502.778180:0:17588:0:(llog_osd.c:1968:llog_osd_get_cat_list()) Process entered
00000004:00000001:1.0:1560447502.778181:0:17588:0:(osp_object.c:546:osp_attr_get()) Process entered
00000004:00000001:1.0:1560447502.778181:0:17588:0:(osp_object.c:556:osp_attr_get()) Process leaving (rc=0 : 0 : 0)
00000040:01000000:1.0:1560447502.778182:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
....
00000004:00000040:1.0:1560447502.778187:0:17588:0:(osp_md_object.c:1148:osp_md_read()) lustre02-MDT0005-osp-MDT0004 [0x200000009:0x5:0x0] read offset 128 size 32
...
00000040:00000001:1.0:1560447502.778489:0:17588:0:(llog_osd.c:2056:llog_osd_get_cat_list()) Process leaving (rc=0 : 0 : 0)
...
00000004:00000040:1.0:1560447502.778495:0:17588:0:(mdt_handler.c:5441:mdt_object_init()) object init, fid = [0x0:0x1d:0x0]
00000004:00000010:1.0:1560447502.778495:0:17588:0:(mdd_object.c:282:mdd_object_alloc()) slab-alloced &apos;mdd_obj&apos;: 96 at ffff880f5ce8cc60.
00000004:00000001:1.0:1560447502.778496:0:17588:0:(mdt_handler.c:5450:mdt_object_init()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The trace above is a remote read of update_log from MDT0005 (on node 282): 32 bytes at offset 128.&lt;br/&gt;
 However, the received data is interpreted as FID=&lt;span class=&quot;error&quot;&gt;&amp;#91;0x0:0x1d:0x0&amp;#93;&lt;/span&gt;, which is an invalid FID, so the operation fails.&lt;/p&gt;

&lt;p&gt;The part of update_log on MDT0005 that MDT0004 fetched:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ od -X -j 128 -N 32 -A d update_log
0000128          00000000        00000003        0000001d        00000004
0000144          0000001d        00000000        7473756c        32306572
0000160
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
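&lt;p&gt;A minimal sketch for reading the dump above, not part of the original analysis. It assumes the catalog list holds one 32-byte slot per MDT index (which would match the read of 32 bytes at offset 128 by MDT0004 and the disk size of 192), and that od ran on a little-endian host. SLOT_SIZE, slot_offset and words_to_bytes are hypothetical names used only for illustration.&lt;/p&gt;

```python
# Words exactly as printed by `od -X` in the dump above (offsets 128..159).
WORDS = [0x00000000, 0x00000003, 0x0000001d, 0x00000004,
         0x0000001d, 0x00000000, 0x7473756c, 0x32306572]

SLOT_SIZE = 32  # assumption: one 32-byte catalog-list slot per MDT index


def slot_offset(mdt_index):
    """Byte offset of the slot for a given MDT index (assumed layout)."""
    return mdt_index * SLOT_SIZE


def words_to_bytes(words):
    """Rebuild the raw bytes from od's 4-byte words (assumes little-endian)."""
    return b"".join(w.to_bytes(4, "little") for w in words)


raw = words_to_bytes(WORDS)
tail = raw[24:32].decode("ascii")

print(slot_offset(4))    # 128
print(192 // SLOT_SIZE)  # 6
print(tail)              # lustre02
```

&lt;p&gt;Under these assumptions the last eight bytes of the slot spell the filesystem name, which suggests the slot holds leftover data from some other record rather than a valid log identifier.&lt;/p&gt;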
&lt;p&gt;The issue is reproducible. The reproducer performs the following steps:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;unmount the MDTs, OSTs, and clients&lt;/li&gt;
	&lt;li&gt;on every MDT, run debugfs -w -R &quot;rm update_log&quot;&lt;/li&gt;
	&lt;li&gt;mount the MDTs in parallel&lt;/li&gt;
	&lt;li&gt;mount the OSTs and the client&lt;/li&gt;
	&lt;li&gt;create some files&lt;/li&gt;
	&lt;li&gt;repeat from the beginning&lt;br/&gt;
I used 4 MDTs; I suspect more MDTs would make the issue easier to hit. It does not happen often, about once an hour.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The root cause of the update_log corruption:&lt;br/&gt;
When the first write to the file happens at some offset, the data before that offset is, in rare cases, not zeroed.&lt;br/&gt;
The path is&#160;osd_ldiskfs_write_record() -&amp;gt; __ldiskfs_bread() -&amp;gt; ldiskfs_getblk()&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;   if (map.m_flags &amp;amp; LDISKFS_MAP_NEW) {
                J_ASSERT(create != 0);
                J_ASSERT(handle != NULL);

                /*
                 * Now that we do not always journal data, we should
                 * keep in mind whether this should always journal the
                 * new buffer as metadata.  For now, regular file
                 * writes use ldiskfs_get_block instead, so it&apos;s not a
                 * problem.
                 */
                lock_buffer(bh);
                BUFFER_TRACE(bh, &quot;call get_create_access&quot;);
                fatal = ldiskfs_journal_get_create_access(handle, bh);
                if (!fatal &amp;amp;&amp;amp; !buffer_uptodate(bh)) {
                        memset(bh-&amp;gt;b_data, 0, inode-&amp;gt;i_sb-&amp;gt;s_blocksize);
                        set_buffer_uptodate(bh);
                }
                unlock_buffer(bh);
                BUFFER_TRACE(bh, &quot;call ldiskfs_handle_dirty_metadata&quot;);
                err = ldiskfs_handle_dirty_metadata(handle, inode, bh);
                if (!fatal)
                        fatal = err;
        } else {
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
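&lt;p&gt;The conditional zeroing in the branch above can be modelled in user space. This is only an illustrative sketch, not the ldiskfs code and not the landed patch: BufferHead, init_new_block, write_record and the always_zero switch are hypothetical names, and the 4 KiB block size is an assumption. It shows how a newly mapped buffer that is already marked uptodate keeps its old contents in front of the first record written.&lt;/p&gt;

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB block, chosen only for illustration


class BufferHead:
    """Toy stand-in for a kernel buffer_head: payload plus an uptodate flag."""

    def __init__(self, data, uptodate):
        self.data = bytearray(data)
        self.uptodate = uptodate


def init_new_block(bh, always_zero):
    """Prepare a freshly mapped block.

    always_zero=False mirrors the quoted branch: zero only if not uptodate.
    always_zero=True is a hypothetical variant that zeroes unconditionally.
    """
    if always_zero or not bh.uptodate:
        bh.data[:] = bytes(len(bh.data))
        bh.uptodate = True


def write_record(bh, offset, payload):
    """Write payload at offset; bytes before offset keep whatever was there."""
    bh.data[offset:offset + len(payload)] = payload


def stale_prefix_survives(always_zero):
    # A reused buffer that still holds old contents and is marked uptodate.
    bh = BufferHead(b"\x1d" * BLOCK_SIZE, uptodate=True)
    init_new_block(bh, always_zero)
    write_record(bh, 160, b"\xff" * 32)
    return any(bh.data[:160])


print(stale_prefix_survives(always_zero=False))  # True: old bytes leak through
print(stale_prefix_survives(always_zero=True))   # False: prefix reads as zeros
```

&lt;p&gt;In this model, a later reader of the bytes before offset 160 sees the old contents in the first case and zeros in the second.&lt;/p&gt;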
&lt;p&gt;Sometimes the NEW bh is already uptodate; in that case the data is not zeroed, and update_log ends up corrupted.&lt;/p&gt;</description>
                <environment></environment>
        <key id="56498">LU-12593</key>
            <summary>update_log corruption</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="aboyko">Alexander Boyko</assignee>
                                    <reporter username="aboyko">Alexander Boyko</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Fri, 26 Jul 2019 14:12:35 +0000</created>
                <updated>Wed, 17 Feb 2021 22:02:06 +0000</updated>
                            <resolved>Wed, 23 Oct 2019 03:29:27 +0000</resolved>
                                    <version>Lustre 2.12.0</version>
                                    <fixVersion>Lustre 2.13.0</fixVersion>
                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="252073" author="gerrit" created="Fri, 26 Jul 2019 14:23:47 +0000"  >&lt;p&gt;Alexandr Boyko (c17825@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/35629&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35629&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: zeroing a freshly allocated block buffer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b36f2f294d5365bc5a5336eb05437bafb510e2c1&lt;/p&gt;</comment>
                            <comment id="256881" author="gerrit" created="Tue, 22 Oct 2019 23:57:07 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/35629/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35629/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: zeroing a freshly allocated block buffer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f832a7dc33c69fae9af199f0317e6385deeaeccf&lt;/p&gt;</comment>
                            <comment id="256901" author="pjones" created="Wed, 23 Oct 2019 03:29:27 +0000"  >&lt;p&gt;Landed for 2.13&lt;/p&gt;</comment>
                            <comment id="257817" author="bzzz" created="Wed, 6 Nov 2019 10:59:27 +0000"  >&lt;p&gt;Alexandr, have you seen LDISKFS_GET_BLOCKS_ZERO? It looks like it does what we need?&lt;/p&gt;</comment>
                            <comment id="257901" author="aboyko" created="Thu, 7 Nov 2019 07:57:11 +0000"  >&lt;p&gt;Yes, I saw it. It does what we need, but for extents.&#160;osd_ldiskfs_write_record is based on another codepath.&lt;/p&gt;</comment>
                            <comment id="257910" author="bzzz" created="Thu, 7 Nov 2019 12:41:09 +0000"  >&lt;p&gt;osd_ldiskfs_write_record() -&amp;gt; __ldiskfs_bread() -&amp;gt; ldiskfs_bread() -&amp;gt; ldiskfs_getblk() -&amp;gt; ldiskfs_map_blocks() - so far it&apos;s common, no?&lt;/p&gt;</comment>
                            <comment id="257945" author="gerrit" created="Thu, 7 Nov 2019 19:16:21 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36709&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36709&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: zeroing a freshly allocated block buffer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a468cb238833ee6f1cf854f58f3d6b0123d2be69&lt;/p&gt;</comment>
                            <comment id="259201" author="gerrit" created="Thu, 5 Dec 2019 14:56:53 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/36709/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36709/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: zeroing a freshly allocated block buffer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e196376e841113c47423639b0f5f09f46cdfa25c&lt;/p&gt;</comment>
                            <comment id="262373" author="adilger" created="Sat, 1 Feb 2020 01:41:23 +0000"  >&lt;p&gt;It looks like there is a bug in this patch pointed out by Misc Code Checks Robot that it is leaking &lt;tt&gt;i_alloc_sem&lt;/tt&gt; in some error paths that may result in an OST deadlock in some cases, see my comments in &lt;a href=&quot;https://review.whamcloud.com/#/c/36976/2/lustre/osd-ldiskfs/osd_io.c,unified&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/36976/2/lustre/osd-ldiskfs/osd_io.c,unified&lt;/a&gt; &lt;/p&gt;</comment>
                            <comment id="262439" author="gerrit" created="Mon, 3 Feb 2020 09:40:56 +0000"  >&lt;p&gt;Alexandr Boyko (c17825@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/37406&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37406&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: up i_append_sem during errors&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 3fe8b86ae9f2061d26e0328f39f63012849f212d&lt;/p&gt;</comment>
                            <comment id="262629" author="gerrit" created="Wed, 5 Feb 2020 15:26:27 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/37445&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37445&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: up i_append_sem during errors&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: dd89b75768028123f38b40cd8a048d18bcc8f8cd&lt;/p&gt;</comment>
                            <comment id="262906" author="gerrit" created="Sat, 8 Feb 2020 04:07:05 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/37406/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37406/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: up i_append_sem during errors&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 7599dd3d20d6bb4ee89634c5a76730481ca62470&lt;/p&gt;</comment>
                            <comment id="262920" author="gerrit" created="Sat, 8 Feb 2020 05:35:34 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/37445/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37445/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12593&quot; title=&quot;update_log corruption&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12593&quot;&gt;&lt;del&gt;LU-12593&lt;/del&gt;&lt;/a&gt; osd: up i_append_sem during errors&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f223dd255a4bb884b6013f3b69cb24c1da6c5d27&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="57594">LU-13061</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00k7z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>