<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:01:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13416] Data corruption during IOR testing with DoM files and hard failover</title>
                <link>https://jira.whamcloud.com/browse/LU-13416</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;IAM tables use a zero-copy update for files, similar to what ldiskfs does for directories.&lt;br/&gt;
osd-ldiskfs, starting from &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# git describe 67076c3c7e2b11023b943db2f5031d9b9a11329c
v2_2_50_0-22-g67076c3
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;does the same. But it is not safe without setting LDISKFS_INODE_JOURNAL_DATA on the inodes&lt;br/&gt;
(thanks to bzzz for the tip).&lt;br/&gt;
Otherwise metadata blocks can be reused before the journal checkpoint without the corresponding revoke records, so on journal replay valid file data is replaced with stale journaled data.&lt;br/&gt;
From the blktrace perspective it looks like this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;    mdt_io01_025-32148 [003]  4161.223760: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_025]
    mdt_io01_019-31765 [003]  4163.374449: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
    mdt_io01_000-12006 [014]  4165.256635: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_000]
    mdt_io01_019-31765 [004]  4167.030265: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and the corresponding transactions are reported as committed:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000001:00080000:9.0:1585615546.198190:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752066 is committed
00000001:00080000:9.0:1585615546.198196:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752064 is committed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;but after the crash the journal records are:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Commit time 1585612866.905807896
  FS block 1509499725 logged at journal block 1370 (flags 0x2)
Found expected sequence 86453863, type 2 (commit block) at block 1382
Commit time 1585612871.80796396
Found expected sequence 86453864, type 2 (commit block) at block 1395
Commit time 1585612871.147796211
  FS block 1509499725 logged at journal block 1408 (flags 0x2)
Found expected sequence 86453865, type 2 (commit block) at block 1414
Commit time 1585612872.386792798
  FS block 1509499725 logged at journal block 1427 (flags 0x2)
Found expected sequence 86453866, type 2 (commit block) at block 1438
Commit time 1585612876.763804361
Found expected sequence 86453867, type 2 (commit block) at block 1451
Commit time 1585612876.834804666
  FS block 1509499725 logged at journal block 1464 (flags 0x2)
Found expected sequence 86453868, type 2 (commit block) at block 1471
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and there are no revoke records.&lt;/p&gt;</description>
                <environment>Any Lustre 2.x affected.</environment>
        <key id="58629">LU-13416</key>
            <summary>Data corruption during IOR testing with DoM files and hard failover</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="shadow">Alexey Lyashkov</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 Apr 2020 16:20:41 +0000</created>
                <updated>Wed, 23 Dec 2020 09:46:14 +0000</updated>
                            <resolved>Wed, 20 May 2020 13:36:03 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="267302" author="adilger" created="Thu, 9 Apr 2020 19:57:46 +0000"  >&lt;p&gt;Is the solution here just to set &lt;tt&gt;LDISKFS_JOURNAL_DATA_FL&lt;/tt&gt; on IAM files at create/open time (if unset)?  I know we can&apos;t normally change the &quot;&lt;tt&gt;+j&lt;/tt&gt;&quot; flag on an inode without flushing the journal, but this is handled properly via &lt;tt&gt;ldiskfs_ioctl_setflags()&lt;/tt&gt; and will only change the flag once per IAM file (normally at mount) so it is not a significant performance concern.&lt;/p&gt;</comment>
                            <comment id="267312" author="shadow" created="Thu, 9 Apr 2020 21:18:35 +0000"  >&lt;p&gt;Andreas,&lt;br/&gt;
the solution is simple, but the implementation is hard, because the OSD does not know whether a given object is journaled or not.&lt;br/&gt;
In most cases we can check the FID sequence, but sometimes we cannot.&lt;/p&gt;

&lt;p&gt;The &apos;out&apos; protocol is a problem, as is Wangdi&apos;s version of the DNE2 llogs.&lt;br/&gt;
In the first case we don&apos;t have an &apos;open&apos; - we just locate an object and start writing to it.&lt;br/&gt;
In the second case (probably a different side of the first), the llogs use a normal FID (this can be fixed, but it requires changing the object type at create time).&lt;/p&gt;

&lt;p&gt;The first case is the primary problem, as OUT processing has nothing like an &apos;open&apos; command.&lt;/p&gt;

&lt;p&gt;PS. Lustre large EAs are also affected - the ext4 version has an exclusion in the code that forces revoke records to be added when an EA inode is destroyed.&lt;/p&gt;

</comment>
                            <comment id="267313" author="bzzz" created="Thu, 9 Apr 2020 21:20:21 +0000"  >&lt;p&gt;I&apos;d think that setting this flag in osd_write() and when IAM is being modified should be enough?&lt;/p&gt;</comment>
                            <comment id="267330" author="shadow" created="Fri, 10 Apr 2020 03:42:14 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;I think it is not enough. This flag only affects objects in memory, but what happens under high memory load? The object is flushed from the cache and the inode is released without an unlink; later the object is allocated again and passed to the unlink path. That means the flag is not set during the unlink, so the bug will still exist.&lt;br/&gt;
With a journal size of 1G or more, this means up to 1000s before a journal checkpoint, compared to a commit, which happens far more often.&lt;/p&gt;</comment>
                            <comment id="267341" author="bzzz" created="Fri, 10 Apr 2020 06:19:07 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=shadow&quot; class=&quot;user-hover&quot; rel=&quot;shadow&quot;&gt;shadow&lt;/a&gt; this flag is part of on-disk inode like INODE_EXTENTS, for example.&lt;/p&gt;</comment>
                            <comment id="267342" author="shadow" created="Fri, 10 Apr 2020 06:24:47 +0000"  >&lt;p&gt;I understand, but dirtying the inode so that this flag is written to disk on each write is too expensive.&lt;br/&gt;
Otherwise the flag will not be flushed to disk on file updates (a same-size write does not trigger mark_inode_dirty), which is the common case on an already-existing system.&lt;/p&gt;

&lt;p&gt;PS. I checked the last logs. The system had been up for 4100s, no checkpoint had been processed, and the journal was ~20% full.&lt;/p&gt;</comment>
                            <comment id="267343" author="bzzz" created="Fri, 10 Apr 2020 06:31:24 +0000"  >&lt;p&gt;this is to be done once, then mark_inode_dirty() can be skipped.&lt;/p&gt;</comment>
                            <comment id="267344" author="shadow" created="Fri, 10 Apr 2020 07:00:47 +0000"  >&lt;p&gt;&amp;gt; this is to be done once, then mark_inode_dirty() can be skipped.&lt;/p&gt;

&lt;p&gt;Anyway, this change needs a performance evaluation, since DNE2 performs so many llog operations.&lt;/p&gt;</comment>
                            <comment id="267345" author="bzzz" created="Fri, 10 Apr 2020 07:01:19 +0000"  >&lt;p&gt;The buffer is kept in cache until it&apos;s checkpointed, so I don&apos;t see why storing the flag in the buffer (with mark_inode_dirty()) wouldn&apos;t work.&lt;/p&gt;</comment>
                            <comment id="267346" author="bzzz" created="Fri, 10 Apr 2020 07:06:51 +0000"  >&lt;blockquote&gt;
&lt;p&gt;anyway this change need to be checked with performance evaluation, once DNE2 have so much llog operations.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;sure, though I&apos;d expect that to be barely visible - any &lt;em&gt;first&lt;/em&gt; write to a llog file would set the flag, and given that the llog file has just been created, the inode has to be modified &lt;em&gt;anyway&lt;/em&gt; to get its block(s). So it could be optimized even with something like the following in osd_write():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (LDISKFS_INODE_JOURNAL_DATA not set) {
    set LDISKFS_INODE_JOURNAL_DATA;
    &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (inode-&amp;gt;i_size != 0)
        mark_inode_dirty();
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;the first condition can be wrapped with unlikely(), I guess.&lt;/p&gt;</comment>
                            <comment id="268028" author="gerrit" created="Mon, 20 Apr 2020 09:58:35 +0000"  >&lt;p&gt;Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38281&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38281&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13416&quot; title=&quot;Data corruption during IOR testing with DoM files and hard failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13416&quot;&gt;&lt;del&gt;LU-13416&lt;/del&gt;&lt;/a&gt; ldiskfs: don&apos;t corrupt data on journal replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 94b7608c78c85d1d79fec8196ba0fecc9538ab78&lt;/p&gt;</comment>
                            <comment id="270649" author="gerrit" created="Wed, 20 May 2020 08:22:19 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38281/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38281/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13416&quot; title=&quot;Data corruption during IOR testing with DoM files and hard failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13416&quot;&gt;&lt;del&gt;LU-13416&lt;/del&gt;&lt;/a&gt; ldiskfs: don&apos;t corrupt data on journal replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: a23aac2219047cb04ed1fa555f31fa39e5c499dc&lt;/p&gt;</comment>
                            <comment id="270709" author="pjones" created="Wed, 20 May 2020 13:36:03 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                            <comment id="270977" author="gerrit" created="Sat, 23 May 2020 01:40:11 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38705&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38705&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13416&quot; title=&quot;Data corruption during IOR testing with DoM files and hard failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13416&quot;&gt;&lt;del&gt;LU-13416&lt;/del&gt;&lt;/a&gt; ldiskfs: don&apos;t corrupt data on journal replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 77c1f307df4a3c068ec45a4948350bc55112e151&lt;/p&gt;</comment>
                            <comment id="271355" author="gerrit" created="Wed, 27 May 2020 21:34:35 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38705/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38705/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13416&quot; title=&quot;Data corruption during IOR testing with DoM files and hard failover&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13416&quot;&gt;&lt;del&gt;LU-13416&lt;/del&gt;&lt;/a&gt; ldiskfs: don&apos;t corrupt data on journal replay&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 76b1050a56385cf8ddea47c9fea12eec21478601&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="62127">LU-14267</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00x2v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>