<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:17:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1573] avoid data corruption for direct io data</title>
                <link>https://jira.whamcloud.com/browse/LU-1573</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When we call a shutdown (without -f), we set a &apos;notransno&apos; flag so that all requests are put in the replay queue.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;code-quote&quot;&gt;&apos;A&apos;&lt;/span&gt;:
 &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;LCONSOLE_WARN(&lt;span class=&quot;code-quote&quot;&gt;&quot;Failing over %s\n&quot;&lt;/span&gt;,
 &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;obd-&amp;gt;obd_name);
 &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;obd-&amp;gt;obd_fail = 1;
 &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;obd-&amp;gt;obd_no_transno = 1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If that races with obd_commitrw_write() processing a DIO request, the reply will be sent without a last_committed transno update.&lt;br/&gt;
The ptlrpc client will put that request in the replay queue, as transno &amp;gt; last_committed, and send a completion event to the userland application... so we have a request in the replay queue whose brw pages point &lt;em&gt;directly&lt;/em&gt; at user data.&lt;br/&gt;
OOPS.&lt;/p&gt;

&lt;p&gt;If the userland application reuses the same buffer for different data after returning from the write(2) call, then when ptlrpc replays that request it will send &lt;em&gt;invalid&lt;/em&gt; data to the OST, so we corrupt data on the OST side.&lt;/p&gt;
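
&lt;p&gt;A minimal user-space sketch of the hazard described above, in illustrative Python rather than Lustre code. The names ReplayQueue, buffered_write and direct_write are hypothetical; the sketch only assumes that a buffered request keeps a private copy of the data while a DIO request keeps a live reference to the user buffer.&lt;/p&gt;

```python
# Toy model, not Lustre code: every name here is hypothetical.


class ReplayQueue:
    """Client-side list of requests kept for replay after a server restart."""

    def __init__(self):
        self.requests = []

    def add(self, transno, payload):
        # The payload is stored exactly as handed in: a copy or a live view.
        self.requests.append((transno, payload))

    def replay(self):
        # Resend whatever bytes each payload refers to at replay time.
        return [bytes(payload) for _, payload in self.requests]


def buffered_write(queue, transno, user_buf):
    # Buffered I/O: the data is copied, so the request owns its bytes.
    queue.add(transno, bytes(user_buf))


def direct_write(queue, transno, user_buf):
    # Direct I/O: the request refers straight to the user buffer.
    queue.add(transno, memoryview(user_buf))


user_buf = bytearray(b"AAAA")
buffered = ReplayQueue()
direct = ReplayQueue()

buffered_write(buffered, 1, user_buf)
direct_write(direct, 1, user_buf)

# write(2) has returned, so the application reuses its buffer.
user_buf[:] = b"BBBB"

buffered_replay = buffered.replay()
direct_replay = direct.replay()
print(buffered_replay)  # the copied request still carries the original data
print(direct_replay)    # the direct request carries the reused buffer contents
```

&lt;p&gt;In this toy model the buffered replay still carries the original bytes, while the direct replay carries whatever the buffer holds at replay time.&lt;/p&gt;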

&lt;p&gt;Replicating the bug is very easy.&lt;br/&gt;
Use the lctl --device notransno command and then do a directio write. The write will not block and will return, but the ptlrpc request will still hold a pointer to userspace.&lt;/p&gt;

&lt;p&gt;We found this bug while testing DIO under failover.&lt;br/&gt;
We call the default replay_barrier / fail functions on the OST side and see that the file is sometimes corrupted.&lt;br/&gt;
The corruption is fully attributable to the requests replayed after reconnect.&lt;br/&gt;
After disabling the reply from the OST to the client for the sync journal case, we found that the bug is fixed,&lt;br/&gt;
but it looks like this affects more than just the testing environment, although the race window is smaller outside of testing.&lt;/p&gt;</description>
                <environment>any lustre version</environment>
        <key id="15054">LU-1573</key>
            <summary>avoid data corruption for direct io data</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Wed, 27 Jun 2012 03:51:08 +0000</created>
                <updated>Fri, 3 Feb 2017 00:43:46 +0000</updated>
                            <resolved>Fri, 3 Feb 2017 00:43:46 +0000</resolved>
                                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="41189" author="shadow" created="Wed, 27 Jun 2012 06:27:51 +0000"  >&lt;p&gt;remote:   &lt;a href=&quot;http://review.whamcloud.com/3197&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3197&lt;/a&gt;&lt;br/&gt;
remote: &lt;/p&gt;

&lt;p&gt;The patch was created as a port of the 2.1 code, so there is a question about creating the same for osd_io.c and possibly for osd-zfs.&lt;/p&gt;</comment>
                            <comment id="41458" author="adilger" created="Wed, 4 Jul 2012 17:24:56 +0000"  >&lt;p&gt;Alex, I was looking to see how this would apply to OFD/osd-ldiskfs/osd-zfs, but it is not at all obvious due to changes in this code.&lt;/p&gt;

&lt;p&gt;Could you (or Mike, as you see fit) please look into porting this patch for the updated OFD/OSD code.&lt;/p&gt;</comment>
                            <comment id="41465" author="bzzz" created="Thu, 5 Jul 2012 01:22:00 +0000"  >&lt;p&gt;shouldn&apos;t we just always set transno to 0 on OSS side for sync requests?&lt;/p&gt;</comment>
                            <comment id="41468" author="green" created="Thu, 5 Jul 2012 02:17:16 +0000"  >&lt;p&gt;I personally think this is not really a &quot;critical&quot; issue.&lt;/p&gt;

&lt;p&gt;Sure the corruption is real, but the condition is not all that normal.&lt;br/&gt;
It requires a lustre umount (a soft failover) that&apos;s hopefully a pretty rare operation. Typically a node just dies and that&apos;s it.&lt;/p&gt;

&lt;p&gt;Soft failovers are usually only done in testing for simplicity and speed, or for administrative reasons.&lt;/p&gt;

&lt;p&gt;I guess the only case not like this is the fail back once the primary node comes back online.&lt;/p&gt;</comment>
                            <comment id="41470" author="shadow" created="Thu, 5 Jul 2012 02:43:43 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;For sync requests, we always need to reply with request transno &amp;lt;= committed transno, or not send a reply at all,&lt;br/&gt;
or make further changes on the client side to avoid moving those requests from the sending list to the replay list.&lt;/p&gt;</comment>
                            <comment id="41471" author="bzzz" created="Thu, 5 Jul 2012 02:45:20 +0000"  >&lt;p&gt;for sync request transno makes no sense, IMHO.&lt;/p&gt;</comment>
                            <comment id="48251" author="nrutman" created="Wed, 21 Nov 2012 18:50:10 +0000"  >&lt;p&gt;Xyratex-bug-id: &lt;a href=&quot;http://jira-nss.xy01.xyratex.com:8080/browse/MRP-542&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MRP-542&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49363" author="bzzz" created="Tue, 18 Dec 2012 03:15:08 +0000"  >&lt;p&gt;not sure what&apos;s expected in this ticket.. as I said before, if RPC is not a subject to replay, then just do not set transno in reply.&lt;/p&gt;</comment>
                            <comment id="54097" author="shadow" created="Fri, 15 Mar 2013 07:02:33 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;That is not enough, because a DIO page holds a pointer to the user page, whereas buffered IO keeps a copy of the user data.&lt;br/&gt;
If you set transno to zero, you leave that request out of recovery, so if we have uncommitted data we will lose those changes.&lt;/p&gt;</comment>
                            <comment id="54099" author="bzzz" created="Fri, 15 Mar 2013 07:08:27 +0000"  >&lt;p&gt;sorry, I don&apos;t follow... it doesn&apos;t matter where the buffers are, imo. sync request means OST replies when the write is committed. in turn, this means recovery/replay is not needed in the first place. then setting transno to 0 should be enough.&lt;/p&gt;</comment>
                            <comment id="54896" author="shadow" created="Wed, 27 Mar 2013 08:40:15 +0000"  >&lt;p&gt;When no_transno is set, we stop sending transno updates to the clients (the same way is used to create a recovery barrier in tests). After that, all requests are put in the recovery queue, as rq_transno is not less than the last transno known to the clients. Ptlrpc does not distinguish between a buffered write (where the kernel has a copy of the userspace data) and DIO, which has no copy and holds a direct pointer to userspace. Both types of requests are put in the replay queue so they can be replayed after a failure.&lt;/p&gt;

&lt;p&gt;Is that description enough? Or is this fixed in 2.4 in a different way? Could you verify that DIO requests are not put in the recovery queue after a recovery barrier?&lt;/p&gt;</comment>
                            <comment id="54897" author="bzzz" created="Wed, 27 Mar 2013 08:46:16 +0000"  >&lt;p&gt;you&apos;re confusing last_committed (which we do not update when notransno flag is set) and per-rpc transno which we set always:&lt;/p&gt;

&lt;p&gt;void target_committed_to_req(struct ptlrpc_request *req)&lt;br/&gt;
{&lt;br/&gt;
        struct obd_export *exp = req-&amp;gt;rq_export;&lt;/p&gt;

&lt;p&gt;        if (!exp-&amp;gt;exp_obd-&amp;gt;obd_no_transno &amp;amp;&amp;amp; req-&amp;gt;rq_repmsg != NULL)&lt;br/&gt;
                lustre_msg_set_last_committed(req-&amp;gt;rq_repmsg,&lt;br/&gt;
                                              exp-&amp;gt;exp_last_committed);&lt;/p&gt;
</comment>
                            <comment id="57094" author="bzzz" created="Fri, 26 Apr 2013 05:51:29 +0000"  >&lt;p&gt;as per discussion with Alexey at LUG: DIO should be waiting for IO completion on OST (otherwise it can get stuck waiting for the last transno to be transferred by ping); given that, it makes sense to set transno to 0 in this case.&lt;/p&gt;</comment>
                            <comment id="57095" author="shadow" created="Fri, 26 Apr 2013 06:54:54 +0000"  >&lt;p&gt;So my original patch looks fine. It drops the reply in no_transno mode, simulating waiting for a journal commit and preventing the request from moving from the sending list to the replay list.&lt;/p&gt;</comment>
                            <comment id="181164" author="gerrit" created="Wed, 18 Jan 2017 17:18:24 +0000"  >&lt;p&gt;James Nunez (james.a.nunez@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/24940&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/24940&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1573&quot; title=&quot;avoid data corruption for direct io data&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1573&quot;&gt;&lt;del&gt;LU-1573&lt;/del&gt;&lt;/a&gt; test: replay-ost-single test_9 issue extra write&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a9a40f1130d2355443b0a08f77c21135f2ff7a66&lt;/p&gt;</comment>
                            <comment id="183216" author="gerrit" created="Fri, 3 Feb 2017 00:26:53 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/16680/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/16680/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1573&quot; title=&quot;avoid data corruption for direct io data&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1573&quot;&gt;&lt;del&gt;LU-1573&lt;/del&gt;&lt;/a&gt; recovery: Avoid data corruption for DIO during FOFB&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1d2fbade1b658db4386091e7938d9483f7aa4a05&lt;/p&gt;</comment>
                            <comment id="183222" author="pjones" created="Fri, 3 Feb 2017 00:43:46 +0000"  >&lt;p&gt;Landed for 2.10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv33j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4001</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10002" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Story Points</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>