<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:18:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15431] read may return stale data from replicated file</title>
                <link>https://jira.whamcloud.com/browse/LU-15431</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;FLR doesn&apos;t hold the layout lock (LL) during IO (OST_READ) operations, but instead verifies the layout at the beginning and at the end of the IO.&lt;br/&gt;
The problem is that pages get the Uptodate flag from brw_interpret context, while the layout is only checked for changes later; the pages remain marked uptodate until that point.&lt;br/&gt;
Three processes are involved:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the 1st is modifying the file&lt;/li&gt;
	&lt;li&gt;the 2nd initiated OST_READ and is going to repeat it due to the layout change&lt;/li&gt;
	&lt;li&gt;the 3rd finds the result of that OST_READ (stale data) in ll_do_fast_read()&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment></environment>
        <key id="67911">LU-15431</key>
            <summary>read may return stale data from replicated file</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="bzzz">Alex Zhuravlev</reporter>
                        <labels>
                            <label>flr-improvement</label>
                    </labels>
                <created>Tue, 11 Jan 2022 09:39:04 +0000</created>
                <updated>Mon, 27 Nov 2023 11:15:35 +0000</updated>
                            <resolved>Fri, 14 Jul 2023 12:46:10 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="322276" author="bzzz" created="Tue, 11 Jan 2022 09:40:16 +0000"  >&lt;p&gt;Gentlemen, please have a look&lt;/p&gt;</comment>
                            <comment id="323474" author="adilger" created="Fri, 21 Jan 2022 15:20:15 +0000"  >&lt;p&gt;A second, more serious case can happen:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;1st client writes to file, MDS marks some replica(s) stale&lt;/li&gt;
	&lt;li&gt;2nd client has cached OST extent lock of (now) stale objects&lt;/li&gt;
	&lt;li&gt;2nd client &lt;b&gt;only&lt;/b&gt; fast-reads from pages cached under extent lock of stale object (eg. 4KB/2KB/1KB sized/aligned reads).&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Because fast reads do not check DLM locks or the layout version (they assume that if a page is in cache it is valid), there is nothing on the second client that will trigger a layout refresh or reading from the primary OST object.&lt;/p&gt;

&lt;p&gt;At some point, the 2nd client would run out of cached pages, or possibly have a read that crosses a page boundary that is not &quot;fast&quot; and would check the layout version so that reads go to the primary object, but this could happen seconds after the cached object became stale.&lt;/p&gt;

&lt;p&gt;It also isn&apos;t clear whether there is a mechanism for the layout update on the client to cause all cached pages on the stale replica objects to be flushed from cache, or whether only OST reads for uncached pages access the primary replica objects.&lt;/p&gt;</comment>
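(Editorial sketch.) The hazard described in this comment can be modeled in a few lines of Python; this is not Lustre code and all names are hypothetical, just an illustration of a fast-read path that trusts any uptodate cached page and never consults the layout generation:

```python
# Model, not Lustre code: a cached page carries no layout information,
# and the "fast read" trusts it unconditionally.
class CachedPage:
    def __init__(self, data):
        self.data = data
        self.uptodate = True

layout_gen = 1          # layout generation published by the (model) MDS

def fast_read(page):
    # never consults layout_gen; this is the problem described above
    return page.data if page.uptodate else None

page = CachedPage("AAAA")   # filled from mirror#1
layout_gen += 1             # a writer marks mirror#1 stale
print(fast_read(page))      # still "AAAA": nothing on this client re-checks
```

Under this model only a non-fast read, which revalidates the layout, would notice that layout_gen moved on.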
                            <comment id="323482" author="paf0186" created="Fri, 21 Jan 2022 16:00:39 +0000"  >&lt;p&gt;Yuck, that&#8217;s true.&lt;/p&gt;

&lt;p&gt;We definitely need to add a layout version check to the fast read code, but we also need to add some way to flush the old layout and data. &#160;That I am not sure how best to do - layout changes very carefully don&#8217;t flush everything today, but perhaps they should.&lt;/p&gt;

&lt;p&gt;It would be something like &#8220;I detect a layout version change, so I flush all local data&#8221;.&lt;/p&gt;

&lt;p&gt;I&#8217;m also a bit concerned about pending writes in cache - the FLR state cycle for the layout (read only, pending, writable) covers ongoing writes nicely, but doesn&#8217;t seem to handle dirty data already in cache. &#160;It seems like maybe losing the layout lock should cause a flush to disk of any dirty data for the file?&lt;/p&gt;

&lt;p&gt;And maybe losing the layout lock should just cause a complete cache flush/destroy. &#160;Yuck.&lt;/p&gt;</comment>
                            <comment id="323483" author="paf0186" created="Fri, 21 Jan 2022 16:05:02 +0000"  >&lt;p&gt;So today, losing the layout lock ensures any ongoing I/O is restarted. &#160;I think that&#8217;s handled ok. &#160;But it does nothing for cached data, dirty or clean.&lt;/p&gt;

&lt;p&gt;The layout lock is already required for doing I/O to a file. &#160;Perhaps losing it should just cause a complete cache flush for that file? &#160;Honestly this seems like the only safe option. &#160;Adding version checks doesn&#8217;t solve the &#8220;dirty data in cache&#8221; problem and it could leave data and locks linked to the old layout staying in cache in a weird state where they were inaccessible but still present.&lt;/p&gt;

&lt;p&gt;I&#8217;m not quite sure how that flush would interact with ongoing I/Os, but I don&#8217;t think I see another safe way to do this.&lt;/p&gt;</comment>
                            <comment id="323485" author="paf0186" created="Fri, 21 Jan 2022 16:08:09 +0000"  >&lt;p&gt;And to answer the unasked question about the cost of doing that - yes, it would be expensive and slow if a file is open many places. &#160;But if a file is open many places why are you changing the layout? &#160;Ugh - I note this interacts poorly with SEL, which would&#160;change the layout of a file open many places and rely on not destroying the cache to keep it reasonable.&lt;/p&gt;

&lt;p&gt;I don&#8217;t know if we can respect that (and I don&#8217;t know if SEL has ever come in to real use. &#160;I suspect not.).&lt;/p&gt;</comment>
                            <comment id="323487" author="bzzz" created="Fri, 21 Jan 2022 16:15:02 +0000"  >&lt;p&gt;Probably cache invalidation should be driven indirectly by a layout version change on the OST, not by the layout lock itself.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I&#8217;m also a bit concerned about pending writes in cache&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15300&quot; title=&quot;mirror resync can cause EIO to unrelated applications&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15300&quot;&gt;&lt;del&gt;LU-15300&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="323489" author="adilger" created="Fri, 21 Jan 2022 16:21:39 +0000"  >&lt;p&gt;Hmm, I ran a manual test to see if I could create the &quot;fast read from stale mirror&quot; problem, but it seems to be working as it should. Steps were roughly as follows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;client1# cp /etc/hosts /mnt/testfs/flr
client1# lfs mirror extend -N -c1 /mnt/testfs/flr
client1# while sleep 1; do dd if=/mnt/testfs/flr bs=20 count=1; done &amp;amp;
[verify via ldlm.namespaces.*.lru_size lock is on mirror#1]
client2# lfs mirror write -N2 -i /etc/passwd /mnt/testfs/flr
[note &quot;lfs mirror write&quot; does not mark mirror#1 stale]
[client1 still reading /etc/hosts from mirror#1]
client2# lfs setstripe --comp-set -I 65537 --comp-flags stale /mnt/testfs/flr
[client1 immediately changed to /etc/passwd data]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So it appears this is working correctly, and client1 either had its DLM lock on the stale object cancelled, or its layout lock update caused it to dump the local page cache for the stale object. IMHO, having the OST extent locks on the stale objects cancelled would be more robust, as this avoids dependency on the layout lock being updated on the client and any interaction with fast reads. &lt;/p&gt;

&lt;p&gt;Side notes:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&quot;&lt;tt&gt;--comp-set&lt;/tt&gt;&quot; is not listed in the &quot;&lt;tt&gt;lfs setstripe&lt;/tt&gt;&quot; usage message, so does not appear in tab completion (fixed in patch &lt;a href=&quot;https://review.whamcloud.com/41423&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/41423&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14385&quot; title=&quot;add more test cases to verify &amp;quot;lfs setstripe&amp;quot; options&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14385&quot;&gt;&lt;del&gt;LU-14385&lt;/del&gt;&lt;/a&gt; tests: verify lfs setstripe comp-flags and flags options&lt;/tt&gt;&quot; on master).&lt;/li&gt;
	&lt;li&gt;it would be useful to have an &quot;&lt;tt&gt;lfs mirror&lt;/tt&gt;&quot; equivalent to mark a mirror stale&lt;/li&gt;
	&lt;li&gt;it would be useful to have an option to &quot;&lt;tt&gt;lfs mirror write&lt;/tt&gt;&quot; to mark the other mirrors stale. Unlike normal writes, this would allow selecting a specific mirror for write.&lt;/li&gt;
	&lt;li&gt;this test case is the only one so far that I wanted to be able to split prefer-read from prefer-write, to verify that the read was mirror#1 and the write was mirror#2&lt;/li&gt;
	&lt;li&gt;there wasn&apos;t an easy way to verify which mirror the client1 reads were using and which objects pages were cached. I cleared the DLM cache and used &lt;tt&gt;lru_count&lt;/tt&gt; to see which OSC had a lock, but not great. (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14858&quot; title=&quot;kernfs tree to dump/traverse ldlm locks&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14858&quot;&gt;LU-14858&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;the &lt;tt&gt;llite.*.dump_page_cache&lt;/tt&gt; output only showed the MDT FID and not the OST FID used.&lt;/li&gt;
	&lt;li&gt;it would be very useful if dump_page cache had a &quot;comment&quot; line at the top to describe the fields&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="323497" author="adilger" created="Fri, 21 Jan 2022 16:31:44 +0000"  >&lt;p&gt;I don&apos;t think we need to cancel locks for every version update, since that happens frequently. Only for updates that mark components stale, which is rare.&lt;/p&gt;</comment>
                            <comment id="323528" author="paf0186" created="Fri, 21 Jan 2022 20:24:05 +0000"  >&lt;p&gt;So what I&apos;ve learned so far is specifically the setstripe command to change the flags&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs setstripe --comp-set -I 65537 --comp-flags stale /mnt/testfs/flr &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Appears to trigger a flush.&lt;/p&gt;

&lt;p&gt;Specifically, it calls &lt;tt&gt;llapi_file_flush()&lt;/tt&gt; (internally &lt;tt&gt;ioctl(LL_IOC_DATA_VERSION)&lt;/tt&gt;), which sets &lt;tt&gt;LL_DV_WR_FLUSH&lt;/tt&gt; in a command it sends to all the OSTs.&lt;/p&gt;

&lt;p&gt;And that&apos;s turned into:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (dv-&amp;gt;dv_flags &amp;amp; LL_DV_WR_FLUSH)
&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; oa-&amp;gt;o_flags |= OBD_FL_FLUSH; &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Which causes the OST to flush.&lt;/p&gt;

&lt;p&gt;So changing the prefer flag manually doesn&apos;t work.&lt;/p&gt;

&lt;p&gt;Andreas, your suggestion to expose the mirror write and mirror read &apos;prefer&apos; flags separately suddenly seems relevant, because I need to read data from one mirror then write data to the other without changing the layout.&#160; I could also achieve this by taking one mirror temporarily offline but that&apos;s kind of challenging on my setup...&#160; I&apos;ll think about it.&lt;/p&gt;</comment>
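(Editorial sketch.) The flag propagation quoted in the comment above can be mirrored in a tiny Python model. Flag-name sets stand in for the C bitmasks; LL_DV_WR_FLUSH and OBD_FL_FLUSH are real Lustre identifiers, but their values and everything else here are illustrative:

```python
# Placeholder model of the quoted kernel snippet, using flag-name sets
# instead of C bitmasks; the real constants live in the Lustre headers.
def build_obdo_flags(dv_flags):
    """If the data-version ioctl (sent by llapi_file_flush) carries
    LL_DV_WR_FLUSH, the obdo sent to each OST carries OBD_FL_FLUSH,
    which makes the OST flush."""
    o_flags = set()
    if "LL_DV_WR_FLUSH" in dv_flags:
        o_flags.add("OBD_FL_FLUSH")
    return o_flags

print(build_obdo_flags({"LL_DV_WR_FLUSH"}))   # {'OBD_FL_FLUSH'}
print(build_obdo_flags(set()))                # set()
```

This makes explicit why only the setstripe path that goes through llapi_file_flush triggers the OST flush, while changing the prefer flag directly does not.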
                            <comment id="323536" author="adilger" created="Fri, 21 Jan 2022 20:52:02 +0000"  >&lt;p&gt;I see that &lt;tt&gt;llapi_file_flush()&lt;/tt&gt; is also called in &lt;tt&gt;mirror_extend_file()&lt;/tt&gt;, but &lt;b&gt;not&lt;/b&gt; in &lt;tt&gt;llapi_mirror_resync_many()&lt;/tt&gt;, which would be a simple change to make, but doesn&apos;t solve the whole problem.  Alternately, the resync should probably be bracketed with an &lt;tt&gt;llapi_get_data_version()&lt;/tt&gt; call at the start/end so that it can verify that the data hasn&apos;t changed during resync?  Otherwise, I don&apos;t see how we are certain that the file hasn&apos;t been modified since the resync started?&lt;/p&gt;


&lt;p&gt;&lt;b&gt;aside&lt;/b&gt;: the use of &lt;tt&gt;llapi_get_data_version()&lt;/tt&gt; in &lt;tt&gt;lsom_update_one()&lt;/tt&gt; for the &quot;flush client data&quot; side effect should be replaced with &lt;tt&gt;llapi_file_flush()&lt;/tt&gt; to make this explicit, and avoid problems in the future.&lt;/p&gt;</comment>
                            <comment id="323538" author="paf0186" created="Fri, 21 Jan 2022 21:06:01 +0000"  >&lt;p&gt;I &lt;b&gt;believe&lt;/b&gt; mirror resync is going to depend on the FLR state over on the MDS, that state machine with the cycle from readonly to stale, etc.&#160; So it doesn&apos;t need to verify anything - It controls the state until the resync is complete, then once it&apos;s complete, well, things can change after that.&lt;/p&gt;

&lt;p&gt;That&apos;s just what I &lt;b&gt;believe&lt;/b&gt;, though.&lt;/p&gt;</comment>
                            <comment id="323539" author="paf0186" created="Fri, 21 Jan 2022 21:22:02 +0000"  >&lt;p&gt;OK, I (finally) found it.&#160; We&apos;re fine for reads, though I still need to double check on writes.&#160; The comment in ll_readpage needs to be corrected.&lt;/p&gt;

&lt;p&gt;I noticed when a mirror became stale, the data stayed in cache.&#160; But the stale data was being reliably destroyed at the start of the next I/O, which confused me.&#160; I finally figured it out.&lt;/p&gt;

&lt;p&gt;When the layout lock has been lost, when we next &lt;b&gt;open&lt;/b&gt; the file (ie, before we go to read it), we end up calling in to ll_iget - it looks like we clear some part of the object metadata, I&apos;m not quite sure what - and that calls in to the layout conf code.&lt;/p&gt;

&lt;p&gt;That eventually calls in to lov_layout_change, which calls cl_object_prune and destroys all of the OSC objects associated with the file, which destroys the cached pages.&#160; All before the &apos;read&apos; code starts.&#160; This &lt;b&gt;should&lt;/b&gt; also work for a file which is held open because I believe Lustre should force a dentry refresh, etc, which should have the same effect?&lt;/p&gt;

&lt;p&gt;I&apos;ll talk about writes in a second.&lt;/p&gt;</comment>
                            <comment id="323540" author="paf0186" created="Fri, 21 Jan 2022 21:24:05 +0000"  >&lt;p&gt;As I reflect on pending writes, I realize I forgot the basic FLR lifecycle.&lt;/p&gt;

&lt;p&gt;If there is a pending write (ie, dirty data), the mirrors can&apos;t be in sync.&#160; The I/O which created that dirty data brings the mirrors out of sync.&#160; And bringing them back in to sync &lt;b&gt;does&lt;/b&gt; flush all the data - that&apos;s a requirement when implementing lfs mirror resync.&lt;/p&gt;

&lt;p&gt;So writes are fine too.&#160; I think this is all fine except that comment is misleading (and probably a leftover from an earlier design).&lt;/p&gt;</comment>
                            <comment id="323582" author="bzzz" created="Sat, 22 Jan 2022 05:28:18 +0000"  >&lt;p&gt;please check the original description - it was about early setting of PG_uptodate before the 2nd layout check. if the 2nd check is not required, then why is it here?&lt;/p&gt;</comment>
                            <comment id="323584" author="adilger" created="Sat, 22 Jan 2022 07:36:34 +0000"  >&lt;blockquote&gt;
&lt;p&gt;When the layout lock has been lost, when we next open the file (ie, before we go to read it), we end up calling in to ll_iget - it looks like we clear some part of the object metadata, I&apos;m not quite sure what - and that calls in to the layout conf code.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;So this depends on the client being properly notified about the layout lock and canceling its own cache.  It would be more robust if the OST extent lock was revoked from objects in the stale mirror(s) at the time they are marked stale, so that the client couldn&apos;t continue to read from the old objects.&lt;/p&gt;

&lt;p&gt;This won&apos;t help in the case of a totally isolated client (i.e. not getting callbacks from either MDS or OSS), but it does help in the case of the client being evicted/incommunicado with the MDS, but still able to contact the OSS to read data.&lt;/p&gt;</comment>
                            <comment id="323608" author="gerrit" created="Sat, 22 Jan 2022 16:49:05 +0000"  >&lt;p&gt;&quot;Alex Zhuravlev &amp;lt;bzzz@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/46273&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46273&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15431&quot; title=&quot;read may return stale data from replicated file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15431&quot;&gt;&lt;del&gt;LU-15431&lt;/del&gt;&lt;/a&gt; tests: a demo&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a6662c7ba1d1785e10bfacbc564a0d9d3ed4f44c&lt;/p&gt;</comment>
                            <comment id="323609" author="bzzz" created="Sat, 22 Jan 2022 16:51:05 +0000"  >&lt;p&gt;please, have a look at &lt;a href=&quot;https://review.whamcloud.com/#/c/46273/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/46273/&lt;/a&gt;&lt;br/&gt;
basically the test reads in between llapi_file_flush() and the layout merge in lod_xattr_set(), then repeats the same read in the same process with the same old fd and gets the old data, unless I missed something.&lt;br/&gt;
the output:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[   48.723121] Lustre: Mounted lustre-client
AAAA written:
osd-ldiskfs.lustre-OST0000.nonrotational=0
osd-ldiskfs.lustre-OST0001.nonrotational=0
osd-ldiskfs.lustre-OST0001.nonrotational=1
fail_loc=0x16f
fail_val=4
create replicate in background
[   49.049398] LustreError: 5192:0:(fail.c:138:__cfs_fail_timeout_set()) cfs_fail_timeout id 16f sleeping &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 4ms
[   49.150019] LustreError: 5192:0:(fail.c:149:__cfs_fail_timeout_set()) cfs_fail_timeout id 16f awake
[   49.154344] lustre-OST0000: OST_READ from [0x100000000:0x3:0x0]
read right before layout merge, after data flush
[   49.549402] lustre-OST0001: OST_READ from [0x100010000:0x2:0x0]
wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; replication
replication done
BBBB written
fast read from pagecache
DATA: AAAA
DATA: AAAA
read via &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; open(2)
[   51.576003] lustre-OST0000: OST_READ from [0x100000000:0x3:0x0]
DATA: BBBB
[   51.586037] lustre-OST0000: OST_READ from [0x100000000:0x3:0x0]
DATA: BBBB
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="323633" author="bzzz" created="Sun, 23 Jan 2022 19:07:14 +0000"  >&lt;p&gt;here is another test, with no injections with cfs_fail:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
test_2000() {
	local tf=$DIR/$tfile
	local osts=$(comma_list $(osts_nodes))

	mkdir -p $MOUNT2 &amp;amp;&amp;amp; mount_client $MOUNT2

	# to make replica on ost1 preferred &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; writes
	do_nodes $osts \
		$LCTL set_param osd*.*OST*.nonrotational=0
	do_nodes $osts \
		$LCTL set_param osd*.*OST0001*.nonrotational=1

	$LFS setstripe -c1 -i0 $tf || error &lt;span class=&quot;code-quote&quot;&gt;&quot;can&apos;t create $tf&quot;&lt;/span&gt;
	echo &lt;span class=&quot;code-quote&quot;&gt;&quot;AAAA&quot;&lt;/span&gt; &amp;gt;$tf
	$LFS mirror extend -N -o1 $tf || error &lt;span class=&quot;code-quote&quot;&gt;&quot;can&apos;t make replica&quot;&lt;/span&gt;
	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;replicated file created&quot;&lt;/span&gt;

	cancel_lru_locks mdc
	cancel_lru_locks osc

	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;open(O_RDONLY) and first read from OST&quot;&lt;/span&gt;
	$MULTIOP $tf oO_RDONLY:P4_z0P4_z0P4c &amp;amp;
	PID=$!
	sleep 1
	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;first read complete&quot;&lt;/span&gt;

	echo &lt;span class=&quot;code-quote&quot;&gt;&quot;BBBB&quot;&lt;/span&gt; &amp;gt;/tmp/$tfile
	dd &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt;=/tmp/$tfile of=$DIR2/$tfile conv=notrunc &amp;gt;&amp;amp;/dev/&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt; ||
		error &lt;span class=&quot;code-quote&quot;&gt;&quot;can&apos;t write BBBB&quot;&lt;/span&gt;
	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;BBBB written which made replica on ost1 stale&quot;&lt;/span&gt;

	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;fast read from pagecache in the original process&quot;&lt;/span&gt;
	kill -USR1 $PID
	sleep 1

	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;read via $DIR2 &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; open(2)&quot;&lt;/span&gt;
	$MULTIOP $DIR2/$tfile oO_RDONLY:P4c

	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;fast read from pagecache after 5s in the original process&quot;&lt;/span&gt;
	sleep 5
	kill -USR1 $PID
	wait $PID

	log2 &lt;span class=&quot;code-quote&quot;&gt;&quot;read via $DIR &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; open(2)&quot;&lt;/span&gt;
	$MULTIOP $tf oO_RDONLY:P4c
}
run_test 2000 &lt;span class=&quot;code-quote&quot;&gt;&quot;test2&quot;&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and the output:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
== sanity-flr test 2000: test2 =========================== 20:26:32 (1642969592)
[   47.814975] Lustre: DEBUG MARKER: == sanity-flr test 2000: test2 =========================== 20:26:32 (1642969592)
Starting client: tmp.y4sbWzerEx:  -o user_xattr,flock tmp.y4sbWzerEx@tcp:/lustre /mnt/lustre2
[   47.982943] Lustre: Mounted lustre-client
osd-ldiskfs.lustre-OST0000.nonrotational=0
osd-ldiskfs.lustre-OST0001.nonrotational=0
osd-ldiskfs.lustre-OST0001.nonrotational=1
[   48.133925] Lustre: DEBUG MARKER: replicated file created
[   48.175340] Lustre: DEBUG MARKER: open(O_RDONLY) and first read from OST
DATA: AAAA in 6318
[   49.196269] Lustre: DEBUG MARKER: first read complete
[   49.222184] Lustre: DEBUG MARKER: BBBB written which made replica on ost1 stale
[   49.229754] Lustre: DEBUG MARKER: fast read from pagecache in the original process
DATA: AAAA in 6318
[   50.245148] Lustre: DEBUG MARKER: read via /mnt/lustre2 &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; open(2)
DATA: BBBB in 6326
[   50.261963] Lustre: DEBUG MARKER: fast read from pagecache after 5s in the original process
DATA: AAAA in 6318
[   55.279692] Lustre: DEBUG MARKER: read via &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; open(2)
DATA: BBBB in 6330
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;notice that the read via the 2nd client (/mnt/lustre2) gets BBBB, and still after that the first client reads AAAA again (I checked the logs - it was the fast read path).&lt;/p&gt;

&lt;p&gt;I used a trivial change to multiop: command P reads &amp;lt;num&amp;gt; bytes and prints them as a string.&lt;/p&gt;</comment>
                            <comment id="323677" author="gerrit" created="Mon, 24 Jan 2022 13:12:03 +0000"  >&lt;p&gt;&quot;Alex Zhuravlev &amp;lt;bzzz@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/46282&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46282&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15431&quot; title=&quot;read may return stale data from replicated file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15431&quot;&gt;&lt;del&gt;LU-15431&lt;/del&gt;&lt;/a&gt; llite: skip fast reads if layout is invalid&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 371fa1b9153a0ea765f5aea73b35e00a25be0b97&lt;/p&gt;</comment>
                            <comment id="323718" author="bzzz" created="Mon, 24 Jan 2022 19:05:21 +0000"  >&lt;blockquote&gt;
&lt;p&gt;So this depends on the client being properly notified about the layout lock and canceling its own cache. It would be more robust if the OST extent lock was revoked from objects in the stale mirror(s) at the time they are marked stale, so that the client couldn&apos;t continue to read from the old objects.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I think there are two options:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;update LV (layout version) on now-stale OST objects, which in turn must cancel all OST locks&lt;/li&gt;
	&lt;li&gt;just stop trusting the pagecache upon LL cancellation (implemented in the patch above)&lt;/li&gt;
&lt;/ul&gt;
</comment>
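(Editorial sketch.) Option 2 above, the approach taken by the patch, amounts to a guard on the fast-read path; the names here are illustrative, not the actual Lustre symbols:

```python
# Illustrative guard for option 2: a fast read is only attempted while
# the layout is still valid; otherwise fall back to the slow path,
# which revalidates the layout (and thus the pagecache) first.
def fast_read_allowed(page_uptodate, layout_valid):
    return page_uptodate and layout_valid

print(fast_read_allowed(True, True))    # True: serve from pagecache
print(fast_read_allowed(True, False))   # False: LL cancelled, take slow path
```

The slow path then performs the layout check the fast path skips, so stale pages are pruned before data is returned.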
                            <comment id="341442" author="JIRAUSER17312" created="Mon, 25 Jul 2022 14:36:13 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=bzzz&quot; class=&quot;user-hover&quot; rel=&quot;bzzz&quot;&gt;bzzz&lt;/a&gt;&#160;&lt;/p&gt;

&lt;p&gt;What are the next steps here? I thought we went with option 2, however with your patch above, shadow is saying -1. Can you please advise?&lt;/p&gt;</comment>
                            <comment id="341561" author="bzzz" created="Tue, 26 Jul 2022 09:08:46 +0000"  >&lt;blockquote&gt;&lt;p&gt;What are the next steps here? I thought we went with option 2, however with your patch above, shadow is saying -1. Can you please advise?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;it&apos;d be great if Patrick could step in and review the patch.&lt;/p&gt;</comment>
                            <comment id="377519" author="paf0186" created="Wed, 5 Jul 2023 16:36:30 +0000"  >&lt;p&gt;I&apos;m not 100% sure we&apos;ve closed all the races here, but I think the patch is good.&#160; Just noting we might still have something that could happen, not 100% sure.&lt;/p&gt;</comment>
                            <comment id="378107" author="bzzz" created="Mon, 10 Jul 2023 08:30:25 +0000"  >&lt;p&gt;I&apos;m 100% sure the patch closes at least one hole &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="378627" author="gerrit" created="Fri, 14 Jul 2023 02:50:12 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/46282/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/46282/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15431&quot; title=&quot;read may return stale data from replicated file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15431&quot;&gt;&lt;del&gt;LU-15431&lt;/del&gt;&lt;/a&gt; llite: skip fast reads if layout is invalid&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: fe2fafa1af7edc251009e3fbd46665e05573bf4a&lt;/p&gt;</comment>
                            <comment id="378695" author="pjones" created="Fri, 14 Jul 2023 12:46:10 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="394295" author="gerrit" created="Mon, 27 Nov 2023 11:15:35 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/53247&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/53247&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15431&quot; title=&quot;read may return stale data from replicated file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15431&quot;&gt;&lt;del&gt;LU-15431&lt;/del&gt;&lt;/a&gt; llite: skip fast reads if layout is invalid&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 2cfbd62a4bbcdb59c8a54ddc49b8617bbb363fc4&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="67916">LU-15434</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54095">LU-11695</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10092" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>EX-4394</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02ebb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>