<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:25:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16276] stale data read with simple IOR testing.</title>
                <link>https://jira.whamcloud.com/browse/LU-16276</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;CLIO violates a Linux kernel MM protocol.&lt;br/&gt;
 The Linux kernel expects the vmpage reference to be released immediately after&lt;br/&gt;
    page-&amp;gt;private is cleared, but CLIO breaks this rule.&lt;br/&gt;
    This causes a race between ll_releasepage and the blocking AST (bl_ast) handler:&lt;br/&gt;
    ll_releasepage clears page-&amp;gt;private while the bl_ast handler takes a&lt;br/&gt;
    cl_page reference at the same time.&lt;br/&gt;
    As a result, the vmpage is still in the mapping after the __remove_mapping call,&lt;br/&gt;
    because vmpage-&amp;gt;_refcount is not decreased.&lt;br/&gt;
    So we need to follow the kernel protocol and release the page reference after&lt;br/&gt;
    the cl_page_delete call.&lt;/p&gt;

&lt;p&gt;The Lustre debug logs show this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000008:00000001:9.0:1666910059.632700:0:5016:0:(osc_cache.c:3088:osc_page_gang_lookup()) Process entered

The BL AST handler enters and is interrupted by ll_releasepage (i.e. a cache flush),
but the cl_page ref is already held there.

00000020:00000001:8.0:1666910059.632703:0:11668:0:(cl_page.c:545:cl_vmpage_page()) Process leaving (rc=18446624413482391544 : -119660227160072 : ffff932b6eaa97f8)
00000020:00000001:8.0:1666910059.632708:0:11668:0:(cl_page.c:444:cl_page_state_set0()) page@ffff932b6eaa97f8[3 ffff932b5cd3f2b0 1 1 0000000000000000]
00000020:00000001:8.0:1666910059.632709:0:11668:0:(cl_page.c:445:cl_page_state_set0()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0002015 count 3 priv ffff932b6eaa97f8:
00000020:00000001:8.0:1666910059.633545:0:11668:0:(cl_page.c:489:cl_pagevec_put()) page@ffff932b6eaa97f8[2 ffff932b5cd3f2b0 5 1 0000000000000000]
00000020:00000001:8.0:1666910059.633546:0:11668:0:(cl_page.c:490:cl_pagevec_put()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000015 count 3 priv 0:
00000080:00008000:8.0:1666910059.633548:0:11668:0:(rw26.c:175:ll_releasepage()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000015 count 3 priv 0: clpage ffff932b6eaa97f8 : 1
ll_releasepage exits and expects the cl_page to be freed, but a ref is still held by the BL AST thread,
and the vmpage still has 3 refs while __remove_mapping wants 2,
so __remove_mapping will fail to freeze the refs.

00000020:00000001:9.0:1666910059.642999:0:5016:0:(cl_page.c:489:cl_pagevec_put()) page@ffff932b6eaa97f8[1 ffff932b5cd3f2b0 5 1 0000000000000000]
00000020:00000001:9.0:1666910059.643000:0:5016:0:(cl_page.c:490:cl_pagevec_put()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000014 count 2 priv 0:
00000020:00000010:9.0:1666910059.643003:0:5016:0:(cl_page.c:178:__cl_page_free()) slab-freed &apos;cl_page&apos;: 472 at ffff932b6eaa97f8.

cl_page freed -&amp;gt; vmpage ref released; the vmpage now has 2 refs and could be removed from the pagecache, but nobody does so, and the uptodate page stays in the pagecache.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The bug was introduced here:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;

fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  56) &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void vvp_page_fini_common(struct ccc_page *cp)
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  57) {
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  58)         cfs_page_t *vmpage = cp-&amp;gt;cpg_page;
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  59)
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  60)         LASSERT(vmpage != NULL);
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  61)         page_cache_release(vmpage);
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  62)         OBD_SLAB_FREE_PTR(cp, vvp_page_kmem);
fbf5870b984 (nikita         2008-11-07 23:54:43 +0000  63) }

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="72974">LU-16276</key>
            <summary>stale data read with simple IOR testing.</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="shadow">Alexey Lyashkov</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Fri, 28 Oct 2022 13:53:15 +0000</created>
                <updated>Wed, 14 Dec 2022 16:04:37 +0000</updated>
                                            <version>Lustre 2.15.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="356430" author="shadow" created="Wed, 14 Dec 2022 16:04:37 +0000"  >&lt;p&gt;In fact, this bug was not seen until:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit d033f2f120abc20374535de7bc28d2dd385c8181
Author: Jinshan Xiong &amp;lt;jinshan.xiong@whamcloud.com&amp;gt;
Date:   Tue Apr 17 21:40:24 2012 -0700
    LU-1320 llite: fix a race between readpage and releasepage
    This is a race between page stealing and readpage. If a just read
    page is stolen, readpage will find the page is not uptodate, this
    makes it panic so -EIO is returned to the reading application.
    Signed-off-by: Jinshan Xiong &amp;lt;jinshan.xiong@whamcloud.com&amp;gt;
    Change-Id: Ib16d12d3bc3cc8c0545aa27f0836e4fd89c3a809
    Reviewed-on: http://review.whamcloud.com/2591
    Reviewed-by: Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;
    Tested-by: Hudson
    Reviewed-by: Bobi Jam &amp;lt;bobijam@whamcloud.com&amp;gt;
    Tested-by: Maloo &amp;lt;whamcloud.maloo@gmail.com&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This patch adds conditional removal of a page from the page cache, guarded by racy checks.&lt;/p&gt;

&lt;p&gt;In fact, these checks do not help in the following cases.&lt;br/&gt;
1. readpage vs. drop caches: readpage holds a VM ref before locking the page, so the check skips the removal, and the cl_page is freed at the end of the ll_releasepage function. The blocking AST then finds nothing to work on.&lt;/p&gt;

&lt;p&gt;2. active-&amp;gt;inactive LRU refill vs. drop caches, and some other cases. These are similar: the cl_page is freed, but the page still lives in the page cache.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i033yn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>