<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:26:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16362] landing async readahead caused an &quot;uncovered page&quot; panic during sanityN runs</title>
                <link>https://jira.whamcloud.com/browse/LU-16362</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Async readahead skips the &quot;locking&quot; phase:&lt;/p&gt;

&lt;p&gt;c2791674260 (Wang Shilong           2019-01-21 20:23:47 +0800  658)     io-&amp;gt;ci_state = CIS_LOCKED;&lt;/p&gt;

&lt;p&gt;But a separate cl_io is created, so pages are sent outside of the original page lock, and a parallel blocking AST then triggers the &quot;uncovered page&quot; panic.&lt;/p&gt;</description>
                <environment></environment>
        <key id="73469">LU-16362</key>
            <summary>landing async readahead caused an &quot;uncovered page&quot; panic during sanityN runs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="shadow">Alexey Lyashkov</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Fri, 2 Dec 2022 09:35:57 +0000</created>
                <updated>Wed, 10 Jan 2024 13:56:33 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="358269" author="zam" created="Mon, 9 Jan 2023 10:33:40 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16332&quot; title=&quot;LBUG with osc_req_attr_set() uncovered page!&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16332&quot;&gt;LU-16332&lt;/a&gt; is a similar issue.&lt;/p&gt;</comment>
                            <comment id="378706" author="gerrit" created="Fri, 14 Jul 2023 13:21:59 +0000"  >&lt;p&gt;&quot;Alexey Lyashkov &amp;lt;alexey.lyashkov@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51677&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51677&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16362&quot; title=&quot;landing async readhead caused a &amp;quot;uncovered page&amp;quot; panic during sanityN runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16362&quot;&gt;LU-16362&lt;/a&gt; lov: dont&apos;t allow RA outside of active stripe&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 01966622f7b4f1c7626bc5980f9adc0d01338e1c&lt;/p&gt;</comment>
                            <comment id="390722" author="shadow" created="Thu, 26 Oct 2023 15:02:54 +0000"  >&lt;p&gt;Since the first patch version turned out to have issues, I spent more time investigating it.&lt;br/&gt;
The current picture is:&lt;/p&gt;

&lt;p&gt;An application starts reading data from a file. It&apos;s a multi-stripe file, but the read covers only the first stripe, so the cl_io is created for the offset range assigned to the first stripe.&lt;br/&gt;
Since the stripe size is 4M, RA issues a full RPC of that size and finishes at the end of the stripe.&lt;br/&gt;
On the next read, the RA window moves outside of the IO region requested by the cl_io and outside of the active locks pinned by this IO.&lt;br/&gt;
In the current code, this makes osc search for an old lock for the second stripe. The lock might exist from the past, which is OK, but RA pins this lock only long enough to prepare a read for one page and then releases it via cl_read_ahead_release().&lt;br/&gt;
As a result, some pages are prepared for read before the lock is canceled and some are not. cl_io_read_ahead() returns an error, but the earlier pages are still in osc extents and ready to be sent in an RPC. This causes a panic in osc_req_attr_set when osc gets a chance to send them.&lt;br/&gt;
This is easy to reproduce with a simple assertion and sanity 44A+44a (START_AT=44A).&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
@@ -1195,22 +1198,27 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; lov_io_read_ahead(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unlikely(!r0-&amp;gt;lo_sub[stripe]))
                RETURN(-EIO);

        sub = lov_sub_get(env, lio, lov_comp_index(index, stripe));
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (IS_ERR(sub))
                RETURN(PTR_ERR(sub));

        &lt;span class=&quot;code-comment&quot;&gt;/* no RA outside of active stripe */&lt;/span&gt;
+       &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (sub-&amp;gt;sub_io.ci_state != CIS_IO_GOING &amp;amp;&amp;amp;
+           sub-&amp;gt;sub_io.ci_state != CIS_LOCKED)
+               LBUG();
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;m not sure what is best in this situation: either simply disable RA outside of the active region, or improve lock pinning to avoid submitting pages without the lock held.&lt;/p&gt;</comment>
                            <comment id="394731" author="shadow" created="Wed, 29 Nov 2023 13:49:45 +0000"  >&lt;p&gt;It looks like I finally understand the issue.&lt;br/&gt;
The readahead loop collects locked pages as it iterates, but the ldlm lock is released after each iteration. So only PG_lock prevents the lock from being canceled during submit, but osc_make_rpc releases the page lock after the RPC has been created but before it is sent. This opens a window in which the lock might be canceled, and osc_req_attr_set then complains that the lock doesn&apos;t exist.&lt;br/&gt;
The lock might be canceled due to aging or a conflict from a different node.&lt;br/&gt;
The correct code should create a special cl_io to take the lock, and cl_io_submit_sync should be called while the lock is held. The osc extent will hold an ldlm lock (after Bobi Jam&apos;s patch &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16160&quot; title=&quot;take ldlm lock when queue sync pages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16160&quot;&gt;&lt;del&gt;LU-16160&lt;/del&gt;&lt;/a&gt; osc: take ldlm lock when queue sync pages), which will prevent the lock from being canceled until it is no longer needed.&lt;/p&gt;
</comment>
                            <comment id="394899" author="shadow" created="Thu, 30 Nov 2023 08:16:39 +0000"  >&lt;p&gt;It might be an artifact of the PG_writeback change. Before that change, pages were not unlocked until the IO was done, so the panic was not hit.&lt;/p&gt;</comment>
                            <comment id="399128" author="gerrit" created="Wed, 10 Jan 2024 13:56:33 +0000"  >&lt;p&gt;&quot;Alexey Lyashkov &amp;lt;alexey.lyashkov@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/53635&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/53635&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16362&quot; title=&quot;landing async readhead caused a &amp;quot;uncovered page&amp;quot; panic during sanityN runs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16362&quot;&gt;LU-16362&lt;/a&gt; readahead: simplification&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b41d1363be6ee45e2b19ec708be43954dde69e86&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="73354">LU-16332</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="73633">LU-16401</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i036zj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>