<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:34:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10407] osc_cache.c:1141:osc_extent_make_ready()) ASSERTION( last_oap_count &gt; 0 ) failed: </title>
                <link>https://jira.whamcloud.com/browse/LU-10407</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing one of my own patches I hit this panic.&lt;br/&gt;
As far as I can see, it is the result of unstable pages not being counted atomically.&lt;br/&gt;
osc_io_commit_async adds pages to the cache and unlocks the osc object, and the size is updated shortly afterwards. osc_io_unplug, called from brw_queue_work, finds the pages in the cache, and the panic is hit while it tries to count the unstable pages.&lt;/p&gt;

&lt;p&gt;This panic is easy to reproduce if the size update is delayed with the following patch:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;diff --git a/lustre/osc/osc_io.c b/lustre/osc/osc_io.c
index 3d353324f1..5471b96ec1 100644
--- a/lustre/osc/osc_io.c
+++ b/lustre/osc/osc_io.c
@@ -267,7 +267,7 @@ &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osc_io_commit_async(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
 	struct osc_object *osc = cl2osc(ios-&amp;gt;cis_obj);
 	struct cl_page  *page;
 	struct cl_page  *last_page;
-	struct osc_page *opg;
+	struct osc_page *opg = NULL;
 	&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; result = 0;
 	ENTRY;
 
@@ -311,9 +311,6 @@ &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osc_io_commit_async(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
 				&lt;span class=&quot;code-keyword&quot;&gt;break&lt;/span&gt;;
 		}
 
-		osc_page_touch_at(env, osc2cl(osc), osc_index(opg),
-				  page == last_page ? to : PAGE_SIZE);
-
 		cl_page_list_del(env, qin, page);
 
 		(*cb)(env, io, page);
@@ -321,6 +318,9 @@ &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osc_io_commit_async(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
 		 * complete at any time. */
 	}
 
+	osc_page_touch_at(env, osc2cl(osc), osc_index(opg),
+			  page == last_page ? to : PAGE_SIZE);
+
 	/* &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; sync write, kernel will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; page to be flushed before
 	 * osc_io_end() is called, so release it earlier.
 	 * &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; mkwrite(), it&apos;s known there is no further pages. */
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The panic is hit consistently with:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;ONLY=42 REFORMAT=yes sh sanity.sh&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;It typically occurs in test 42e.&lt;/p&gt;</description>
                <environment>RHEL 7 + master 063a83ab1fe518e52dbc7fb5f6e9d092b20f44e9.</environment>
        <key id="49916">LU-10407</key>
            <summary>osc_cache.c:1141:osc_extent_make_ready()) ASSERTION( last_oap_count &gt; 0 ) failed: </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="shadow">Alexey Lyashkov</reporter>
                        <labels>
                    </labels>
                <created>Tue, 19 Dec 2017 14:51:14 +0000</created>
                <updated>Thu, 14 Jan 2021 22:35:31 +0000</updated>
                            <resolved>Thu, 14 Jan 2021 22:35:31 +0000</resolved>
                                    <version>Lustre 2.11.0</version>
                                    <fixVersion>Lustre 2.13.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="216732" author="jay" created="Tue, 19 Dec 2017 17:24:34 +0000"  >&lt;p&gt;The corresponding osc_extent should be in the ACTIVE state, so it shouldn&apos;t be picked up by the RPC engine.&lt;/p&gt;</comment>
                            <comment id="216765" author="pjones" created="Tue, 19 Dec 2017 19:10:20 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Can you please look into this?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="218090" author="gerrit" created="Fri, 12 Jan 2018 11:46:23 +0000"  >&lt;p&gt;Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/30848&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/30848&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10407&quot; title=&quot;osc_cache.c:1141:osc_extent_make_ready()) ASSERTION( last_oap_count &amp;gt; 0 ) failed: &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10407&quot;&gt;&lt;del&gt;LU-10407&lt;/del&gt;&lt;/a&gt; osc: update size before queue page&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 13031e6ee6eccedfb0b108115599d3d69e4577c9&lt;/p&gt;</comment>
                            <comment id="219350" author="hongchao.zhang" created="Mon, 29 Jan 2018 10:55:17 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;Which patch had you applied when the issue was triggered?&lt;/p&gt;

&lt;p&gt;Delaying the size update will cause the issue, because some extent could become full and be released, ready to be written out&lt;br/&gt;
(its state will be changed to OES_CACHE), during the loop in &quot;osc_io_commit_async&quot;. But it won&apos;t happen if the size is updated&lt;br/&gt;
right after the &quot;osc_page_cache_add&quot;, because the current active extent will be checked to be non-full before the page is added to it&lt;br/&gt;
(its state will be OES_ACTIVE).&lt;/p&gt;</comment>
                            <comment id="232638" author="green" created="Tue, 28 Aug 2018 04:32:05 +0000"  >&lt;p&gt;I just hit this in master-next running racer&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[35616.314978] LustreError: 11162:0:(osc_cache.c:1141:osc_extent_make_ready()) ASSERTION( last_oap_count &amp;gt; 0 ) failed: 
[35616.322765] LustreError: 11162:0:(osc_cache.c:1141:osc_extent_make_ready()) LBUG
[35616.326801] Pid: 11162, comm: ldlm_bl_05 3.10.0-7.5-debug #1 SMP Sun Jun 3 13:35:38 EDT 2018
[35616.329285] Call Trace:
[35616.330443]  [&amp;lt;ffffffffa01eb7dc&amp;gt;] libcfs_call_trace+0x8c/0xc0 [libcfs]
[35616.331870]  [&amp;lt;ffffffffa01eb88c&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
[35616.334288]  [&amp;lt;ffffffffa0d3eda6&amp;gt;] osc_extent_make_ready+0x936/0xe70 [osc]
[35616.335640]  [&amp;lt;ffffffffa0d45ab3&amp;gt;] osc_cache_writeback_range+0x4f3/0x1260 [osc]
[35616.337641]  [&amp;lt;ffffffffa0e0bac7&amp;gt;] mdc_lock_flush+0x2e7/0x3f0 [mdc]
[35616.339070]  [&amp;lt;ffffffffa0e0bfb4&amp;gt;] mdc_ldlm_blocking_ast+0x2f4/0x3f0 [mdc]
[35616.343073]  [&amp;lt;ffffffffa0b01bd4&amp;gt;] ldlm_cancel_callback+0x84/0x320 [ptlrpc]
[35616.344357]  [&amp;lt;ffffffffa0b189b0&amp;gt;] ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
[35616.345719]  [&amp;lt;ffffffffa0b1e7e7&amp;gt;] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
[35616.347470]  [&amp;lt;ffffffffa0e0be4a&amp;gt;] mdc_ldlm_blocking_ast+0x18a/0x3f0 [mdc]
[35616.348765]  [&amp;lt;ffffffffa0b2a67f&amp;gt;] ldlm_handle_bl_callback+0xff/0x530 [ptlrpc]
[35616.351095]  [&amp;lt;ffffffffa0b2afb1&amp;gt;] ldlm_bl_thread_main+0x501/0x680 [ptlrpc]
[35616.352495]  [&amp;lt;ffffffff810ae864&amp;gt;] kthread+0xe4/0xf0
[35616.353757]  [&amp;lt;ffffffff81783777&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="263020" author="simmonsja" created="Mon, 10 Feb 2020 17:55:43 +0000"  >&lt;p&gt;One of the patches that landed during 2.13 fixed this issue. Should we close this?&lt;/p&gt;</comment>
                            <comment id="289220" author="bzzz" created="Mon, 11 Jan 2021 18:15:35 +0000"  >&lt;p&gt;-&lt;a href=&quot;https://testing.whamcloud.com/test_sessions/0f2a364a-716a-4eec-9786-3fa81b3143c2-&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/0f2a364a-716a-4eec-9786-3fa81b3143c2-&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moved to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14326&quot; title=&quot;sanity-dom test_fsx: crash in osc_extent_make_ready()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14326&quot;&gt;&lt;del&gt;LU-14326&lt;/del&gt;&lt;/a&gt;, closing this one for 2.13 per previous comment.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="53481">LU-11463</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="62314">LU-14326</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzpv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>