<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:32:28 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3274] osc_cache.c:1774:osc_dec_unstable_pages()) ASSERTION( atomic_read(&amp;cli-&gt;cl_cache-&gt;ccc_unstable_nr) &gt;= 0 ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-3274</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hit this running replay-single in a loop&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[123247.989106] Lustre: DEBUG MARKER: == replay-single test 87b: write replay with changed data (checksum resend) == 00:41:34 (1367728894)
[123248.537768] Turning device loop1 (0x700001) read-only
[123248.562834] Lustre: DEBUG MARKER: ost1 REPLAY BARRIER on lustre-OST0000
[123248.580423] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-OST0000
[123250.719913] Lustre: DEBUG MARKER: cancel_lru_locks osc start
[123250.880677] Lustre: DEBUG MARKER: cancel_lru_locks osc stop
[123251.389028] Removing read-only on unknown block (0x700001)
[123263.314546] LDISKFS-fs (loop1): recovery complete
[123263.357806] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=on. Opts: 
[123268.399242] LustreError: 168-f: BAD WRITE CHECKSUM: lustre-OST0000 from 12345-0@lo inode [0x2000061c0:0x5:0x0] object 0x0:7458 extent [0-1048575]: client csum 7945acf6, server csum b9e6e441
[123268.477228] LustreError: 17404:0:(osc_cache.c:1774:osc_dec_unstable_pages()) ASSERTION( atomic_read(&amp;amp;cli-&amp;gt;cl_cache-&amp;gt;ccc_unstable_nr) &amp;gt;= 0 ) failed: 
[123268.477757] LustreError: 17404:0:(osc_cache.c:1774:osc_dec_unstable_pages()) LBUG
[123268.478168] Pid: 17404, comm: ptlrpcd_rcv
[123268.478381] 
[123268.478381] Call Trace:
[123268.478752]  [&amp;lt;ffffffffa0e018a5&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[123268.479019]  [&amp;lt;ffffffffa0e01ea7&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
[123268.479981]  [&amp;lt;ffffffffa0458bcc&amp;gt;] osc_dec_unstable_pages+0x12c/0x190 [osc]
[123268.480270]  [&amp;lt;ffffffffa114d76b&amp;gt;] ptlrpc_free_committed+0x14b/0x620 [ptlrpc]
[123268.480593]  [&amp;lt;ffffffffa114f4e3&amp;gt;] after_reply+0x7a3/0xd90 [ptlrpc]
[123268.480871]  [&amp;lt;ffffffffa1154493&amp;gt;] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
[123268.481150]  [&amp;lt;ffffffffa1180e2b&amp;gt;] ptlrpcd_check+0x55b/0x590 [ptlrpc]
[123268.481420]  [&amp;lt;ffffffffa1181373&amp;gt;] ptlrpcd+0x233/0x390 [ptlrpc]
[123268.481665]  [&amp;lt;ffffffff8105ad10&amp;gt;] ? default_wake_function+0x0/0x20
[123268.481963]  [&amp;lt;ffffffffa1181140&amp;gt;] ? ptlrpcd+0x0/0x390 [ptlrpc]
[123268.482220]  [&amp;lt;ffffffff8100c10a&amp;gt;] child_rip+0xa/0x20
[123268.482465]  [&amp;lt;ffffffffa1181140&amp;gt;] ? ptlrpcd+0x0/0x390 [ptlrpc]
[123268.482733]  [&amp;lt;ffffffffa1181140&amp;gt;] ? ptlrpcd+0x0/0x390 [ptlrpc]
[123268.482978]  [&amp;lt;ffffffff8100c100&amp;gt;] ? child_rip+0x0/0x20
[123268.483209] 
[123268.543751] Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is approximately current master plus 3 more LU-2139 patches applied on top.&lt;br/&gt;
I have a crashdump in /exports/crashdumps/192.168.10.221-2013-05-05-00\:41\:57/&lt;br/&gt;
code tag: master-20130505&lt;/p&gt;</description>
                <environment></environment>
        <key id="18709">LU-3274</key>
            <summary>osc_cache.c:1774:osc_dec_unstable_pages()) ASSERTION( atomic_read(&amp;cli-&gt;cl_cache-&gt;ccc_unstable_nr) &gt;= 0 ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Sun, 5 May 2013 04:48:53 +0000</created>
                <updated>Fri, 7 Nov 2014 08:30:44 +0000</updated>
                            <resolved>Wed, 19 Mar 2014 17:08:51 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                <comments>
                            <comment id="57716" author="green" created="Mon, 6 May 2013 07:12:46 +0000"  >&lt;p&gt;Got another hit, replay-single again /exports/crashdumps/192.168.10.216-2013-05-06-02\:27\:44/vmcore&lt;br/&gt;
This time in replay-single test 87, apparently there&apos;s some mismatch on replay? Double accounting by any chance?&lt;/p&gt;</comment>
                            <comment id="57766" author="prakash" created="Mon, 6 May 2013 20:03:31 +0000"  >&lt;p&gt;Hmm, I never saw this failure during my testing. The asssertion was added because the unstable count for a given FS should never become negative, which looks to be happening here. I&apos;m curious if the commit callback is getting called twice for certain requests, once directly &lt;tt&gt;after_reply&lt;/tt&gt;, and then again in &lt;tt&gt;ptlrpc_free_committed&lt;/tt&gt; (via &lt;tt&gt;after_reply&lt;/tt&gt;). I&apos;ll read through the patch again and see if I can piece together what might be happening here. Was this failure triggered through Maloo?&lt;/p&gt;</comment>
                            <comment id="57770" author="green" created="Mon, 6 May 2013 20:41:36 +0000"  >&lt;p&gt;no, it was not triggered on maloo, it&apos;s on my private burn-in cluster that promotes race conditions. But if you are interested, i can make the vmcore, modules and the code snapshot available somehow.&lt;/p&gt;</comment>
                            <comment id="57771" author="prakash" created="Mon, 6 May 2013 20:45:20 +0000"  >&lt;p&gt;OK. Let me see if I can get a possible explanation out of reading the code first, and I might want to look at those later to back up my theory.&lt;/p&gt;</comment>
                            <comment id="57786" author="prakash" created="Mon, 6 May 2013 23:41:36 +0000"  >&lt;p&gt;I looked over the code some more and I&apos;m still unsure how this would occur. I don&apos;t think it is a result of &lt;tt&gt;rq_commit_cb&lt;/tt&gt; getting called twice for the same request (via &lt;tt&gt;after_reply&lt;/tt&gt; and &lt;tt&gt;ptlrpc_free_committed&lt;/tt&gt;). AFAICT it will get run directly in &lt;tt&gt;after_reply&lt;/tt&gt;, or get added to the &lt;tt&gt;imp_replay_list&lt;/tt&gt; and run later via &lt;tt&gt;ptlrpc_free_committed&lt;/tt&gt;, but not both. I&apos;m running replay-single in a loop in a single node VM setup, but haven&apos;t hit the bug yet. Oleg, if you have some spare cycles it &lt;em&gt;might&lt;/em&gt; be interesting to add this patch:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
diff --git i/lustre/osc/osc_request.c w/lustre/osc/osc_request.c
index a1c740c..ac2eaf5 100644
--- i/lustre/osc/osc_request.c
+++ w/lustre/osc/osc_request.c
@@ -2165,6 +2165,8 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
                GOTO(out, rc);
        }
 
+       LASSERT(req-&amp;gt;rq_committed == 0);
+       LASSERT(req-&amp;gt;rq_unstable == 0);
        req-&amp;gt;rq_commit_cb = brw_commit;
        req-&amp;gt;rq_interpret_reply = brw_interpret;
 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I assumed these values would already be initialized to 0, but perhaps I&apos;m wrong. If we allocate a request and it &lt;em&gt;happens&lt;/em&gt; to have &lt;tt&gt;rq_unstable&lt;/tt&gt; set to 1, and then &lt;tt&gt;brw_commit&lt;/tt&gt; is called via &lt;tt&gt;rq_commit_cb&lt;/tt&gt; without &lt;tt&gt;osc_inc_unstable_pages&lt;/tt&gt; first being called, I could see this failure happening. But I have no evidence of this yet.&lt;/p&gt;</comment>
                            <comment id="71000" author="jay" created="Thu, 7 Nov 2013 19:00:26 +0000"  >&lt;p&gt;it occurred in current master at: &lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/6f5c1b80-47cb-11e3-a71f-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/6f5c1b80-47cb-11e3-a71f-52540035b04c&lt;/a&gt; in the console message of client 2:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;23:26:13:LustreError: 25093:0:(osc_cache.c:1807:osc_dec_unstable_pages()) ASSERTION( atomic_read(&amp;amp;cli-&amp;gt;cl_cache-&amp;gt;ccc_unstable_nr) &amp;gt;= 0 ) failed: 
23:26:13:LustreError: 25093:0:(osc_cache.c:1807:osc_dec_unstable_pages()) LBUG
23:26:13:Pid: 25093, comm: ptlrpcd_rcv
23:26:13:
23:26:13:Call Trace:
23:26:13: [&amp;lt;ffffffffa0b58895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
23:26:14: [&amp;lt;ffffffffa0b58e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
23:26:14: [&amp;lt;ffffffffa0f64f33&amp;gt;] osc_dec_unstable_pages+0x133/0x1a0 [osc]
23:26:14: [&amp;lt;ffffffffa0d601c9&amp;gt;] ptlrpc_free_committed+0x149/0x620 [ptlrpc]
23:26:14: [&amp;lt;ffffffffa0d61a74&amp;gt;] after_reply+0x7a4/0xd90 [ptlrpc]
23:26:14: [&amp;lt;ffffffffa0d66ab1&amp;gt;] ptlrpc_check_set+0xf71/0x1b40 [ptlrpc]
23:26:15: [&amp;lt;ffffffffa0d9120b&amp;gt;] ptlrpcd_check+0x53b/0x560 [ptlrpc]
23:26:15: [&amp;lt;ffffffffa0d9185b&amp;gt;] ptlrpcd+0x33b/0x3f0 [ptlrpc]
23:26:15: [&amp;lt;ffffffff81063990&amp;gt;] ? default_wake_function+0x0/0x20
23:26:15: [&amp;lt;ffffffffa0d91520&amp;gt;] ? ptlrpcd+0x0/0x3f0 [ptlrpc]
23:26:15: [&amp;lt;ffffffff81096a36&amp;gt;] kthread+0x96/0xa0
23:26:15: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
23:26:15: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
23:26:16: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
23:26:16:
23:26:16:Kernel panic - not syncing: LBUG
23:26:16:Pid: 25093, comm: ptlrpcd_rcv Not tainted 2.6.32-358.23.2.el6.x86_64 #1
23:26:16:Call Trace:
23:26:16: [&amp;lt;ffffffff8150daac&amp;gt;] ? panic+0xa7/0x16f
23:26:17: [&amp;lt;ffffffffa0b58eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
23:26:17: [&amp;lt;ffffffffa0f64f33&amp;gt;] ? osc_dec_unstable_pages+0x133/0x1a0 [osc]
23:26:17: [&amp;lt;ffffffffa0d601c9&amp;gt;] ? ptlrpc_free_committed+0x149/0x620 [ptlrpc]
23:26:17: [&amp;lt;ffffffffa0d61a74&amp;gt;] ? after_reply+0x7a4/0xd90 [ptlrpc]
23:26:17: [&amp;lt;ffffffffa0d66ab1&amp;gt;] ? ptlrpc_check_set+0xf71/0x1b40 [ptlrpc]
23:26:17: [&amp;lt;ffffffffa0d9120b&amp;gt;] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
23:26:17: [&amp;lt;ffffffffa0d9185b&amp;gt;] ? ptlrpcd+0x33b/0x3f0 [ptlrpc]
23:26:18: [&amp;lt;ffffffff81063990&amp;gt;] ? default_wake_function+0x0/0x20
23:26:18: [&amp;lt;ffffffffa0d91520&amp;gt;] ? ptlrpcd+0x0/0x3f0 [ptlrpc]
23:26:18: [&amp;lt;ffffffff81096a36&amp;gt;] ? kthread+0x96/0xa0
23:26:18: [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
23:26:19: [&amp;lt;ffffffff810969a0&amp;gt;] ? kthread+0x0/0xa0
23:26:19: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71097" author="jay" created="Fri, 8 Nov 2013 04:08:02 +0000"  >&lt;p&gt;patch is at: &lt;a href=&quot;http://review.whamcloud.com/8215&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8215&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="71172" author="prakash" created="Fri, 8 Nov 2013 21:51:58 +0000"  >&lt;p&gt;Jinshan, please see &lt;a href=&quot;http://review.whamcloud.com/8219&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8219&lt;/a&gt; . I tried to make a more understandable version of your fix from 8215. What do you think?&lt;/p&gt;</comment>
                            <comment id="79676" author="jlevi" created="Wed, 19 Mar 2014 17:08:51 +0000"  >&lt;p&gt;Patch landed to Master. Please reopen ticket if more work is needed.&lt;/p&gt;</comment>
                            <comment id="98639" author="yujian" created="Fri, 7 Nov 2014 08:30:44 +0000"  >&lt;p&gt;Here is the back-ported patch for Lustre b2_5 branch: &lt;a href=&quot;http://review.whamcloud.com/12613&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12613&lt;/a&gt;&lt;br/&gt;
The patch depends on the patches for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2139&quot; title=&quot;Tracking unstable pages&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2139&quot;&gt;&lt;del&gt;LU-2139&lt;/del&gt;&lt;/a&gt; on Lustre b2_5 branch.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                        <issuelink>
            <issuekey id="15971">LU-2139</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvq5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8111</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>