<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:04:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-134] recovery-mds-scale (FLAVOR=OSS): (filter.c:151:filter_finish_transno()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-134</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While running the recovery-mds-scale test with &apos;FAILURE_MODE=HARD&apos; (power off &amp;amp; on) and &apos;FLAVOR=OSS&apos; on the&lt;br/&gt;
Burlington cluster, the client loads failed after the OSS failed over 3 times:&lt;/p&gt;

&lt;p&gt;==== Checking the clients loads AFTER  failover &amp;#8211; failure NOT OK&lt;br/&gt;
ost1 has failed over 3 times, and counting...&lt;br/&gt;
sleeping 543 seconds ...&lt;br/&gt;
tar: etc/rc.d/init.d/avahi-dnsconfd: Cannot write: No such file or directory&lt;br/&gt;
tar: etc/rc.d/init.d/avahi-dnsconfd: Cannot utime: No such file or directory&lt;br/&gt;
tar: Error exit delayed from previous errors&lt;br/&gt;
Found the END_RUN_FILE file: /testsuite/yujian/end_run_file&lt;br/&gt;
client7&lt;br/&gt;
Client load failed on node client7&lt;/p&gt;

&lt;p&gt;client client7 load stdout and debug files :&lt;br/&gt;
/tmp/recovery-mds-scale.log_run_tar.sh-client7&lt;br/&gt;
/tmp/recovery-mds-scale.log_run_tar.sh-client7.debug&lt;br/&gt;
2009-08-07 05:25:56 Terminating clients loads ...&lt;br/&gt;
Duraion:                86400&lt;br/&gt;
Server failover period: 600 seconds&lt;br/&gt;
Exited after:           1556 seconds&lt;br/&gt;
Number of failovers before exit:&lt;br/&gt;
ost1 failed  over  3 times&lt;br/&gt;
Status: FAIL: rc=1&lt;/p&gt;


&lt;p&gt;An LBUG occurred after oss1 failed over 2 times:&lt;br/&gt;
-------8&amp;lt;-------&lt;br/&gt;
Lustre: 10385:0:(ldlm_lib.c:565:target_handle_reconnect()) lustre-OST0000:&lt;br/&gt;
lustre-MDT0000-mdtlov_UUID reconnecting&lt;br/&gt;
Lustre: 10385:0:(ldlm_lib.c:801:target_handle_connect()) lustre-OST0000: refuse reconnection from&lt;br/&gt;
lustre-MDT0000-mdtlov_UUID@192.168.1.4@tcp to 0xffff8102f80a1800/1&lt;br/&gt;
LustreError: 10385:0:(ldlm_lib.c:2112:target_send_reply_msg()) @@@ processing error (-16) &lt;br/&gt;
req@ffff8103788d1c00 x1310336588446872/t0(0)&lt;br/&gt;
o8-&amp;gt;lustre-MDT0000-mdtlov_UUID@NET_0x20000c0a80104_UUID:0/0 lens 368/264 e 0 to 0 dl 1249636224 ref&lt;br/&gt;
1 fl Interpret:/0/0 rc -16/0&lt;br/&gt;
LustreError: 10544:0:(ldlm_lib.c:1494:check_for_next_transno()) lustre-OST0000: waking for gap in&lt;br/&gt;
transno, VBR is OFF (skip: 367985, ql: 1, comp: 8, conn: 9, next: 367988, last_committed: 367987)&lt;br/&gt;
Lustre: 10295:0:(ldlm_lib.c:565:target_handle_reconnect()) lustre-OST0000:&lt;br/&gt;
lustre-MDT0000-mdtlov_UUID reconnecting&lt;br/&gt;
Lustre: 10295:0:(sec.c:1414:sptlrpc_import_sec_adapt()) import&lt;br/&gt;
lustre-OST0000-&amp;gt;NET_0x20000c0a80104_UUID netid 20000: select flavor null&lt;br/&gt;
Lustre: 10295:0:(sec.c:1414:sptlrpc_import_sec_adapt()) Skipped 3 previous similar messages&lt;br/&gt;
Lustre: 10345:0:(ldlm_lib.c:2009:target_queue_recovery_request()) Next recovery transno: 367989,&lt;br/&gt;
current: 367988, replaying: 1&lt;br/&gt;
LustreError: 10345:0:(filter.c:151:filter_finish_transno()) LBUG&lt;br/&gt;
Pid: 10345, comm: ll_ost_79&lt;/p&gt;

&lt;p&gt;Call Trace:&lt;br/&gt;
 [&amp;lt;ffffffff887a35b1&amp;gt;] libcfs_debug_dumpstack+0x51/0x60 [libcfs]&lt;br/&gt;
 [&amp;lt;ffffffff887a3aea&amp;gt;] lbug_with_loc+0x7a/0xd0 [libcfs]&lt;br/&gt;
 [&amp;lt;ffffffff88ce2bf0&amp;gt;] filter_finish_transno+0x1b0/0x4f0 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff88c2174c&amp;gt;] journal_callback_set+0x30/0x44 [jbd]&lt;br/&gt;
 [&amp;lt;ffffffff88cf68d0&amp;gt;] filter_cancel_cookies_cb+0x0/0x5b0 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff88cc158c&amp;gt;] fsfilt_ldiskfs_add_journal_cb+0x2dc/0x310 [fsfilt_ldiskfs]&lt;br/&gt;
 [&amp;lt;ffffffff88ce660f&amp;gt;] filter_setattr_internal+0x15ef/0x1d70 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff88cdbf82&amp;gt;] filter_fid2dentry+0x512/0x740 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff88cddcdc&amp;gt;] __filter_oa2dentry+0x4c/0x250 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff8000d762&amp;gt;] dput+0x23/0x10a&lt;br/&gt;
 [&amp;lt;ffffffff88ce7224&amp;gt;] filter_setattr+0x494/0x670 [obdfilter]&lt;br/&gt;
 [&amp;lt;ffffffff88947e1a&amp;gt;] lustre_pack_reply_v2+0x23a/0x2f0 [ptlrpc]&lt;br/&gt;
 [&amp;lt;ffffffff88947fb4&amp;gt;] lustre_pack_reply_flags+0xe4/0x1e0 [ptlrpc]&lt;br/&gt;
 [&amp;lt;ffffffff88ca6aba&amp;gt;] ost_handle+0x2a7a/0x5b3d [ost]&lt;br/&gt;
 [&amp;lt;ffffffff887ae66d&amp;gt;] libcfs_debug_vmsg2+0x6fd/0x9d0 [libcfs]&lt;br/&gt;
 [&amp;lt;ffffffff88bdbde7&amp;gt;] vvp_session_key_init+0x147/0x1d0 [lustre]&lt;br/&gt;
 [&amp;lt;ffffffff88952750&amp;gt;] ptlrpc_server_handle_request+0x950/0xff0 [ptlrpc]&lt;br/&gt;
 [&amp;lt;ffffffff8008881c&amp;gt;] __wake_up_common+0x3e/0x68&lt;br/&gt;
 [&amp;lt;ffffffff88957af6&amp;gt;] ptlrpc_main+0x13a6/0x1530 [ptlrpc]&lt;br/&gt;
 [&amp;lt;ffffffff8008a3f2&amp;gt;] default_wake_function+0x0/0xe&lt;br/&gt;
 [&amp;lt;ffffffff800b48d6&amp;gt;] audit_syscall_exit+0x327/0x342&lt;br/&gt;
 [&amp;lt;ffffffff8005dfb1&amp;gt;] child_rip+0xa/0x11&lt;br/&gt;
 [&amp;lt;ffffffff88956750&amp;gt;] ptlrpc_main+0x0/0x1530 [ptlrpc]&lt;br/&gt;
 [&amp;lt;ffffffff8005dfa7&amp;gt;] child_rip+0x0/0x11&lt;/p&gt;


&lt;p&gt;Jay&apos;s analysis: &lt;a href=&quot;https://bugzilla.lustre.org/show_bug.cgi?id=20394#c5&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/show_bug.cgi?id=20394#c5&lt;/a&gt;&lt;br/&gt;
After discussing this with tappro, the recovery expert, who kindly taught me a lot about&lt;br/&gt;
recovery, I think the following assertion might be over-strict:&lt;/p&gt;

&lt;p&gt;        if (last_rcvd &amp;lt;= le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) {&lt;br/&gt;
                spin_unlock(&amp;amp;filter-&amp;gt;fo_translock);&lt;br/&gt;
                LBUG();&lt;br/&gt;
        }&lt;/p&gt;

&lt;p&gt;last_rcvd is likely equal to lcd-&amp;gt;lcd_last_transno because the setattr transaction might already&lt;br/&gt;
have been committed before the server had a chance to send the reply to the client, which&lt;br/&gt;
then causes the request to be handled immediately. In this case, at least the assertion for last_rcvd&lt;br/&gt;
== lcd-&amp;gt;lcd_last_transno might be (wrongly) hit.&lt;/p&gt;</description>
                <environment></environment>
        <key id="10456">LU-134</key>
            <summary>recovery-mds-scale (FLAVOR=OSS): (filter.c:151:filter_finish_transno()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="bobijam">Zhenyu Xu</reporter>
                        <labels>
                    </labels>
                <created>Wed, 16 Mar 2011 23:21:56 +0000</created>
                <updated>Thu, 17 Mar 2011 19:10:26 +0000</updated>
                            <resolved>Thu, 17 Mar 2011 19:10:26 +0000</resolved>
                                    <version>Lustre 2.0.0</version>
                                    <fixVersion>Lustre 2.1.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="11177" author="tappro" created="Thu, 17 Mar 2011 01:26:53 +0000"  >&lt;p&gt;this bug existed in 1.8 only, HEAD checks are correct, the only good thing we can take - the latest patch is better than pure assert. Meanwhile we have already &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; for the same thing and its patch is clever a bit. Therefore, while &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-134&quot; title=&quot;recovery-mds-scale (FLAVOR=OSS): (filter.c:151:filter_finish_transno()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-134&quot;&gt;&lt;del&gt;LU-134&lt;/del&gt;&lt;/a&gt; is not the same as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; but their patches about the same code to change and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt; has better one, so &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-134&quot; title=&quot;recovery-mds-scale (FLAVOR=OSS): (filter.c:151:filter_finish_transno()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-134&quot;&gt;&lt;del&gt;LU-134&lt;/del&gt;&lt;/a&gt; can be closed as duplicate of it&lt;/p&gt;</comment>
                            <comment id="11200" author="bobijam" created="Thu, 17 Mar 2011 19:10:26 +0000"  >&lt;p&gt;dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-128&quot; title=&quot;OSSs frequent crashes due to LBUG/[ASSERTION(last_rcvd&amp;gt;=le64_to_cpu(lcd-&amp;gt;lcd_last_transno)) failed] in recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-128&quot;&gt;&lt;del&gt;LU-128&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>20394.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw12f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10253</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>