<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:09:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7490] out_tx_write_exec()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-7490</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of master branch build &apos;20151122&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151122&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151122&lt;/a&gt;). DNE is enabled. MDSes are configured in active-active failover configuration.&lt;/p&gt;

&lt;p&gt;Sequence of events:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;2015-11-26 10:32 Failover resources (mdt-0,1) &lt;tt&gt;lola-8&lt;/tt&gt; --&amp;gt; &lt;tt&gt;lola-9&lt;/tt&gt; started&lt;/li&gt;
	&lt;li&gt;2015-11-26 11:40 Failback resources (mdt-0,1) &lt;tt&gt;lola-9&lt;/tt&gt; --&amp;gt; &lt;tt&gt;lola-8&lt;/tt&gt; completed successfully&lt;/li&gt;
	&lt;li&gt;2015-11-26 11:44 LBUG on lola-8. See the following message.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:692:out_tx_write_exec()) read record [0x240089779:0x1:0x0] tail_pos 173122472 rc -53 index 50635 size 172659608
Nov 26 11:44:54 lola-8 kernel: LustreError: 8491:0:(out_lib.c:693:out_tx_write_exec()) LBUG
Nov 26 11:44:54 lola-8 kernel: Pid: 8491, comm: mdt_out03_004
Nov 26 11:44:54 lola-8 kernel: 
Nov 26 11:44:54 lola-8 kernel: Call Trace:
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa07fb875&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa07fbe77&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0bb60a0&amp;gt;] out_tx_write_exec+0x500/0x7a0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0bb934b&amp;gt;] ? out_tx_xattr_set_exec+0xeb/0x680 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0bae13a&amp;gt;] out_tx_end+0xda/0x5d0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0bb3726&amp;gt;] out_handle+0xbd6/0x1890 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0afa4e0&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0baae1c&amp;gt;] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0b52711&amp;gt;] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffff8152a39e&amp;gt;] ? thread_return+0x4e/0x7d0
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffffa0b518d0&amp;gt;] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
Nov 26 11:44:54 lola-8 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
Nov 26 11:44:54 lola-8 kernel: 
Nov 26 11:44:54 lola-8 kernel: LustreError: dumping log to /tmp/lustre-log.1448567093.8491
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This event is most likely related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7488&quot; title=&quot;_req_capsule_get()) ASSERTION( fmt != ((void *)(long)0x5a5a5a5a5a5a5a5a) ) failed:&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7488&quot;&gt;&lt;del&gt;LU-7488&lt;/del&gt;&lt;/a&gt;, which happened at almost the same time on the HA failover partner (&lt;tt&gt;lola-9&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;Attached are the console and messages log files of the MDS (&lt;tt&gt;lola-8&lt;/tt&gt;), the kernel debug log file named in the LBUG error message, and the error messages that appeared at the same time, extracted from the messages files of the Lustre client nodes.&lt;/p&gt;</description>
                <environment>lola&lt;br/&gt;
build: 2.7.63-4-gf84e06e, a7eface85ea2d2aa6198681264b082a0244855d4 + patches</environment>
        <key id="33360">LU-7490</key>
            <summary>out_tx_write_exec()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="di.wang">Di Wang</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Fri, 27 Nov 2015 11:38:37 +0000</created>
                <updated>Thu, 28 Jan 2016 16:59:32 +0000</updated>
                            <resolved>Thu, 28 Jan 2016 16:59:32 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="134818" author="di.wang" created="Tue, 1 Dec 2015 02:06:59 +0000"  >&lt;p&gt;According to the debug log:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00080000:29.0:1448567093.939286:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939290:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:11.0:1448567093.939292:0:8215:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518918717705928)  req@ffff88082ea09050 x1518918717714200/t0(0) o1000-&amp;gt;soaked-MDT0003-mdtlov_UUID@192.168.1.109@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939297:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:11.0:1448567093.939298:0:8215:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518918717705928)  req@ffff88082ea09050 x1518918717714200/t0(0) o1000-&amp;gt;soaked-MDT0003-mdtlov_UUID@192.168.1.109@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939303:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:11.0:1448567093.939305:0:8215:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518918717705928)  req@ffff88082ea09050 x1518918717714200/t0(0) o1000-&amp;gt;soaked-MDT0003-mdtlov_UUID@192.168.1.109@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939310:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:11.0:1448567093.939316:0:8215:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518918717705928)  req@ffff88082ea09050 x1518918717714200/t0(0) o1000-&amp;gt;soaked-MDT0003-mdtlov_UUID@192.168.1.109@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939318:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939325:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:11.0:1448567093.939325:0:8215:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518918717705928)  req@ffff88082ea09050 x1518918717714200/t0(0) o1000-&amp;gt;soaked-MDT0003-mdtlov_UUID@192.168.1.109@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939332:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939336:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939340:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00080000:29.0:1448567093.939344:0:8491:0:(out_handler.c:89:out_check_resent()) @@@ no reply for RESENT req (have 1518924261645488)  req@ffff8808040a5450 x1518924261650748/t0(0) o1000-&amp;gt;soaked-MDT0006-mdtlov_UUID@192.168.1.111@o2ib10:694/0 lens 320/4320 e 0 to 0 dl 1448567099 ref 1 fl Interpret:/2/0 rc 0/0
00000020:00020000:29.0:1448567093.940373:0:8491:0:(out_lib.c:692:out_tx_write_exec()) read record [0x240089779:0x1:0x0] tail_pos 173122472 rc -53 index 50635 size 172659608
00000020:00040000:29.0:1448567093.940379:0:8491:0:(out_lib.c:693:out_tx_write_exec()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
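
&lt;p&gt;A simplified model of the server-side resend check helps explain the repeated &lt;tt&gt;out_check_resent()&lt;/tt&gt; messages. The C sketch below assumes the check compares the XID of a request flagged as resent with the XID of the last request whose reply was saved (the &quot;have&quot; value in the message). The structure, enum and function names are hypothetical and are not the real Lustre definitions.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * Hypothetical, simplified model of a server-side resend check.
 * None of these names are real Lustre definitions.
 */
struct model_req {
	unsigned long long xid;    /* request XID, e.g. x1518924261650748 */
	int                resent; /* nonzero if the client marked it as resent */
};

struct model_export {
	/* XID of the last request whose reply was saved ("have ..." in the log) */
	unsigned long long last_xid;
};

enum model_resend_action {
	MODEL_EXECUTE     = 0, /* normal request: execute its updates */
	MODEL_RECONSTRUCT = 1, /* resent, reply saved: rebuild reply, skip updates */
	MODEL_NO_REPLY    = 2  /* resent, no saved reply: updates run again */
};

static enum model_resend_action
model_check_resent(const struct model_export *exp, const struct model_req *req)
{
	if (!req->resent)
		return MODEL_EXECUTE;

	if (req->xid == exp->last_xid)
		return MODEL_RECONSTRUCT;

	/* Corresponds to "no reply for RESENT req (have last_xid)". */
	return MODEL_NO_REPLY;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the values from the log (request x1518924261650748, have 1518924261645488), the XIDs differ. The model therefore returns &lt;tt&gt;MODEL_NO_REPLY&lt;/tt&gt;: the updates carried by the resent request would be executed again rather than answered from a saved reply.&lt;/p&gt;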

&lt;p&gt;It seems the request x1518924261650748 is not being invalidated on failure, which causes the corruption. Sigh, there is no debug log from MDT0006, so I do not know what happened there.&lt;/p&gt;

&lt;p&gt;Hmm, it seems not all resent requests are added to the &lt;tt&gt;delayed_list&lt;/tt&gt;:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        /* retry indefinitely on EINPROGRESS */
        if (lustre_msg_get_status(req-&amp;gt;rq_repmsg) == -EINPROGRESS &amp;amp;&amp;amp;
            ptlrpc_no_resend(req) == 0 &amp;amp;&amp;amp; !req-&amp;gt;rq_no_retry_einprogress) {
                time_t  now = cfs_time_current_sec();

                DEBUG_REQ(D_RPCTRACE, req, &quot;Resending request on EINPROGRESS&quot;);
                spin_lock(&amp;amp;req-&amp;gt;rq_lock);
                req-&amp;gt;rq_resend = 1;
                spin_unlock(&amp;amp;req-&amp;gt;rq_lock);
                req-&amp;gt;rq_nr_resend++;

                /* Readjust the timeout for current conditions */
                ptlrpc_at_set_req_timeout(req);
                /* delay resend to give a chance to the server to get ready.
                 * The delay is increased by 1s on every resend and is capped to
                 * the current request timeout (i.e. obd_timeout if AT is off,
                 * or AT service time x 125% + 5s, see at_est2timeout) */
                if (req-&amp;gt;rq_nr_resend &amp;gt; req-&amp;gt;rq_timeout)
                        req-&amp;gt;rq_sent = now + req-&amp;gt;rq_timeout;
                else
                        req-&amp;gt;rq_sent = now + req-&amp;gt;rq_nr_resend;

                RETURN(0);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
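
&lt;p&gt;The delay rule in the excerpt above can be restated as a standalone function, sketched below in C. The function and parameter names are hypothetical; the real code works on &lt;tt&gt;struct ptlrpc_request&lt;/tt&gt; fields (&lt;tt&gt;rq_nr_resend&lt;/tt&gt;, &lt;tt&gt;rq_timeout&lt;/tt&gt;, &lt;tt&gt;rq_sent&lt;/tt&gt;). &lt;tt&gt;rq_nr_resend&lt;/tt&gt; is incremented before the delay is computed. The first resend therefore waits 1 s, the second 2 s, and so on, until the delay reaches the request timeout.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * Hypothetical restatement of the EINPROGRESS resend delay: the delay grows
 * by one second per resend and is capped at the current request timeout.
 * Returns the time at which the request becomes eligible to be sent again.
 */
static long model_resend_time(long now, long nr_resend, long rq_timeout)
{
	long delay = nr_resend;     /* 1 s on the first resend, 2 s on the second, ... */

	if (nr_resend > rq_timeout)
		delay = rq_timeout; /* never wait longer than the request timeout */

	return now + delay;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, with a 30 s request timeout the fifth resend is scheduled 5 s after &lt;tt&gt;now&lt;/tt&gt;, while the 45th is scheduled 30 s after it.&lt;/p&gt;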

&lt;p&gt;I will update &lt;a href=&quot;http://review.whamcloud.com/#/c/17199/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/17199/&lt;/a&gt; accordingly.&lt;/p&gt;</comment>
                            <comment id="134831" author="di.wang" created="Tue, 1 Dec 2015 08:53:00 +0000"  >&lt;p&gt;Here is the patch &lt;a href=&quot;http://review.whamcloud.com/#/c/17199/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/17199/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="140371" author="gerrit" created="Thu, 28 Jan 2016 16:51:50 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/17199/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17199/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7490&quot; title=&quot;out_tx_write_exec()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7490&quot;&gt;&lt;del&gt;LU-7490&lt;/del&gt;&lt;/a&gt; recovery: abort update recovery once fails&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 92890d8f555d12ad32dc9841a328e84c5d26e896&lt;/p&gt;</comment>
                            <comment id="140373" author="jgmitter" created="Thu, 28 Jan 2016 16:59:32 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="33358">LU-7488</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="33358">LU-7488</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33262">LU-7455</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19742" name="console-lola-8.log.bz2" size="297175" author="heckes" created="Fri, 27 Nov 2015 11:56:55 +0000"/>
                            <attachment id="19743" name="lola-8-lbug-client-messages.txt.bz2" size="2627" author="heckes" created="Fri, 27 Nov 2015 11:56:55 +0000"/>
                            <attachment id="19744" name="lustre-log.1448567093.8491.bz2" size="54769" author="heckes" created="Fri, 27 Nov 2015 11:56:55 +0000"/>
                            <attachment id="19745" name="messages-lola-8.log.bz2" size="514811" author="heckes" created="Fri, 27 Nov 2015 11:56:55 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 7 Dec 2015 11:38:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxuf3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 27 Nov 2015 11:38:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>