<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:07:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-519] (o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&gt;tx_sending &gt; 0) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-519</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This is the same bug that was discussed on bugzilla ticket 22723 found here: &lt;a href=&quot;https://bugzilla.lustre.org/show_bug.cgi?id=22723&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/show_bug.cgi?id=22723&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have run into this issue in the past, and we recently ran into the issue again on one of our machines running chaos 4.4-3, with lustre 1.8.5 installed.&lt;/p&gt;

&lt;p&gt;This time we had a patch from Liang installed, taken from the above-mentioned Bugzilla ticket, that prints some more debugging information to the console:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0)
failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0,
waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here is some more console output from the crash this time around:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
2011-07-15 19:44:40 LustreError:
18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0)
failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0,
waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0
2011-07-15 19:44:40 ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873
tail, 4096 max, 0 nreq)
2011-07-15 19:44:40 LustreError:
19245:0:(o2iblnd_cb.c:886:kiblnd_post_tx_locked()) Error -12 posting transmit
to 192.168.123.110@o2ib1
2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Request obd_ping
sent 0s ago to 172.16.68.23@tcp has failed due to network error (limit 105s)
2011-07-15 19:44:40 Lustre: Skipped 1 previous similar message
2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Connection to
lsa-OST0036 (at 172.16.68.23@tcp) was lost; in progress operations using this
service will wait for recovery to complete
2011-07-15 19:44:40 LustreError:
18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) LBUG
2011-07-15 19:44:40 Pid: 18632, comm: kiblnd_sd_03
2011-07-15 19:44:40
2011-07-15 19:44:40 Call Trace:
2011-07-15 19:44:40  [&amp;lt;ffffffff885d478f&amp;gt;] libcfs_debug_dumpstack+0x5f/0x80
[libcfs]
2011-07-15 19:44:40  [&amp;lt;ffffffff885d4cbf&amp;gt;] lbug_with_loc+0x7f/0xd0 [libcfs]
2011-07-15 19:44:40  [&amp;lt;ffffffff88657965&amp;gt;] kiblnd_tx_complete+0x155/0x460
[ko2iblnd]
2011-07-15 19:44:40  [&amp;lt;ffffffff80091f8c&amp;gt;] __wake_up_common+0x3e/0x68
2011-07-15 19:44:40  [&amp;lt;ffffffff88658f1c&amp;gt;] kiblnd_complete+0xbc/0xe0 [ko2iblnd]
2011-07-15 19:44:40  [&amp;lt;ffffffff8865eeee&amp;gt;] kiblnd_scheduler+0x50e/0x6b0
[ko2iblnd]
2011-07-15 19:44:40  [&amp;lt;ffffffff80093b5a&amp;gt;] default_wake_function+0x0/0xf
2011-07-15 19:44:40  [&amp;lt;ffffffff8006101d&amp;gt;] child_rip+0xa/0x11
2011-07-15 19:44:40  [&amp;lt;ffffffff80061013&amp;gt;] child_rip+0x0/0x11
2011-07-15 19:44:40  [&amp;lt;ffffffff8865e9e0&amp;gt;] kiblnd_scheduler+0x0/0x6b0 [ko2iblnd]
2011-07-15 19:44:40  [&amp;lt;ffffffff80061013&amp;gt;] child_rip+0x0/0x11
2011-07-15 19:44:40
2011-07-15 19:44:40 ib_mthca 00Linux version 2.6.18-107chaos
(mockbuild@chaos4-builder1) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1
SMP Thu Jun 23 14:36:14 PDT 2011
2011-07-15 19:44:41 Command line: initrd=initrd console=ttyS0,115200n8
elevator=deadline swiotlb=65536 selinux=0 BOOT_IMAGE=vmlinuz
BOOTIF=01-00-30-48-57-9b-24 irqpoll maxcpus=1 reset_devices  memmap=exactmap
memmap=640K@0K memmap=5312K@16384K memmap=125104K@22336K elfcorehdr=147440K
memmap=56K#3391360K memmap=69K#3391416K memmap=4K$3391484K memmap=4K$4173824K
memmap=1024K$4175872K memmap=9216K$4185088K
2011-07-15 19:44:41 BIOS-provided physical RAM map:

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>lustre-modules-1.8.5.0-5chaos_2.6.18_107chaos.ch4.4&lt;br/&gt;
lustre-1.8.5.0-5chaos_2.6.18_107chaos.ch4.4&lt;br/&gt;
lustre-tools-llnl-1.2-6.ch4.4&lt;br/&gt;
chaos 4.4-3</environment>
        <key id="11363">LU-519</key>
            <summary>(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&gt;tx_sending &gt; 0) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 20 Jul 2011 11:33:15 +0000</created>
                <updated>Wed, 25 May 2016 21:53:49 +0000</updated>
                            <resolved>Tue, 10 May 2016 21:36:14 +0000</resolved>
                                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="18006" author="prakash" created="Wed, 20 Jul 2011 11:52:39 +0000"  >&lt;p&gt;Sorry, I forgot to put the formatting markup around the console output. Am I able to edit the description?&lt;/p&gt;</comment>
                            <comment id="18008" author="pjones" created="Wed, 20 Jul 2011 12:32:00 +0000"  >&lt;p&gt;Prakash, if you can&apos;t I can! Let me know what changes you need made.&lt;/p&gt;

&lt;p&gt;Liang, it would be great if you could make an initial assessment of this output but please let me know if you then need this issue to be reassigned to another engineer&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="18079" author="prakash" created="Thu, 21 Jul 2011 14:02:14 +0000"  >&lt;p&gt;Just for readability purposes, I usually put the copy/pasted console output in &apos;noformat&apos; tags. If you could add these I think it would definitely help people decipher the bug. Let me know if it isn&apos;t clear where the tags should go.&lt;/p&gt;

&lt;p&gt;Thanks Peter!&lt;/p&gt;</comment>
                            <comment id="18083" author="liang" created="Fri, 22 Jul 2011 00:03:11 +0000"  >&lt;p&gt;Thanks, it did prove that we got an unexpected completion event even though ib_post_send failed (we shouldn&apos;t get any event in that case), so the TX has been finalized twice... I will need to look into the OFA source.&lt;/p&gt;</comment>
                            <comment id="18089" author="liang" created="Fri, 22 Jul 2011 05:43:26 +0000"  >&lt;p&gt;One thing interesting is: &lt;/p&gt;

&lt;p&gt;2011-07-15 19:44:40 ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873&lt;br/&gt;
tail, 4096 max, 0 nreq)&lt;/p&gt;

&lt;p&gt;We can see that head &amp;lt; tail; how can this happen? In my understanding we should always see head &amp;gt;= tail, or am I wrong? &lt;/p&gt;

&lt;p&gt;Liang&lt;/p&gt;</comment>
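<!--
One possible reading of the head/tail question above, as a minimal sketch: assuming the driver keeps head and tail as free-running unsigned 32-bit counters and computes queue occupancy as head minus tail (an assumption about mthca, not something this ticket confirms), a head that is smaller than tail wraps around to a huge occupancy, so every post would be reported as "SQ full". The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: occupancy of a work queue whose head and tail are
 * free-running unsigned counters (assumed, not taken from the driver). */
uint32_t sq_occupancy(uint32_t head, uint32_t tail)
{
	/* Unsigned subtraction wraps modulo 2^32 when head < tail. */
	return head - tail;
}

/* Hypothetical overflow test: returns 1 when nreq more requests would not
 * fit into a queue that holds max entries, 0 otherwise. */
int sq_would_overflow(uint32_t head, uint32_t tail,
		      uint32_t max, uint32_t nreq)
{
	assert(max > 0);
	return sq_occupancy(head, tail) + nreq >= max;
}
```

With the values from the console message (292641 head, 292873 tail, 4096 max, 0 nreq), sq_occupancy() yields 4294967064, so sq_would_overflow() returns 1 even though no request is being added.
-->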
                            <comment id="18245" author="pjones" created="Tue, 26 Jul 2011 13:48:39 +0000"  >&lt;p&gt;Prakash&lt;/p&gt;

&lt;p&gt;Finally got to fixing the formatting in the initial comment - sorry for the delay&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="18264" author="prakash" created="Tue, 26 Jul 2011 16:12:41 +0000"  >&lt;p&gt;Thanks Peter!&lt;/p&gt;</comment>
                            <comment id="18319" author="pjones" created="Wed, 27 Jul 2011 08:15:31 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please dig into this one further? Liang is available to help if need be.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="18628" author="laisiyao" created="Tue, 2 Aug 2011 12:15:35 +0000"  >&lt;p&gt;Liang,&lt;/p&gt;

&lt;p&gt;Could you give me a hint as to which part of the OFA source code I should look into to verify that a TX may be finalized twice? The HCA driver (e.g. mthca) or the core? Thanks!&lt;/p&gt;</comment>
                            <comment id="18799" author="pjones" created="Mon, 8 Aug 2011 10:24:25 +0000"  >&lt;p&gt;Liang&lt;/p&gt;

&lt;p&gt;Ah, I see that you were not added as a watcher to this ticket so you missed this question&lt;/p&gt;

&lt;p&gt;&quot;Liang,&lt;/p&gt;

&lt;p&gt;Could you give me a hint as to which part of the OFA source code I should look into to verify that a TX may be finalized twice? The HCA driver (e.g. mthca) or the core? Thanks!&quot;&lt;/p&gt;</comment>
                            <comment id="18800" author="liang" created="Mon, 8 Aug 2011 10:38:17 +0000"  >&lt;p&gt;Lai, I think it&apos;s mthca because we got this error message:&lt;br/&gt;
ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873 tail, 4096 max, 0 nreq)&lt;br/&gt;
we probably need to look into post_send implementation in drivers/infiniband/hw/mthca/mthca_qp.c&lt;/p&gt;

&lt;p&gt;We can see that the LASSERT gives us information like this:&lt;br/&gt;
(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0)&lt;br/&gt;
failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0,&lt;br/&gt;
waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0&lt;/p&gt;

&lt;p&gt;This information is printed by this patch:&lt;br/&gt;
&lt;a href=&quot;https://bugzilla.lustre.org/attachment.cgi?id=31074&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.lustre.org/attachment.cgi?id=31074&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The magic number is 0xdeadbeef and the completion counter is 1, which means the TX has already been finalized.&lt;br/&gt;
I suspect kiblnd_post_tx_locked()-&amp;gt;ib_post_send() failed for some reason, so the TX should have been finalized in kiblnd_post_tx_locked() and we shouldn&apos;t see any completion &lt;br/&gt;
event for this TX anymore, but for some unknown reason we did see an unexpected event and called kiblnd_tx_complete() for the same TX again.&lt;/p&gt;</comment>
                            <comment id="23681" author="laisiyao" created="Mon, 5 Dec 2011 09:20:53 +0000"  >&lt;p&gt;The ib_mthca &quot;SQ full&quot; message looks similar to the problem described in &lt;a href=&quot;http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-11/msg00108.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-11/msg00108.html&lt;/a&gt;. Prakash, could you check whether that patch is included in your kernel?&lt;/p&gt;

&lt;p&gt;Liang, the asserted tx-&amp;gt;tx_comps: 1 means the completion has already been handled, and this is the second time it&apos;s called. Therefore, I&apos;m not sure whether this tx was finalized in kiblnd_post_tx_locked() or in kiblnd_tx_complete().&lt;/p&gt;

&lt;p&gt;I made a debug patch to print the address of the failed tx in kiblnd_post_tx_locked(). Prakash, could you merge this patch (along with Liang&apos;s) and reproduce this issue?&lt;/p&gt;
</comment>
                            <comment id="23682" author="laisiyao" created="Mon, 5 Dec 2011 09:21:36 +0000"  >&lt;p&gt;print address of failed tx in kiblnd_post_tx_locked().&lt;/p&gt;</comment>
                            <comment id="27012" author="prakash" created="Thu, 19 Jan 2012 19:32:56 +0000"  >&lt;p&gt;We recently hit this again. Here&apos;s the report from our admin:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;fyi: it seems to have happened again on atlas825 at a newer version of lustre
lustre-modules-1.8.5.0-6chaos_2.6.18_108chaos.ch4.4
lustre-1.8.5.0-6chaos_2.6.18_108chaos.ch4.4
lustre-tools-llnl-1.3-1.ch4.4
No infiniband errors reported for that node at that time.
I&apos;ll leave it for a couple of days, but it appears hung

&amp;lt;ConMan&amp;gt; Console [atlas825] log at 2012-01-19 13:00:00 PST.

&amp;lt;ConMan&amp;gt; Console [atlas825] log at 2012-01-19 14:00:00 PST.
2012-01-19 14:40:35 LustreError: 18699:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0) failed:
TX: ffffc200003d7e30, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 263119177, comps: 1,
status: 0
2012-01-19 14:40:35 LustreError: 18699:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) LBUG
2012-01-19 14:40:35 Pid: 18699, comm: kiblnd_sd_02
2012-01-19 14:40:35
2012-01-19 14:40:35 Call Trace:
2012-01-19 14:40:35  [&amp;lt;ffffffff885e078f&amp;gt;] libcfs_debug_dumpstack+0x5f/0x80 [libcfs]
2012-01-19 14:40:35  [&amp;lt;ffffffff885e0cbf&amp;gt;] lbug_with_loc+0x7f/0xd0 [libcfs]
2012-01-19 14:40:35  [&amp;lt;ffffffff88663965&amp;gt;] kiblnd_tx_complete+0x155/0x460 [ko2iblnd]
2012-01-19 14:40:35  [&amp;lt;ffffffff80091f8c&amp;gt;] __wake_up_common+0x3e/0x68
2012-01-19 14:40:35  [&amp;lt;ffffffff88664f1c&amp;gt;] kiblnd_complete+0xbc/0xe0 [ko2iblnd]
2012-01-19 14:40:35  [&amp;lt;ffffffff8866aeee&amp;gt;] kiblnd_scheduler+0x50e/0x6b0 [ko2iblnd]
2012-01-19 14:40:35  [&amp;lt;ffffffff80093b5a&amp;gt;] default_wake_function+0x0/0xf
2012-01-19 14:40:35  [&amp;lt;ffffffff8006101d&amp;gt;] child_rip+0xa/0x11
2012-01-19 14:40:35  [&amp;lt;ffffffff80061013&amp;gt;] child_rip+0x0/0x11
2012-01-19 14:40:35  [&amp;lt;ffffffff8866a9e0&amp;gt;] kiblnd_scheduler+0x0/0x6b0 [ko2iblnd]
2012-01-19 14:40:35  [&amp;lt;ffffffff80061013&amp;gt;] child_rip+0x0/0x11
2012-01-19 14:40:35
2012-01-19 14:40:35 Jan 19 14:40:35 Kernel panic - not syncing: LBUG
2012-01-19 14:40:35 atlas825 LustreE rr
&amp;lt;ConMan&amp;gt; Console [atlas825] log at 2012-01-19 15:00:00 PST.

&amp;lt;ConMan&amp;gt; Console [atlas825] joined by &amp;lt;root@localhost&amp;gt; on pts/159 at 01-19 15:49.

&amp;lt;ConMan&amp;gt; Console [atlas825] departed by &amp;lt;root@localhost&amp;gt; on pts/159 at 01-19 15:49.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="27078" author="liang" created="Fri, 20 Jan 2012 03:26:18 +0000"  >&lt;p&gt;Prakash, would it be possible to post your o2iblnd source code here so I can check through it?&lt;/p&gt;</comment>
                            <comment id="27194" author="prakash" created="Mon, 23 Jan 2012 13:57:28 +0000"  >&lt;p&gt;Liang, our Lustre tree (with the o2iblnd source) can be found on our CHAOS github account: &lt;a href=&quot;https://github.com/chaos/lustre&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40631" author="doug" created="Fri, 15 Jun 2012 01:56:14 +0000"  >&lt;p&gt;I came across some references on the web where some IB drivers may do a &quot;partial send&quot; when returning an ENOMEM from ib_post_send() (see &lt;a href=&quot;http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg01390.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg01390.html&lt;/a&gt;).  I noticed that the first console log posted in this Jira does show a -12 (-ENOMEM) being reported from kiblnd_post_tx_locked().  Here is what I suspect has happened: &lt;/p&gt;

&lt;p&gt;1- In kiblnd_post_tx_locked(), while holding the ibc_lock, tx_sending is incremented.&lt;br/&gt;
2- Then ib_post_send() is called.&lt;br/&gt;
3- ib_post_send() experiences a memory issue but does a partial send.  -ENOMEM is returned.&lt;br/&gt;
4- Because of the non-zero return code, tx_sending is decremented (becomes zero) and then the ibc_lock is released.&lt;br/&gt;
5- The partial send triggers a complete callback which executes before the -ENOMEM error message gets printed.  That callback triggers the ASSERT.&lt;/p&gt;

&lt;p&gt;Seems there are two bugs here: 1- that the IB driver experienced a memory issue, and 2- that o2iblnd ASSERTS on a situation which can happen (partial send on error).&lt;/p&gt;</comment>
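<!--
The five steps above can be modelled in a few lines. This is only a sketch under stated assumptions: struct tx_model, model_post_tx() and model_tx_complete() are hypothetical stand-ins rather than the o2iblnd code, post_rc stands in for the return value of ib_post_send(), and the choice of EPROTO as the reported error is arbitrary. It shows how a completion that arrives after a failed post finds tx_sending at zero, and how the completion path could report that instead of asserting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the transmit descriptor. */
struct tx_model {
	int tx_sending;   /* sends posted but not yet completed */
	int tx_status;    /* first error recorded for this tx */
	int tx_comps;     /* completions handled so far */
};

/* Steps 1 to 4: count the send, post it, undo the count on failure. */
int model_post_tx(struct tx_model *tx, int post_rc)
{
	assert(tx != NULL);
	tx->tx_sending += 1;
	if (post_rc != 0) {
		tx->tx_sending -= 1;
		if (tx->tx_status == 0)
			tx->tx_status = post_rc;
	}
	return post_rc;
}

/* Step 5: a completion arrives. Rather than asserting tx_sending > 0,
 * report the stray completion so the caller can handle it.
 * Returns 0 for an expected completion, -EPROTO for an unexpected one. */
int model_tx_complete(struct tx_model *tx)
{
	assert(tx != NULL);
	if (tx->tx_sending <= 0) {
		fprintf(stderr, "unexpected completion: sending=%d status=%d\n",
			tx->tx_sending, tx->tx_status);
		return -EPROTO;
	}
	tx->tx_sending -= 1;
	tx->tx_comps += 1;
	return 0;
}
```

Calling model_post_tx() with -ENOMEM and then model_tx_complete() on the same descriptor reproduces the sequence: the post leaves tx_sending at zero, and the late completion is rejected with -EPROTO instead of tripping an assertion.
-->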
                            <comment id="40885" author="doug" created="Tue, 19 Jun 2012 20:05:28 +0000"  >&lt;p&gt;I have submitted a patch to Gerrit to cover the case where Lustre is asserting in a situation which, apparently, can happen. The Gerrit URL is: &lt;a href=&quot;http://review.whamcloud.com/#change,3148&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3148&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40888" author="liang" created="Tue, 19 Jun 2012 21:45:23 +0000"  >&lt;p&gt;Doug, I think we only set IB_SEND_SIGNALED for the last ib_send_wr (see kiblnd_init_tx_msg), so I&apos;m not sure whether the callback will still be called for a partial send without IB_SEND_SIGNALED.&lt;/p&gt;</comment>
                            <comment id="40945" author="doug" created="Wed, 20 Jun 2012 14:52:16 +0000"  >&lt;p&gt;In this discussion thread: &lt;a href=&quot;http://comments.gmane.org/gmane.linux.drivers.rdma/6119&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://comments.gmane.org/gmane.linux.drivers.rdma/6119&lt;/a&gt; , Roland Dreier indicates that the following is in the IB spec: &quot;If a QP transitions to error state, then &lt;b&gt;all&lt;/b&gt; work requests, whether or not they were signaled, generate a completion with status &apos;flush&apos;.&quot;&lt;/p&gt;

&lt;p&gt;This indicates to me that it is possible to get a completion on a partial send even if only the last wr is set with IB_SEND_SIGNALED.&lt;/p&gt;</comment>
                            <comment id="41065" author="liang" created="Mon, 25 Jun 2012 02:38:57 +0000"  >&lt;p&gt;Does it mean that it &lt;em&gt;only&lt;/em&gt; and &lt;em&gt;always&lt;/em&gt; generates a completion event if ib_post_send() returned -ENOMEM? What if it returns another error code; will there still be a completion event?&lt;/p&gt;</comment>
                            <comment id="41112" author="doug" created="Mon, 25 Jun 2012 18:51:20 +0000"  >&lt;p&gt;That&apos;s a good question, Liang.  I&apos;m not familiar with the IB driver(s) so I&apos;m not sure.  Can we assume that when tx_status is non-zero, we should always fail in the completion?  From this Jira report, I know that a tx_status of ENOMEM should trigger a fail.  Don&apos;t know about other.  &lt;/p&gt;

&lt;p&gt;Is there somewhere to see all possible tx_status conditions which can come from the IB driver?&lt;/p&gt;</comment>
                            <comment id="99319" author="gerrit" created="Mon, 17 Nov 2014 07:12:19 +0000"  >&lt;p&gt;Liang Zhen (liang.zhen@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/12747&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12747&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-519&quot; title=&quot;(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-519&quot;&gt;&lt;del&gt;LU-519&lt;/del&gt;&lt;/a&gt; o2iblnd: check wr_id returned by ib_poll_cq&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 524f19e949f38670483968fe0854074be31bc4ff&lt;/p&gt;</comment>
                            <comment id="99320" author="liang" created="Mon, 17 Nov 2014 07:16:18 +0000"  >&lt;p&gt;I suspect this is a driver bug in which ib_poll_cq() returns a positive value but leaves wc::wr_id uninitialised. If this indeed happened, kiblnd_scheduler would refer to a stale pointer on the stack (see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5271&quot; title=&quot;NULL pointer dereference in kiblnd_tx_complete&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5271&quot;&gt;&lt;del&gt;LU-5271&lt;/del&gt;&lt;/a&gt;) and run into an unpredictable situation. This patch may help to confirm this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/12747&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12747&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="138621" author="gerrit" created="Tue, 12 Jan 2016 02:45:03 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12747/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12747/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-519&quot; title=&quot;(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx-&amp;gt;tx_sending &amp;gt; 0) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-519&quot;&gt;&lt;del&gt;LU-519&lt;/del&gt;&lt;/a&gt; o2iblnd: check wr_id returned by ib_poll_cq&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f571d0fdb78516b615574cfcd903436eba975fc5&lt;/p&gt;</comment>
                            <comment id="151726" author="doug" created="Tue, 10 May 2016 21:36:14 +0000"  >&lt;p&gt;The merged patch will close the connection which gets into a bad state and log the details.  As such, we should no longer be seeing the assert.  If the new log shows up, then a new ticket should be opened with the details so we can address that.&lt;/p&gt;</comment>
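<!--
The check described above can be sketched roughly as follows, going only by the patch subject ("check wr_id returned by ib_poll_cq"): seed the work completion with a sentinel before polling, and treat a positive poll result that leaves the sentinel in place as a driver fault. The struct, the sentinel value, the result codes and the two poll stubs are hypothetical stand-ins, not the o2iblnd code or the verbs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WID_INVALID UINT64_MAX   /* hypothetical sentinel, never a real wr_id */

struct wc_model {                /* stand-in for the work completion */
	uint64_t wr_id;
};

/* Stand-in for the poll call: returns the number of completions written. */
typedef int (*poll_fn)(struct wc_model *wc);

enum poll_result { POLL_EMPTY, POLL_OK, POLL_BAD_WRID, POLL_ERROR };

enum poll_result checked_poll(poll_fn poll, struct wc_model *wc)
{
	int rc;

	assert(poll != NULL && wc != NULL);
	wc->wr_id = WID_INVALID;         /* seed before polling */
	rc = poll(wc);
	if (rc < 0)
		return POLL_ERROR;
	if (rc == 0)
		return POLL_EMPTY;
	/* A completion was reported but wr_id was never filled in: the
	 * caller should log it and close the connection, not use wr_id. */
	if (wc->wr_id == WID_INVALID)
		return POLL_BAD_WRID;
	return POLL_OK;
}

/* Illustrative stub: a well-behaved poll that fills in wr_id. */
int poll_good(struct wc_model *wc)
{
	wc->wr_id = 42;
	return 1;
}

/* Illustrative stub: claims one completion but leaves wr_id untouched. */
int poll_stale(struct wc_model *wc)
{
	(void)wc;
	return 1;
}
```

With poll_good the caller gets POLL_OK and a usable wr_id; with poll_stale it gets POLL_BAD_WRID, which is the case where the connection would be closed and the details logged.
-->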
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="10648" name="print_failed_tx.diff" size="764" author="laisiyao" created="Mon, 5 Dec 2011 09:21:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>22723.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvydz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9755</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>