[LU-519] (o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed Created: 20/Jul/11 Updated: 25/May/16 Resolved: 10/May/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Doug Oucharek (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
lustre-modules-1.8.5.0-5chaos_2.6.18_107chaos.ch4.4 |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Bugzilla ID: | 22,723 | ||||
| Rank (Obsolete): | 9755 | ||||
| Description |
|
This is the same bug that was discussed on bugzilla ticket 22723 found here: https://bugzilla.lustre.org/show_bug.cgi?id=22723 We have run into this issue in the past, and we recently ran into the issue again on one of our machines running chaos 4.4-3, with lustre 1.8.5 installed. This time we had a patch installed for Liang from the above mentioned bugzilla ticket to print some more debugging information to the console: 18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0 And here is some more console output from the crash this time around: 2011-07-15 19:44:40 LustreError: 18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0 2011-07-15 19:44:40 ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873 tail, 4096 max, 0 nreq) 2011-07-15 19:44:40 LustreError: 19245:0:(o2iblnd_cb.c:886:kiblnd_post_tx_locked()) Error -12 posting transmit to 192.168.123.110@o2ib1 2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Request obd_ping sent 0s ago to 172.16.68.23@tcp has failed due to network error (limit 105s) 2011-07-15 19:44:40 Lustre: Skipped 1 previous similar message 2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Connection to lsa-OST0036 (at 172.16.68.23@tcp) was lost; in progress operations using this service will wait for recovery to complete 2011-07-15 19:44:40 LustreError: 18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) LBUG 2011-07-15 19:44:40 Pid: 18632, comm: kiblnd_sd_03 2011-07-15 19:44:40 2011-07-15 19:44:40 Call Trace: 2011-07-15 19:44:40 [<ffffffff885d478f>] libcfs_debug_dumpstack+0x5f/0x80 [libcfs] 2011-07-15 19:44:40 [<ffffffff885d4cbf>] lbug_with_loc+0x7f/0xd0 [libcfs] 2011-07-15 19:44:40 [<ffffffff88657965>] kiblnd_tx_complete+0x155/0x460 [ko2iblnd] 2011-07-15 19:44:40 [<ffffffff80091f8c>] __wake_up_common+0x3e/0x68 2011-07-15 19:44:40 [<ffffffff88658f1c>] kiblnd_complete+0xbc/0xe0 [ko2iblnd] 2011-07-15 19:44:40 [<ffffffff8865eeee>] kiblnd_scheduler+0x50e/0x6b0 [ko2iblnd] 2011-07-15 19:44:40 [<ffffffff80093b5a>] default_wake_function+0x0/0xf 2011-07-15 19:44:40 [<ffffffff8006101d>] child_rip+0xa/0x11 2011-07-15 19:44:40 [<ffffffff80061013>] child_rip+0x0/0x11 2011-07-15 19:44:40 [<ffffffff8865e9e0>] kiblnd_scheduler+0x0/0x6b0 [ko2iblnd] 2011-07-15 19:44:40 [<ffffffff80061013>] child_rip+0x0/0x11 2011-07-15 19:44:40 2011-07-15 19:44:40 ib_mthca 00Linux version 2.6.18-107chaos (mockbuild@chaos4-builder1) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Thu Jun 23 14:36:14 PDT 2011 2011-07-15 19:44:41 Command line: initrd=initrd console=ttyS0,115200n8 elevator=deadline swiotlb=65536 selinux=0 BOOT_IMAGE=vmlinuz BOOTIF=01-00-30-48-57-9b-24 irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=5312K@16384K memmap=125104K@22336K elfcorehdr=147440K memmap=56K#3391360K memmap=69K#3391416K memmap=4K$3391484K memmap=4K$4173824K memmap=1024K$4175872K memmap=9216K$4185088K 2011-07-15 19:44:41 BIOS-provided physical RAM map: |
| Comments |
| Comment by Prakash Surya (Inactive) [ 20/Jul/11 ] |
|
Sorry, I forgot to put the formatting markup around the console output. Am I able to edit the description? |
| Comment by Peter Jones [ 20/Jul/11 ] |
|
Prakash, if you can't I can! Let me know what changes you need made. Liang, it would be great if you could make an initial assessment of this output but please let me know if you then need this issue to be reassigned to another engineer Thanks Peter |
| Comment by Prakash Surya (Inactive) [ 21/Jul/11 ] |
|
Just for readability purposes, I usually put the copy/pasted console output in 'noformat' tags. If you could add these I think it would definitely help people decipher the bug. Let me know if it isn't clear where the tags should go. Thanks Peter! |
| Comment by Liang Zhen (Inactive) [ 22/Jul/11 ] |
|
thanks, it did prove that we got an unexpected completion event while ib_post_send failed (we shouldn't get any event on that case), so the TX's been finalized for twice... I will need to look into ofa source. |
| Comment by Liang Zhen (Inactive) [ 22/Jul/11 ] |
|
One thing interesting is: 2011-07-15 19:44:40 ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873 we can see (head < tail), how can this happen? In my understanding we should always see head >= tail, or am I wrong? Liang |
| Comment by Peter Jones [ 26/Jul/11 ] |
|
Prakash Finally got to fixing the formatting in the initial comment - sorry for the delay Peter |
| Comment by Prakash Surya (Inactive) [ 26/Jul/11 ] |
|
Thanks Peter! |
| Comment by Peter Jones [ 27/Jul/11 ] |
|
Lai Could you please dig into this one further? Liang is available to help if need be. Thanks Peter |
| Comment by Lai Siyao [ 02/Aug/11 ] |
|
Liang, Could you give me a hint which part of OFA source code should I look into to verify TX may be finalized twice? HCA driver (eg. mthca) or core? Thanks! |
| Comment by Peter Jones [ 08/Aug/11 ] |
|
Liang Ah, I see that you were not added as a watcher to this ticket so you missed this question "Liang, Could you give me a hint which part of OFA source code should I look into to verify TX may be finalized twice? HCA driver (eg. mthca) or core? Thanks!" |
| Comment by Liang Zhen (Inactive) [ 08/Aug/11 ] |
|
Lai, I think it's mthca because we got this error message: we can see the LASSERT give us information like this: these information are outputted by this patch magic number is 0xdeadbeef and completion couter is 1, which means the TX has already been finalized. |
| Comment by Lai Siyao [ 05/Dec/11 ] |
|
The ib_mthca SQ full looks similar to the problem of http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-11/msg00108.html , Prakash, could you check the patch is included in your kernel? Liang, the ASSERTED tx->tx_comps: 1 means the completion has been handled already, and this is the second time it's called. Therefor, I'm not sure this tx is finalized in kiblnd_post_tx_locked() or in kiblnd_tx_complete(). I made a debug patch to print address of failed tx in kiblnd_post_tx_locked(), Prakash, could you merge this patch (along with Liang's) and reproduce this issue? |
| Comment by Lai Siyao [ 05/Dec/11 ] |
|
print address of failed tx in kiblnd_post_tx_locked(). |
| Comment by Prakash Surya (Inactive) [ 19/Jan/12 ] |
|
We recently hit this again. Here's the report from our admin: fyi: it seems to have happened again on atlas825 at a newer version of lustre lustre-modules-1.8.5.0-6chaos_2.6.18_108chaos.ch4.4 lustre-1.8.5.0-6chaos_2.6.18_108chaos.ch4.4 lustre-tools-llnl-1.3-1.ch4.4 No infiniband errors reported for that node at that time. I'll leave it for a couple of days, but it appears hung <ConMan> Console [atlas825] log at 2012-01-19 13:00:00 PST. <ConMan> Console [atlas825] log at 2012-01-19 14:00:00 PST. 2012-01-19 14:40:35 LustreError: 18699:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003d7e30, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 263119177, comps: 1, status: 0 2012-01-19 14:40:35 LustreError: 18699:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) LBUG 2012-01-19 14:40:35 Pid: 18699, comm: kiblnd_sd_02 2012-01-19 14:40:35 2012-01-19 14:40:35 Call Trace: 2012-01-19 14:40:35 [<ffffffff885e078f>] libcfs_debug_dumpstack+0x5f/0x80 [libcfs] 2012-01-19 14:40:35 [<ffffffff885e0cbf>] lbug_with_loc+0x7f/0xd0 [libcfs] 2012-01-19 14:40:35 [<ffffffff88663965>] kiblnd_tx_complete+0x155/0x460 [ko2iblnd] 2012-01-19 14:40:35 [<ffffffff80091f8c>] __wake_up_common+0x3e/0x68 2012-01-19 14:40:35 [<ffffffff88664f1c>] kiblnd_complete+0xbc/0xe0 [ko2iblnd] 2012-01-19 14:40:35 [<ffffffff8866aeee>] kiblnd_scheduler+0x50e/0x6b0 [ko2iblnd] 2012-01-19 14:40:35 [<ffffffff80093b5a>] default_wake_function+0x0/0xf 2012-01-19 14:40:35 [<ffffffff8006101d>] child_rip+0xa/0x11 2012-01-19 14:40:35 [<ffffffff80061013>] child_rip+0x0/0x11 2012-01-19 14:40:35 [<ffffffff8866a9e0>] kiblnd_scheduler+0x0/0x6b0 [ko2iblnd] 2012-01-19 14:40:35 [<ffffffff80061013>] child_rip+0x0/0x11 2012-01-19 14:40:35 2012-01-19 14:40:35 Jan 19 14:40:35 Kernel panic - not syncing: LBUG 2012-01-19 14:40:35 atlas825 LustreE rr <ConMan> Console [atlas825] log at 2012-01-19 15:00:00 PST. <ConMan> Console [atlas825] joined by <root@localhost> on pts/159 at 01-19 15:49. <ConMan> Console [atlas825] departed by <root@localhost> on pts/159 at 01-19 15:49. |
| Comment by Liang Zhen (Inactive) [ 20/Jan/12 ] |
|
Prakash, would it be possible to post your o2iblnd source code at here so I can check through it? |
| Comment by Prakash Surya (Inactive) [ 23/Jan/12 ] |
|
Liang, our Lustre tree (with the o2iblnd source) can be found on our CHAOS github account: https://github.com/chaos/lustre |
| Comment by Doug Oucharek (Inactive) [ 15/Jun/12 ] |
|
I came across some references on the web where some IB drivers may do a "partial send" when returning an ENOMEM from ib_post_send() (see http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg01390.html). I noticed that the first console log posted in this Jira does show a -12 (-ENOMEM) being reported from kiblnd_post_tx_locked(). Here is what I suspect has happened: 1- In kiblnd_post_tx_locked(), while holding the ibc_lock, tx_sending is incremented. Seems there are two bugs here: 1- that the IB driver experienced a memory issue, and 2- that o2iblnd ASSERTS on a situation which can happen (partial send on error). |
| Comment by Doug Oucharek (Inactive) [ 19/Jun/12 ] |
|
I have submitted a patch to Gerrit to cover the case where Lustre is asserting in a situation which, apparently, can happen. The Gerrit URL is: http://review.whamcloud.com/#change,3148 |
| Comment by Liang Zhen (Inactive) [ 19/Jun/12 ] |
|
Doug, I think we only set IB_SEND_SIGNALED for the last ib_send_wr (see kiblnd_init_tx_msg), so not sure whether the callback will still be called for partial sending if w/o IB_SEND_SIGNALED |
| Comment by Doug Oucharek (Inactive) [ 20/Jun/12 ] |
|
In this discussion thread: http://comments.gmane.org/gmane.linux.drivers.rdma/6119 , Roland Dreier indicates that the following is in the IB spec: "If a QP transitions to error state, then all work requests, whether or not they were signaled, generate a completion with status 'flush'." This indicates to me that is it possible to get a completion on a partial send even if only the last wr is set with IB_SEND_SIGNALED. |
| Comment by Liang Zhen (Inactive) [ 25/Jun/12 ] |
|
does it means that it only and always generate a completion event if ib_post_send() returned -ENOMEM? what if it returns other error code, will there still be completion event? |
| Comment by Doug Oucharek (Inactive) [ 25/Jun/12 ] |
|
That's a good question, Liang. I'm not familiar with the IB driver(s) so I'm not sure. Can we assume that when tx_status is non-zero, we should always fail in the completion? From this Jira report, I know that a tx_status of ENOMEM should trigger a fail. Don't know about other. Is there somewhere to see all possible tx_status conditions which can come from the IB driver? |
| Comment by Gerrit Updater [ 17/Nov/14 ] |
|
Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/12747 |
| Comment by Liang Zhen (Inactive) [ 17/Nov/14 ] |
|
I suspect this is a driver bug, which results in ib_poll_cq() return +ve but uninitialised wc::wr_id, if this indeed happened, kiblnd_scheduler will refer to stale pointer in stack (see |
| Comment by Gerrit Updater [ 12/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12747/ |
| Comment by Doug Oucharek (Inactive) [ 10/May/16 ] |
|
The merged patch will close the connection which gets into a bad state and log the details. As such, we should no longer be seeing the assert. If the new log shows up, then a new ticket should be opened with the details so we can address that. |