[LU-10459] LBUG o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed: Created: 04/Jan/18 Updated: 07/Jan/19 Resolved: 19/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
Soak performance cluster, version=2.10.56_84_gd645c72, RHEL 7.4 kernel |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
LBUG occurs immediately when we try to do any IO on clients. Multiple clients impacted.

Jan 4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed:
Jan 4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) LBUG
Jan 4 21:57:38 soak-17 kernel: Pid: 12570, comm: kiblnd_sd_00_00
Jan 4 21:57:38 soak-17 kernel: #012Call Trace:
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc097c7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc097c83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c2666b>] kiblnd_check_sends_locked+0xd8b/0xd90 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0538b5c>] ? mlx4_ib_post_recv+0x1dc/0x310 [mlx4_ib]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c27f50>] kiblnd_post_rx+0x160/0x520 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c284ea>] kiblnd_recv+0x1da/0x7b0 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0a00573>] lnet_ni_recv+0xc3/0x320 [lnet]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0a02e06>] lnet_parse_local+0x4c6/0xd40 [lnet]
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0a03f4a>] lnet_parse+0x8ca/0xfc0 [lnet]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c261ac>] ? kiblnd_check_sends_locked+0x8cc/0xd90 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffff81029557>] ? __switch_to+0xd7/0x510
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c28e63>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c2facf>] kiblnd_scheduler+0xf0f/0x1150 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810ce55e>] ? dequeue_task_fair+0x41e/0x660
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810c4820>] ? default_wake_function+0x0/0x20
Jan 4 21:57:38 soak-17 kernel: [<ffffffffc0c2ebc0>] ? kiblnd_scheduler+0x0/0x1150 [ko2iblnd]
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810b099f>] kthread+0xcf/0xe0
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
Jan 4 21:57:38 soak-17 kernel: [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
Jan 4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
Jan 4 21:57:38 soak-17 kernel:

Multiple crash dumps available on Spirit |
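For readers unfamiliar with the failing check, here is a minimal sketch of the kind of invariant the assertion expresses: the number of sends posted on a connection must never exceed that connection's queue depth. This is not the o2iblnd code and says nothing about the root cause in this ticket. The struct and the `sketch_*` functions are hypothetical names invented for illustration; only the two field names, with the `ibc_` prefix dropped, are taken from the assertion text, and their meaning is assumed from those names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection state named in the assertion.
 * The real connection structure in o2iblnd has many more fields. */
struct sketch_conn {
    int nsends_posted; /* assumed: sends posted and not yet completed */
    int queue_depth;   /* assumed: upper bound on outstanding sends */
};

/* True if one more send can be posted without breaking the invariant. */
static bool sketch_can_post(const struct sketch_conn *conn)
{
    return conn->nsends_posted < conn->queue_depth;
}

/* Post one send if there is room. Returns false and leaves the
 * counter untouched when the connection is already at its limit. */
static bool sketch_post_send(struct sketch_conn *conn)
{
    if (!sketch_can_post(conn))
        return false;
    conn->nsends_posted++;
    /* Same shape as the check that fired in the log above. */
    assert(conn->nsends_posted <= conn->queue_depth);
    return true;
}

/* Completion side: one outstanding send has finished. */
static void sketch_send_done(struct sketch_conn *conn)
{
    assert(conn->nsends_posted > 0);
    conn->nsends_posted--;
}
```

In this sketch the invariant can only break if the counter is incremented without going through the guard, or if the depth is lowered while sends are still outstanding. Whether either applies to this LBUG is not stated in the ticket.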
| Comments |
| Comment by Amir Shehata (Inactive) [ 05/Jan/18 ] |
|
This is most likely related to: This only affects master. I'm investigating. |
| Comment by Gerrit Updater [ 05/Jan/18 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30751 |
| Comment by Cliff White (Inactive) [ 05/Jan/18 ] |
|
Testing the above patch on soak, appears to fix the immediate LBUG. |
| Comment by Amir Shehata (Inactive) [ 10/Jan/18 ] |
|
I checked b2_10, it doesn't look like |
| Comment by Cliff White (Inactive) [ 10/Jan/18 ] |
|
Ah, sorry wrong bug - my bad |
| Comment by Gerrit Updater [ 12/Jan/18 ] |
|
sorry, commit was added against the wrong ticket. |
| Comment by Cliff White (Inactive) [ 17/Jan/18 ] |
|
I am currently seeing this on a lustre-review-ib build version=2.10.56_86_gd8827a8 |
| Comment by Amir Shehata (Inactive) [ 18/Jan/18 ] |
|
The patch which fixes the issue hasn't landed yet. |
| Comment by Gerrit Updater [ 19/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30751/ |
| Comment by Joseph Gmitter (Inactive) [ 19/Jan/18 ] |
|
Landed to master for 2.11.0 |
| Comment by Gerrit Updater [ 12/Sep/18 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33150 |