[LU-10459] LBUG (o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • Environment: Soak performance cluster, version=2.10.56_84_gd645c72, RHEL 7.4 kernel
    • Severity: 3

    Description

      The LBUG occurs immediately when we attempt any I/O on the clients. Multiple clients are affected.

      Jan  4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed:
      Jan  4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) LBUG
      Jan  4 21:57:38 soak-17 kernel: Pid: 12570, comm: kiblnd_sd_00_00
      Jan  4 21:57:38 soak-17 kernel: Call Trace:
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc097c7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc097c83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2666b>] kiblnd_check_sends_locked+0xd8b/0xd90 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0538b5c>] ? mlx4_ib_post_recv+0x1dc/0x310 [mlx4_ib]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c27f50>] kiblnd_post_rx+0x160/0x520 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c284ea>] kiblnd_recv+0x1da/0x7b0 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a00573>] lnet_ni_recv+0xc3/0x320 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a02e06>] lnet_parse_local+0x4c6/0xd40 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0 
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a03f4a>] lnet_parse+0x8ca/0xfc0 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c261ac>] ? kiblnd_check_sends_locked+0x8cc/0xd90 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c28e63>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2facf>] kiblnd_scheduler+0xf0f/0x1150 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810ce55e>] ? dequeue_task_fair+0x41e/0x660
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c4820>] ? default_wake_function+0x0/0x20
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2ebc0>] ? kiblnd_scheduler+0x0/0x1150 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b099f>] kthread+0xcf/0xe0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      Jan  4 21:57:38 soak-17 kernel:
      

      Multiple crash dumps are available on Spirit.

          Activity

            gerrit Gerrit Updater added a comment:
            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33150
            Subject: LU-10459 lnd: throttle tx based on queue depth
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: dce2da916afe3fa474e2199b4993c91ced4e45cf

            jgmitter Joseph Gmitter (Inactive) added a comment:
            Landed to master for 2.11.0

            gerrit Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30751/
            Subject: LU-10459 lnd: throttle tx based on queue depth
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e86f55798ca7bc8f7fe22dd48c9d9f52c1bb029a

            ashehata Amir Shehata (Inactive) added a comment:
            The patch that fixes the issue hasn't landed yet.

            cliffw Cliff White (Inactive) added a comment:
            I am currently seeing this on a lustre-review-ib build, version=2.10.56_86_gd8827a8
            gerrit Gerrit Updater added a comment (edited):
            Sorry, the commit was added against the wrong ticket.

            cliffw Cliff White (Inactive) added a comment:
            Ah, sorry, wrong bug - my bad

            ashehata Amir Shehata (Inactive) added a comment:
            I checked b2_10, and it doesn't look like "LU-10291 lnd: remove concurrent_sends tunable" was ported over, so I'm wondering whether this is the same issue. The assert was hit because of that patch (LU-10291).
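
For readers unfamiliar with the invariant behind this assert, the following is a minimal, hypothetical sketch in C of throttling sends against the queue depth. It is not the code from https://review.whamcloud.com/30751. The struct, the `pending` counter, and the `_sketch` function names are assumptions made for illustration; only `ibc_nsends_posted` and `ibc_queue_depth` come from the assertion text.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the connection state. Only the two
 * ibc_* field names are taken from the assertion in this ticket. */
struct conn_sketch {
	int ibc_nsends_posted;	/* sends posted and not yet completed */
	int ibc_queue_depth;	/* per-connection limit on posted sends */
	int pending;		/* assumption: messages waiting to be posted */
};

/* Post pending sends only while a slot is free; return how many were posted. */
static int check_sends_sketch(struct conn_sketch *conn)
{
	int posted = 0;

	while (conn->pending > 0 &&
	       conn->ibc_nsends_posted < conn->ibc_queue_depth) {
		conn->pending--;
		conn->ibc_nsends_posted++;
		posted++;
	}

	/* Same condition as the ASSERTION that failed in this ticket. */
	assert(conn->ibc_nsends_posted <= conn->ibc_queue_depth);
	return posted;
}

/* A send completion frees one slot so another pending message can be posted. */
static void send_complete_sketch(struct conn_sketch *conn)
{
	assert(conn->ibc_nsends_posted > 0);
	conn->ibc_nsends_posted--;
}
```

In this sketch, with a queue depth of 8 and 20 pending messages, the first call posts 8 sends and leaves 12 queued; each completion then frees exactly one slot. If the posting loop were bounded by some other limit that could exceed the queue depth, the final assert would fire, which is the failure mode the ticket describes.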

            cliffw Cliff White (Inactive) added a comment:
            Testing the above patch on soak; it appears to fix the immediate LBUG.
            Soak has now been running for about 30 minutes; we will see how it does.

            gerrit Gerrit Updater added a comment:
            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30751
            Subject: LU-10459 lnd: throttle tx based on queue depth
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 86d289fed3c0eeccc3a0650d7e5a842391d11c3e

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8
