[LU-10459] LBUG o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed:

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • Environment: Soak performance cluster, version=2.10.56_84_gd645c72, RHEL 7.4 kernel
    • Severity: 3

    Description

      An LBUG occurs immediately when we attempt any I/O on the clients. Multiple clients are impacted.

      Jan  4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed:
      Jan  4 21:57:38 soak-17 kernel: LNetError: 12570:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) LBUG
      Jan  4 21:57:38 soak-17 kernel: Pid: 12570, comm: kiblnd_sd_00_00
      Jan  4 21:57:38 soak-17 kernel: #012Call Trace:
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc097c7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc097c83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2666b>] kiblnd_check_sends_locked+0xd8b/0xd90 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0538b5c>] ? mlx4_ib_post_recv+0x1dc/0x310 [mlx4_ib]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c27f50>] kiblnd_post_rx+0x160/0x520 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c284ea>] kiblnd_recv+0x1da/0x7b0 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a00573>] lnet_ni_recv+0xc3/0x320 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a02e06>] lnet_parse_local+0x4c6/0xd40 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0 
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0a03f4a>] lnet_parse+0x8ca/0xfc0 [lnet]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c261ac>] ? kiblnd_check_sends_locked+0x8cc/0xd90 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c28e63>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2facf>] kiblnd_scheduler+0xf0f/0x1150 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810ce55e>] ? dequeue_task_fair+0x41e/0x660
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c7705>] ? sched_clock_cpu+0x85/0xc0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810c4820>] ? default_wake_function+0x0/0x20
      Jan  4 21:57:38 soak-17 kernel: [<ffffffffc0c2ebc0>] ? kiblnd_scheduler+0x0/0x1150 [ko2iblnd]
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b099f>] kthread+0xcf/0xe0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
      Jan  4 21:57:38 soak-17 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      Jan  4 21:57:38 soak-17 kernel:
      

      Multiple crash dumps are available on Spirit.
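The assertion in the log above requires that the number of posted sends on a connection never exceeds that connection's queue depth. The following is a minimal sketch of that invariant and of one way a sender could be throttled against it. It is not the ko2iblnd code and not the landed patch: only the field names ibc_nsends_posted and ibc_queue_depth come from the assertion text, and struct demo_conn, demo_npending, demo_try_post() and demo_send_done() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection. Only ibc_nsends_posted and
 * ibc_queue_depth are names taken from the failed assertion. */
struct demo_conn {
	int ibc_nsends_posted;	/* sends posted and not yet completed */
	int ibc_queue_depth;	/* per-connection limit on posted sends */
	int demo_npending;	/* hypothetical: sends held back for later */
};

/* Try to post one send. If the connection is already at its queue
 * depth, hold the send back instead of posting it. */
static bool demo_try_post(struct demo_conn *conn)
{
	if (conn->ibc_nsends_posted >= conn->ibc_queue_depth) {
		conn->demo_npending++;
		return false;
	}

	conn->ibc_nsends_posted++;
	/* Same condition as the assertion reported in this ticket. */
	assert(conn->ibc_nsends_posted <= conn->ibc_queue_depth);
	return true;
}

/* A posted send completed: free its slot and, if any send was held
 * back, post one of them into the freed slot. */
static void demo_send_done(struct demo_conn *conn)
{
	assert(conn->ibc_nsends_posted > 0);
	conn->ibc_nsends_posted--;

	if (conn->demo_npending > 0) {
		conn->demo_npending--;
		conn->ibc_nsends_posted++;
	}
	assert(conn->ibc_nsends_posted <= conn->ibc_queue_depth);
}
```

In this sketch the limit is checked before the counter is incremented, so the invariant cannot be violated by the sender itself. How the real counter came to exceed the queue depth in this ticket is not stated in the description.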

          Activity

            gerrit Gerrit Updater added a comment:
            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33150
            Subject: LU-10459 lnd: throttle tx based on queue depth
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: dce2da916afe3fa474e2199b4993c91ced4e45cf

            jgmitter Joseph Gmitter (Inactive) added a comment:
            Landed to master for 2.11.0

            gerrit Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30751/
            Subject: LU-10459 lnd: throttle tx based on queue depth
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e86f55798ca7bc8f7fe22dd48c9d9f52c1bb029a

            ashehata Amir Shehata (Inactive) added a comment:
            The patch which fixes the issue hasn't landed yet.

            cliffw Cliff White (Inactive) added a comment:
            I am currently seeing this on a lustre-review-ib build version=2.10.56_86_gd8827a8

            gerrit Gerrit Updater added a comment (edited):
            sorry, commit was added against the wrong ticket.

            People

              ashehata Amir Shehata (Inactive)
              cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8
