
IBLND_CREDITS_HIGHWATER does not check connection queue depth

Details


    Description

      The IBLND_CREDITS_HIGHWATER check decides at what number of outstanding credits a NOOP message must be sent in order to return credits. This covers the case where we send many immediate messages, which consume credits but receive no acknowledgment, so their credits are not returned automatically.

      However, the check uses a global tunable:

      lnd_peercredits_hiw 

      This tunable is sanity-checked against the global peer credits value in kiblnd_tunables_setup(); that is, it must be less than the total number of credits.

      However, an individual connection can have a different queue depth than the global setting (the total credits for a connection equal its queue depth).

      That means that if a connection's queue depth differs from the global value, the highwater mark can end up higher than the connection's total number of credits. (See kiblnd_create_conn(), in particular the "queue depth reduced" warning message, for one case where this can happen.)

      In this case, no NOOP messages will be sent, and it is possible to stall a connection if both sides send many immediate messages at once. Essentially, if both ends of a connection send enough immediate messages to exhaust the credits, neither side can send any more messages. The highwater mark is supposed to prevent this by having each side send a NOOP before it reaches that state.

      But if the highwater mark is greater than the number of credits, this never occurs, and the connection stalls until a ping or other event causes credits to be returned.

      The solution should be simple: the highwater mark check needs to take into account the queue depth of the individual connection.

      Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36254/
            Subject: LU-12569 o2iblnd: Make credits hiw connection aware
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 90ba471e367754ea6ddb9a95060591f46b95b0b6

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36254
            Subject: LU-12569 o2iblnd: Make credits hiw connection aware
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 08b218eca6fc01a468b050d6606034b293a8d727
            pjones Peter Jones added a comment -

            Landed for 2.13

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35578/
            Subject: LU-12569 o2iblnd: Make credits hiw connection aware
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1b87e8f61781e48c31b4da647214d66addf2b90c

            pfarrell Patrick Farrell (Inactive) added a comment -

            Amir is planning to rework this one as part of fixing LU-10213.

            pfarrell Patrick Farrell (Inactive) added a comment -

            It's probably possible to write a test for this, but I'm not really an IB guy.

            The tricky part is getting all the messages in flight at once.

            We could probably delay receipt?
            Like, just hang the thread responsible for the interrupt for a few seconds...?

            On both sides of the connection, though.

            So:
            Just hang whatever function it is that runs the IRQ when a ko2ib message is received, so "receipt" doesn't occur for a few seconds.

            Then send a bunch of immediate messages from both sides, which will exhaust the credits.

            The other aspect of this is that, if we want to trigger this specific issue, we have to engineer a reduced queue depth. That normally happens as part of this loop in kiblnd_create_conn():

                do {
                        init_qp_attr->cap.max_send_wr = kiblnd_send_wrs(conn);
                        init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn);

                        rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
                        if (!rc || conn->ibc_queue_depth < 2)
                                break;

                        conn->ibc_queue_depth--;
                } while (rc);

            which reduces max_send_wr and max_recv_wr each time until rdma_create_qp() succeeds.
            (IBLND_RECV_WRS and kiblnd_send_wrs are functions of ibc_queue_depth.)

            Or possibly just reducing the queue depth as a hack would work: reduce it by using a fail_loc here,
            since I think it should be valid/safe/etc. to use a smaller queue depth than what we asked the
            hardware for. (This is just a guess, but it seems likely.)

            gerrit Gerrit Updater added a comment -

            Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35578
            Subject: LU-12569 ko2iblnd: Make credits hiw connection aware
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c7b414eab00bbff1210b28ddb108aee081682835

            People

              ashehata Amir Shehata (Inactive)
              pfarrell Patrick Farrell (Inactive)