Lustre / LU-12569

IBLND_CREDITS_HIGHWATER does not check connection queue depth

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels:
      None
    • Severity:
      3

      Description

      The IBLND_CREDITS_HIGHWATER check decides at what number of credits a NOOP message must be sent in order to return credits. This covers the case where we send many immediate messages, which consume credits but do not get an acknowledgment, so their credits are not automatically returned.

      However, the check uses a global tunable:

      lnd_peercredits_hiw 

      This tunable is sanity-checked against the global peer credits value (see kiblnd_tunables_setup()); that is, it must be less than the total number of credits.

      However, individual connections can have a different queue depth than the global setting (total credits for a connection is equal to the connection queue depth).

      That means that if a connection's queue depth differs from the global value, the highwater mark can be higher than the total number of credits for that connection. For one case where this can happen, see kiblnd_create_conn(), in particular the "queue depth reduced" warning message.

      In this case, no NOOP messages will be sent, and it is possible to stall out a connection if both sides send many immediate messages at once.  Essentially, if both ends of a connection send enough immediate messages to exhaust credits, then neither side will send any more messages.  The high water mark is supposed to prevent this by having them send a NOOP before they reach this state.

      But if the highwater mark is greater than the number of credits, this will not occur, and the connection will stall out until a ping or other event causes credits to be returned.
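
      The failure mode described above can be illustrated with a small model. This is only a sketch, not the o2iblnd code: the function names are hypothetical, and it assumes the NOOP trigger is a simple "credits waiting to be returned >= highwater mark" comparison, with the connection queue depth as the upper bound on credits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model; not taken from the o2iblnd source. */

/* Assumed trigger: a NOOP is due once the credits waiting to be
 * returned reach the highwater mark. */
static bool model_need_noop(int credits_to_return, int highwater)
{
	return credits_to_return >= highwater;
}

/*
 * A connection never holds more credits than its queue depth, so scan
 * every reachable count and report whether any of them triggers a NOOP.
 */
static bool model_noop_reachable(int conn_queue_depth, int highwater)
{
	assert(conn_queue_depth >= 0 && highwater > 0);

	for (int owed = 0; owed <= conn_queue_depth; owed++) {
		if (model_need_noop(owed, highwater))
			return true;
	}
	return false;
}
```

      In this model, a connection with a queue depth of 8 and a highwater mark of 7 can trigger a NOOP, while the same connection with a highwater mark of 16 never can.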

      The solution should be simple: the highwater mark check needs to take into account the queue depth of the individual connection.
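
      One possible shape for such a check is to cap the global tunable at one less than the connection's queue depth. The helper below is a hedged sketch under that assumption; its name, its parameters, and the choice of "queue depth - 1" as the cap are illustrative and are not taken from the actual fix.

```c
#include <assert.h>

/* Hypothetical helper; names are illustrative, not the o2iblnd source. */
static int conn_credits_highwater(int global_hiw, int conn_queue_depth)
{
	/* Assumed floor of two credits, so that depth - 1 is a positive mark. */
	assert(conn_queue_depth >= 2);

	/* Highest mark that stays below this connection's total credits. */
	int cap = conn_queue_depth - 1;

	return global_hiw < cap ? global_hiw : cap;
}
```

      With this cap, a global highwater mark of 16 on a connection whose queue depth was reduced to 8 yields an effective mark of 7, while a global mark that is already below the cap is left unchanged.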

            People

            • Assignee:
              ashehata Amir Shehata
              Reporter:
              pfarrell Patrick Farrell (Inactive)
            • Votes:
              0
              Watchers:
              7