Lustre / LU-16213

kfilnd: Optimize issuing of hello messages to a peer

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      The kfilnd <-> kfabric <-> kcxi_prov <-> Cassini stack has the following issue. If LNet tries to send to a kfilnd peer that is down, the Cassini retry handler can take up to 60 seconds to cancel the corresponding message. While retrying, the retry handler holds hardware resources and does not release them until the retries are complete. If enough messages are sent to down peers, the retry handler can end up holding all available hardware resources. Once this happens, Cassini cannot process new RDMA commands, and back pressure builds in kcxi_prov and is propagated to kfilnd as -EAGAIN. As seen on JT, this can result in single RDMA operations taking minutes to complete.

      To help prevent this issue, kfilnd should only send to peers it knows are up.

      Looking at the kfilnd code today, I believe we can have multiple hello messages in flight to a single peer. If the peer is down, this can result in a buildup of hello messages, each of which can take the CXI retry handler up to 60 seconds to complete. There should only be a single hello message in flight per peer.
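
A minimal userspace sketch of the "single hello in flight per peer" rule, assuming a per-peer atomic flag. All names here (`demo_peer`, `demo_peer_try_send_hello`, `demo_peer_hello_done`) are hypothetical and are not taken from the kfilnd source; the actual fix may be structured differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-peer state; not the actual kfilnd peer structure. */
struct demo_peer {
	atomic_bool hello_inflight;	/* true while a hello is outstanding */
	int hellos_posted;		/* illustration only: hellos issued */
};

static void demo_peer_init(struct demo_peer *peer)
{
	atomic_init(&peer->hello_inflight, false);
	peer->hellos_posted = 0;
}

/*
 * Try to claim the single hello slot for this peer.
 * Returns true if the caller posted a hello, false if one is already
 * in flight and no new hello was issued.
 */
static bool demo_peer_try_send_hello(struct demo_peer *peer)
{
	bool expected = false;

	/* Only one caller can flip the flag from false to true. */
	if (!atomic_compare_exchange_strong(&peer->hello_inflight,
					    &expected, true))
		return false;

	/* Assumption: a real implementation would post the hello here. */
	peer->hellos_posted++;
	return true;
}

/* Called when the hello completes (response, error, or timeout). */
static void demo_peer_hello_done(struct demo_peer *peer)
{
	bool was_inflight = atomic_exchange(&peer->hello_inflight, false);

	/* Completing a hello that was never sent indicates a logic bug. */
	assert(was_inflight);
	(void)was_inflight;
}
```

With a guard like this, a down peer ties up at most one retry-handler slot for hello traffic at a time, instead of one slot per attempted send.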

      As part of this work, kfilnd transactions may have to be queued until a hello message comes back. If the hello message fails, all queued transactions should be finalized.
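
The queueing behaviour described above can be sketched as a small per-peer state machine. This is a single-threaded userspace illustration under stated assumptions: the types and functions (`demo_peer`, `demo_tn`, `demo_submit`, `demo_hello_done`) are hypothetical, locking is omitted, and "sending" is modelled by setting a status field rather than by calling into kfabric.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum demo_peer_state {
	DEMO_PEER_UNKNOWN,	/* not known to be up; no hello outstanding */
	DEMO_PEER_HELLO_SENT,	/* one hello in flight, sends are queued */
	DEMO_PEER_UP		/* hello succeeded, sends go out directly */
};

#define DEMO_TN_QUEUED 1	/* status while waiting for the hello */

/* Hypothetical transaction: 0 = sent, negative errno = finalized. */
struct demo_tn {
	struct demo_tn *next;
	int status;
};

struct demo_peer {
	enum demo_peer_state state;
	struct demo_tn *head;	/* FIFO of transactions awaiting the hello */
	struct demo_tn **tail;
	int hellos_posted;	/* illustration only */
};

static void demo_peer_init(struct demo_peer *peer)
{
	peer->state = DEMO_PEER_UNKNOWN;
	peer->head = NULL;
	peer->tail = &peer->head;
	peer->hellos_posted = 0;
}

/* Send immediately if the peer is up; otherwise queue behind one hello. */
static void demo_submit(struct demo_peer *peer, struct demo_tn *tn)
{
	if (peer->state == DEMO_PEER_UP) {
		tn->status = 0;		/* stand-in for posting the send */
		return;
	}

	tn->status = DEMO_TN_QUEUED;
	tn->next = NULL;
	*peer->tail = tn;
	peer->tail = &tn->next;

	/* Issue a hello only if none is already in flight. */
	if (peer->state == DEMO_PEER_UNKNOWN) {
		peer->state = DEMO_PEER_HELLO_SENT;
		peer->hellos_posted++;
	}
}

/* Hello completion: rc == 0 on success, negative errno on failure. */
static void demo_hello_done(struct demo_peer *peer, int rc)
{
	struct demo_tn *tn = peer->head;

	assert(rc <= 0);
	peer->head = NULL;
	peer->tail = &peer->head;
	peer->state = rc == 0 ? DEMO_PEER_UP : DEMO_PEER_UNKNOWN;

	/* On success the queue is sent; on failure every entry is finalized. */
	while (tn != NULL) {
		struct demo_tn *next = tn->next;

		tn->next = NULL;
		tn->status = rc;
		tn = next;
	}
}
```

In this sketch a failed hello returns the peer to the unknown state, so the next submission triggers a fresh hello rather than sending data to a peer that may still be down.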

          People

            hornc Chris Horn