  1. Lustre
  2. LU-14223

Potential deadlock in lnet_peer_data_present()

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major

    Description

      Potential deadlock introduced by commit:

      commit ae0ac29348023b9d8df7783bff463d07e3762f82
      Author: Chris Horn <chris.horn@hpe.com>
      Date:   Thu Aug 6 16:39:27 2020 -0500
      
          LUS-9193 lnet: Transfer disc src NID when merging peers
      
                              struct lnet_peer *new_lp;
                              new_lp = lpni->lpni_peer_net->lpn_peer;
      ...
                              spin_lock(&lp->lp_lock);
                              spin_lock(&new_lp->lp_lock);
                              if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY))
                                      new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
                              if (lp->lp_state & LNET_PEER_MULTI_RAIL)
                                      new_lp->lp_state |= LNET_PEER_MULTI_RAIL;
                              /* If we're processing a ping reply then we may be
                               * about to send a push to the peer that we ping'd.
                               * Since the ping reply that we're processing was
                               * received by lp, we need to set the discovery source
                               * NID for new_lp to the NID stored in lp.
                               */
                              if (lp->lp_disc_src_nid != LNET_NID_ANY)
                                      new_lp->lp_disc_src_nid = lp->lp_disc_src_nid;
                              spin_unlock(&new_lp->lp_lock);
                              spin_unlock(&lp->lp_lock);
      

      This logic reconciles the case where the primary NID of a known peer has changed. It works when the peer had not yet been fully discovered. However, if the peer was previously discovered and then deletes its primary NID, "lp" and "new_lp" end up pointing to the same peer object. The code then tries to take the same lp_lock twice. Because spinlocks are not recursive, the second spin_lock() never returns and we deadlock.
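      The aliasing hazard described above can be modelled in user space. The following is only a minimal sketch under stated assumptions, not the LNet code and not a fix adopted upstream (this issue was resolved as "Not a Bug"). The names struct peer, peer_lock(), peer_unlock(), peer_transfer_state() and the PEER_*/NID_ANY constants are hypothetical stand-ins, and the "lock" is a plain flag that asserts where a real spinlock would spin forever. It shows how a pointer-equality check before taking both locks would avoid locking the same object twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are NOT the real LNet definitions. */
#define PEER_NO_DISCOVERY (1u << 0)
#define PEER_MULTI_RAIL   (1u << 1)
#define NID_ANY           UINT64_MAX

struct peer {
	bool     locked;        /* models a non-recursive lock */
	unsigned state;
	uint64_t disc_src_nid;
};

static void peer_lock(struct peer *p)
{
	/* A real non-recursive spinlock would spin forever here. */
	assert(!p->locked);
	p->locked = true;
}

static void peer_unlock(struct peer *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Copy discovery-related state from lp to new_lp.
 * Returns false without locking anything when both pointers refer to
 * the same object, since there is nothing to transfer in that case.
 */
static bool peer_transfer_state(struct peer *lp, struct peer *new_lp)
{
	if (lp == new_lp)
		return false;

	peer_lock(lp);
	peer_lock(new_lp);
	if (!(lp->state & PEER_NO_DISCOVERY))
		new_lp->state &= ~PEER_NO_DISCOVERY;
	if (lp->state & PEER_MULTI_RAIL)
		new_lp->state |= PEER_MULTI_RAIL;
	if (lp->disc_src_nid != NID_ANY)
		new_lp->disc_src_nid = lp->disc_src_nid;
	peer_unlock(new_lp);
	peer_unlock(lp);
	return true;
}
```

      Without the equality check, calling peer_transfer_state(&a, &a) in this model would trip the assertion in peer_lock(), which corresponds to the self-deadlock described in this report.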

          People

            hornc Chris Horn