Details
-
Bug
-
Resolution: Not a Bug
-
Major
Description
Potential deadlock introduced by commit:
commit ae0ac29348023b9d8df7783bff463d07e3762f82
Author: Chris Horn <chris.horn@hpe.com>
Date: Thu Aug 6 16:39:27 2020 -0500
LUS-9193 lnet: Transfer disc src NID when merging peers
    struct lnet_peer *new_lp;
    new_lp = lpni->lpni_peer_net->lpn_peer;
    ...
    spin_lock(&lp->lp_lock);
    spin_lock(&new_lp->lp_lock);
    if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY))
        new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
    if (lp->lp_state & LNET_PEER_MULTI_RAIL)
        new_lp->lp_state |= LNET_PEER_MULTI_RAIL;
    /* If we're processing a ping reply then we may be
     * about to send a push to the peer that we ping'd.
     * Since the ping reply that we're processing was
     * received by lp, we need to set the discovery source
     * NID for new_lp to the NID stored in lp.
     */
    if (lp->lp_disc_src_nid != LNET_NID_ANY)
        new_lp->lp_disc_src_nid = lp->lp_disc_src_nid;
    spin_unlock(&new_lp->lp_lock);
    spin_unlock(&lp->lp_lock);
This logic reconciles a situation where the primary NID for a known peer has changed. It works when the peer had not yet been fully discovered. However, if the peer had previously been discovered and then deletes its primary NID, this logic results in both "lp" and "new_lp" pointing to the same peer object. The second spin_lock() call then tries to take an lp_lock that this thread already holds, and because kernel spinlocks are not recursive, the thread deadlocks.
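To illustrate the aliasing case described above, here is a minimal userspace sketch, not the LNet code and not a proposed patch. It assumes that a possible guard is to skip the merge when both pointers refer to the same peer. The names `struct peer`, `peer_init`, `peer_merge_state`, the flag values, and `NID_ANY` are hypothetical stand-ins, and `pthread_mutex_t` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; the values and layout are
 * assumptions, not the LNet definitions. */
#define PEER_NO_DISCOVERY (1u << 0)
#define PEER_MULTI_RAIL   (1u << 1)
#define NID_ANY           UINT64_MAX

struct peer {
	pthread_mutex_t lock;         /* stands in for lp_lock */
	unsigned int    state;        /* stands in for lp_state */
	uint64_t        disc_src_nid; /* stands in for lp_disc_src_nid */
};

static void peer_init(struct peer *p, unsigned int state, uint64_t nid)
{
	pthread_mutex_init(&p->lock, NULL);
	p->state = state;
	p->disc_src_nid = nid;
}

/* Transfer discovery state from lp to new_lp.
 * Returns 1 if state was transferred, 0 if lp and new_lp are the same
 * object, in which case no lock is taken at all. */
static int peer_merge_state(struct peer *lp, struct peer *new_lp)
{
	assert(lp != NULL && new_lp != NULL);

	/* Guard: locking the same non-recursive lock twice from one
	 * thread would deadlock, and there is nothing to transfer. */
	if (lp == new_lp)
		return 0;

	/* Same lock order as the excerpt above: lp first, then new_lp. */
	pthread_mutex_lock(&lp->lock);
	pthread_mutex_lock(&new_lp->lock);
	if (!(lp->state & PEER_NO_DISCOVERY))
		new_lp->state &= ~PEER_NO_DISCOVERY;
	if (lp->state & PEER_MULTI_RAIL)
		new_lp->state |= PEER_MULTI_RAIL;
	if (lp->disc_src_nid != NID_ANY)
		new_lp->disc_src_nid = lp->disc_src_nid;
	pthread_mutex_unlock(&new_lp->lock);
	pthread_mutex_unlock(&lp->lock);
	return 1;
}
```

Calling `peer_merge_state(&a, &a)` returns 0 without touching the lock, while calling it on two distinct peers transfers the flags and the discovery source NID as in the excerpt. The sketch only covers the same-object case. It does not address lock ordering between two distinct peers merged concurrently in opposite directions.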