[LU-14223] Potential deadlock in lnet_peer_data_present() Created: 15/Dec/20 Updated: 15/Dec/20 Resolved: 15/Dec/20 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Potential deadlock introduced by commit:
commit ae0ac29348023b9d8df7783bff463d07e3762f82
Author: Chris Horn <chris.horn@hpe.com>
Date: Thu Aug 6 16:39:27 2020 -0500
LUS-9193 lnet: Transfer disc src NID when merging peers
	struct lnet_peer *new_lp;
	new_lp = lpni->lpni_peer_net->lpn_peer;
	...
	spin_lock(&lp->lp_lock);
	spin_lock(&new_lp->lp_lock);
	if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY))
		new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
	if (lp->lp_state & LNET_PEER_MULTI_RAIL)
		new_lp->lp_state |= LNET_PEER_MULTI_RAIL;
	/* If we're processing a ping reply then we may be
	 * about to send a push to the peer that we ping'd.
	 * Since the ping reply that we're processing was
	 * received by lp, we need to set the discovery source
	 * NID for new_lp to the NID stored in lp.
	 */
	if (lp->lp_disc_src_nid != LNET_NID_ANY)
		new_lp->lp_disc_src_nid = lp->lp_disc_src_nid;
	spin_unlock(&new_lp->lp_lock);
	spin_unlock(&lp->lp_lock);
This logic reconciles the case where the primary NID of a known peer has changed. It works when the peer had not yet been fully discovered. However, if the peer was previously discovered and then deletes its primary NID, "lp" and "new_lp" end up pointing to the same peer object. The code then tries to take the same lp_lock twice and, because the spinlock is not recursive, deadlocks. |
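For illustration only, here is a user-space sketch of one common way to avoid this kind of self-deadlock: skip the merge when both pointers refer to the same object. This is not necessarily the fix applied in the actual patch. The names struct peer, peer_lock(), peer_merge_state(), and the PEER_* / NID_ANY constants are hypothetical stand-ins for the LNet definitions, and a C11 atomic_flag spinlock stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LNet flags; the values are illustrative. */
#define PEER_NO_DISCOVERY (1u << 0)
#define PEER_MULTI_RAIL   (1u << 1)
#define NID_ANY           UINT64_MAX

struct peer {
	atomic_flag  lock;          /* stands in for lp_lock */
	unsigned int state;         /* stands in for lp_state */
	uint64_t     disc_src_nid;  /* stands in for lp_disc_src_nid */
};

static void peer_init(struct peer *p, unsigned int state, uint64_t nid)
{
	atomic_flag_clear(&p->lock);
	p->state = state;
	p->disc_src_nid = nid;
}

/* Non-recursive spinlock: locking it twice from one thread spins forever. */
static void peer_lock(struct peer *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin */
}

static void peer_unlock(struct peer *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/*
 * Copy discovery state from lp to new_lp.
 * Returns 1 if the state was merged, 0 if lp and new_lp are the same
 * object, in which case no lock is taken and nothing is copied.
 */
static int peer_merge_state(struct peer *lp, struct peer *new_lp)
{
	assert(lp != NULL && new_lp != NULL);

	/* Guard: taking the same non-recursive lock twice would deadlock. */
	if (lp == new_lp)
		return 0;

	peer_lock(lp);
	peer_lock(new_lp);
	if (!(lp->state & PEER_NO_DISCOVERY))
		new_lp->state &= ~PEER_NO_DISCOVERY;
	if (lp->state & PEER_MULTI_RAIL)
		new_lp->state |= PEER_MULTI_RAIL;
	if (lp->disc_src_nid != NID_ANY)
		new_lp->disc_src_nid = lp->disc_src_nid;
	peer_unlock(new_lp);
	peer_unlock(lp);
	return 1;
}
```

With this guard, peer_merge_state(&a, &a) returns immediately without touching the lock, while two distinct peers are merged as in the excerpt above. The sketch keeps the lock order from the original snippet (lp first, then new_lp) and does not address ordering between concurrent callers.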
| Comments |
| Comment by Chris Horn [ 15/Dec/20 ] |
|
I did not realize that the patch which introduced the regression hadn't yet landed for master. Closing as not a bug. (I'll fix the regression in the patch for |