[LU-9917] lnet_discover_peer_locked() must refresh lp after unlock and lock Created: 25/Aug/17 Updated: 27/Nov/17 Resolved: 10/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet, multi-rail |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In lnet_discover_peer_locked(), after the loop we drop LNET_LOCK_EX and then take the LNet cpt lock:

		lnet_net_lock(LNET_LOCK_EX);
		lnet_peer_decref_locked(lp);
		/* Peer may have changed */
		lp = lpni->lpni_peer_net->lpn_peer;
	}
	finish_wait(&lp->lp_dc_waitq, &wait);
	lnet_net_unlock(LNET_LOCK_EX);
	lnet_net_lock(cpt);
	if (signal_pending(current))
		rc = -EINTR;
	else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
		rc = -ESHUTDOWN;
	else if (lp->lp_dc_error)
		rc = lp->lp_dc_error;
	else if (!block)
		CDEBUG(D_NET, "non-blocking discovery\n");
	else if (!lnet_peer_is_uptodate(lp))
		goto again;
	CDEBUG(D_NET, "peer %s NID %s: %d. %s\n",
	       (lp ? libcfs_nid2str(lp->lp_primary_nid) : "(none)"),
	       libcfs_nid2str(lpni->lpni_nid), rc,
	       (!block) ? "pending discovery" : "discovery complete");
	return rc;
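After relocking, lp may be invalid: between lnet_net_unlock(LNET_LOCK_EX) and lnet_net_lock(cpt) no LNet lock is held, and the loop has already dropped its reference with lnet_peer_decref_locked(), so lpni may have been moved to a different peer in the meantime. Yet the checks that follow (lp->lp_dc_error, lnet_peer_is_uptodate(lp)) and the final CDEBUG still dereference lp. We need to refresh lp from lpni after relocking. Alternatively, move the unlock and lock down below the code that uses lp and adjust the again label. Do we need LNET_LOCK_EX to access lp? |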
| Comments |
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/28772 |
| Comment by Gerrit Updater [ 10/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28772/ |
| Comment by Peter Jones [ 10/Sep/17 ] |
|
Landed for 2.11 |