Details
- Bug
- Resolution: Fixed
- Minor
Description
In lnet_discover_peer_locked(), after the loop, we drop LNET_LOCK_EX and then take the LNet cpt lock:
		lnet_net_lock(LNET_LOCK_EX);
		lnet_peer_decref_locked(lp);
		/* Peer may have changed */
		lp = lpni->lpni_peer_net->lpn_peer;
	}
	finish_wait(&lp->lp_dc_waitq, &wait);

	lnet_net_unlock(LNET_LOCK_EX);
	lnet_net_lock(cpt);

	if (signal_pending(current))
		rc = -EINTR;
	else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
		rc = -ESHUTDOWN;
	else if (lp->lp_dc_error)
		rc = lp->lp_dc_error;
	else if (!block)
		CDEBUG(D_NET, "non-blocking discovery\n");
	else if (!lnet_peer_is_uptodate(lp))
		goto again;

	CDEBUG(D_NET, "peer %s NID %s: %d. %s\n",
	       (lp ? libcfs_nid2str(lp->lp_primary_nid) : "(none)"),
	       libcfs_nid2str(lpni->lpni_nid), rc,
	       (!block) ? "pending discovery" : "discovery complete");

	return rc;
After relocking, lp may be invalid, so we need to refresh it from lpni. Alternatively, we could move the unlock/lock pair further down and adjust the again label. Do we need LNET_LOCK_EX to access lp?
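For illustration, here is a minimal userspace sketch of the first option (refreshing lp from lpni after the relock). It is not the LNet code and not necessarily the fix that landed: struct peer, struct peer_ni, model_lock(), model_unlock(), race_window, swap_peer() and finish_discovery() are all hypothetical stand-ins, and the "lock" is a plain flag so the race can be replayed deterministically in a single thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LNet structures; not the real definitions. */
struct peer {
	int dc_error;		/* models lp->lp_dc_error */
};

struct peer_ni {
	struct peer *peer;	/* models lpni->lpni_peer_net->lpn_peer */
};

/* Models the net lock as a flag so lock state can be asserted on. */
static int lock_held;

static void model_lock(void)
{
	assert(!lock_held);
	lock_held = 1;
}

static void model_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/*
 * Hook that runs while the lock is dropped. It models another thread
 * moving the NI to a different peer in the unlock/relock window.
 */
static void (*race_window)(struct peer_ni *lpni);

/* Example hook: re-parent the NI to 'replacement'. */
static struct peer *replacement;

static void swap_peer(struct peer_ni *lpni)
{
	lpni->peer = replacement;
}

/* Called with the lock held; returns with the lock held. */
static int finish_discovery(struct peer_ni *lpni, struct peer *lp)
{
	model_unlock();		/* models lnet_net_unlock(LNET_LOCK_EX) */
	if (race_window)
		race_window(lpni);
	model_lock();		/* models lnet_net_lock(cpt) */

	/* The cached lp may be stale now, so re-derive it from lpni. */
	lp = lpni->peer;

	return lp->dc_error;
}
```

With swap_peer installed as the hook, finish_discovery() reports the error of the peer the NI points to after the relock, not the one the caller passed in. Whether the cpt lock alone is enough to dereference lp is the open question above, and this sketch does not answer it.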
Issue Links
- is related to: LU-10281 conf-sanity: test_54a hung at lnet_discover_peer_locked() (Open)