Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
3
-
9223372036854775807
Description
In 2.15, lnet_peer_add() takes an lnet_nid_t nid argument, but the backport uses an uninitalized large nid variable in various places. This results in incorrect behavior.
static int lnet_peer_add(lnet_nid_t nid4, unsigned int flags) { struct lnet_nid nid; ... lpni = lnet_find_peer_ni_locked(nid4); if (lpni) { /* A peer with this NID already exists. */ lp = lpni->lpni_peer_net->lpn_peer; lnet_peer_ni_decref_locked(lpni); if (lp->lp_state & LNET_PEER_CONFIGURED) { if (lnet_nid_to_nid4(&lp->lp_primary_nid) != nid4) rc = -EEXIST; else if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) rc = -EPERM; goto out; } else if (lp->lp_state & LNET_PEER_LOCK_PRIMARY) { if (nid_same(&lp->lp_primary_nid, &nid)) <<<<<<<<<<< 'nid' not initialized rc = -EEXIST; else rc = -EALREADY; goto out; } else if (!(flags & (LNET_PEER_LOCK_PRIMARY | LNET_PEER_CONFIGURED))) { if (nid_same(&lp->lp_primary_nid, &nid)) { <<<<<<<<<<< 'nid' not initialized rc = -EEXIST; goto out; } } rc = lnet_peer_del(lp); if (rc) goto out; } /* Create peer, peer_net, and peer_ni. */ rc = -ENOMEM; lnet_nid4_to_nid(nid4, &nid); <<<<<<<<<<<<< 'nid' initialized here
Three cases to consider w.r.t. impact of this bug:
- Statically defined peers.
- Peers that are defined statically will have LNET_PEER_CONFIGURED bit set in lp_state, so they will not hit the buggy code.
- lock_prim_nid enabled (default).
- In this case, peers will have LNET_PEER_LOCK_PRIMARY bit set in lp_state. The bug causes lnet_peer_add() to return -EALREADY instead of -EEXIST for some cases.
- Only LNetAddPeer() cares about this status. When LNetAddPeer() sees -EALREADY it will perform a lookup of "already existing" peer to grab its primary NID. In the EEXIST case, this is just the same peer so it is kind of a no-op. There should not be a problem here.
- lock_prim_nid disabled.
- In this case the bug prevents us from detecting when we are trying to add a peer that already exists. This causes us to unnecessarily delete and re-create peers. This should be benign as the DLC code is intended to allow deleting and creating peers at will. lock_prim_nid is enabled by default so we can avoid this case. At the same time, the presence of the bug will limit our ability to provide flexibility in enabling/disabling the feature.
Attachments
Issue Links
- is related to
-
LU-14668 LNet: do discovery in the background
- Resolved