[LU-9917] lnet_discover_peer_locked() must refresh lp after unlock and lock Created: 25/Aug/17  Updated: 27/Nov/17  Resolved: 10/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: lnet, multi-rail

Issue Links:
Related
is related to LU-10281 conf-sanity: test_54a hung at lnet_di... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In lnet_discover_peer_locked(), after the loop, we drop LNET_LOCK_EX and then take the LNet cpt lock:

                lnet_net_lock(LNET_LOCK_EX);
                lnet_peer_decref_locked(lp);
                /* Peer may have changed */
                lp = lpni->lpni_peer_net->lpn_peer;
        }
        finish_wait(&lp->lp_dc_waitq, &wait);

        lnet_net_unlock(LNET_LOCK_EX);
        lnet_net_lock(cpt);

        if (signal_pending(current))
                rc = -EINTR;
        else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
                rc = -ESHUTDOWN;
        else if (lp->lp_dc_error)
                rc = lp->lp_dc_error;
        else if (!block)
                CDEBUG(D_NET, "non-blocking discovery\n");
        else if (!lnet_peer_is_uptodate(lp))
                goto again;
        CDEBUG(D_NET, "peer %s NID %s: %d. %s\n",
               (lp ? libcfs_nid2str(lp->lp_primary_nid) : "(none)"),
               libcfs_nid2str(lpni->lpni_nid), rc,
               (!block) ? "pending discovery" : "discovery complete");

        return rc;

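Between lnet_net_unlock(LNET_LOCK_EX) and lnet_net_lock(cpt) no LNet lock is held, so after relocking lp may be invalid and we need to refresh it from lpni, as the loop already does. Alternatively, move the unlock and lock down and adjust the again label. Do we need LNET_LOCK_EX to access lp?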
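
To illustrate the first suggestion, here is a minimal user-space sketch of the "refresh lp after relocking" pattern. It is not the LNet code and not necessarily what the eventual patch does: the structs are cut-down stand-ins that reuse only the field names from the snippet above, and toy_lock(), toy_unlock(), unlocked_window_hook, finish_discovery() and swap_peer() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LNet peer structures. */
struct peer     { int lp_dc_error; };
struct peer_net { struct peer *lpn_peer; };
struct peer_ni  { struct peer_net *lpni_peer_net; };

/* Toy single-threaded "lock" that only tracks whether it is held. */
static int lock_held;
static void toy_lock(void)   { assert(!lock_held); lock_held = 1; }
static void toy_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Optional callback run inside the unlocked window. It simulates another
 * thread moving the peer NI to a different peer (assumption for the demo). */
static void (*unlocked_window_hook)(struct peer_ni *lpni);

/* Called with the toy lock held; returns with it held. */
static int finish_discovery(struct peer_ni *lpni)
{
	struct peer *lp = lpni->lpni_peer_net->lpn_peer;

	(void)lp;                /* pre-unlock value: not trusted after the window */
	toy_unlock();            /* models lnet_net_unlock(LNET_LOCK_EX) */
	if (unlocked_window_hook)
		unlocked_window_hook(lpni);
	toy_lock();              /* models lnet_net_lock(cpt) */

	/* Peer may have changed while unlocked: refresh lp from lpni. */
	lp = lpni->lpni_peer_net->lpn_peer;

	return lp->lp_dc_error;
}

/* Example hook: re-parent the NI onto a peer that carries an error code. */
static struct peer replacement_peer = { .lp_dc_error = -5 };
static void swap_peer(struct peer_ni *lpni)
{
	lpni->lpni_peer_net->lpn_peer = &replacement_peer;
}
```

With swap_peer installed as the hook, finish_discovery() reports the error of the peer that lpni points to after the relock rather than that of the pointer cached before the unlock. Whether the real function should refresh lp, or instead restart discovery when the peer has changed, is a separate design question that this sketch does not settle.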



 Comments   
Comment by Gerrit Updater [ 28/Aug/17 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/28772
Subject: LU-9917 lnet: rediscover peer if it changed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 274c7bdf1283dff46a7e8f41e06c5c4b199c98f5

Comment by Gerrit Updater [ 10/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28772/
Subject: LU-9917 lnet: rediscover peer if it changed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1fc4ed3ac40ab0e11b1c59d7d147a100636cbda0

Comment by Peter Jones [ 10/Sep/17 ]

Landed for 2.11
