[LU-13277] Potential deadlock in lnet_peer_discovery Created: 20/Feb/20  Updated: 05/Mar/20  Resolved: 05/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3

 Description   

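Potential deadlock when LNet is shutting down: the discovery thread breaks out of its main loop with LNET_LOCK_EX still held, then tries to take the same lock again during cleanup: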

static int lnet_peer_discovery(void *arg)
...
        for (;;) {
...
                lnet_net_lock(LNET_LOCK_EX);
                if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING)
                        break;
...
        }

        CDEBUG(D_NET, "stopping\n");
        /*
         * Clean up before telling lnet_peer_discovery_stop() that
         * we're done. Use wake_up() below to somewhat reduce the
         * size of the thundering herd if there are multiple threads
         * waiting on discovery of a single peer.
         */

        /* Queue cleanup 1: stop all pending pings and pushes. */
        lnet_net_lock(LNET_LOCK_EX); /* <<< Deadlock: still held from the break above */
...
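
The pattern above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the LNet code and not necessarily what the merged patch does. It assumes a POSIX error-checking mutex as a stand-in for lnet_net_lock(LNET_LOCK_EX), so that a self-relock returns EDEADLK instead of hanging. The names demo_lock, demo_lock_init, discovery_buggy and discovery_fixed are hypothetical and exist only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the LNet exclusive net lock. */
static pthread_mutex_t demo_lock;

/* Use an error-checking mutex so that relocking from the owning
 * thread fails with EDEADLK rather than blocking forever. */
static void demo_lock_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&demo_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Mirrors the reported flow: the loop breaks with the lock held and
 * the cleanup path locks again. Returns the result of the second
 * lock attempt (EDEADLK with an error-checking mutex). */
static int discovery_buggy(int work_iterations)
{
	int rc;

	for (;;) {
		pthread_mutex_lock(&demo_lock);
		if (work_iterations-- <= 0)
			break;			/* lock is still held here */
		pthread_mutex_unlock(&demo_lock);
	}

	rc = pthread_mutex_lock(&demo_lock);	/* self-relock */
	pthread_mutex_unlock(&demo_lock);	/* drop the hold from the loop */
	return rc;
}

/* One possible way to avoid the problem: release the lock before
 * leaving the loop, so the cleanup path can take it normally. */
static int discovery_fixed(int work_iterations)
{
	int rc;

	for (;;) {
		pthread_mutex_lock(&demo_lock);
		if (work_iterations-- <= 0) {
			pthread_mutex_unlock(&demo_lock);
			break;
		}
		pthread_mutex_unlock(&demo_lock);
	}

	rc = pthread_mutex_lock(&demo_lock);	/* succeeds: lock is free */
	if (rc == 0)
		pthread_mutex_unlock(&demo_lock);
	return rc;
}
```

After calling demo_lock_init() once, discovery_buggy() reports the self-relock as EDEADLK, while discovery_fixed() acquires the lock in the cleanup path without error. An equally valid alternative would be to keep the lock held across the break and drop the second acquisition; which approach the actual patch takes is not stated in this ticket.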


 Comments   
Comment by Gerrit Updater [ 21/Feb/20 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/37675
Subject: LU-13277 lnet: Discovery thread can deadlock on shutdown
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ea1eb3b2b7aa1f5f05498739a8a55da16bf9f4ac

Comment by Gerrit Updater [ 05/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37675/
Subject: LU-13277 lnet: Discovery thread can deadlock on shutdown
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 82bb93410fc6f74e32ad74339ece5b4f62dc9967

Comment by Peter Jones [ 05/Mar/20 ]

Landed for 2.14
