[LU-12233] Deadlock on LNet shutdown Created: 26/Apr/19 Updated: 29/Oct/20 Resolved: 25/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Chris Horn | Assignee: | Serguei Smirnov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I reproduced this issue with master and Cray's 2.12 branch. For completeness I'll note that my master was slightly modified so that I can configure LNet on Cray's hardware, and I also applied the fix from LU-11756. Here's the relevant git log; commit '8cb7ccf54e' is on master:

86ef522cac LU-11756 o2iblnd: kib_conn leak
888adb9340 MRP-342 lnet: add config file support
d661c584c6 Revert "LU-11838 lnet: change lnet_ipaddr_enumerate() to use for_each_netdev()"
4c681cf4ee Revert "LU-11838 o2iblnd: get IP address more directly."
f4fe014620 Revert "LU-6399 lnet: socket cleanup"
8cb7ccf54e LU-11986 lnet: properly cleanup lnet debugfs files

LNetNIFini() takes the ln_api_mutex and then shuts down LNet. It doesn't release the mutex until all teardown functions have returned. The message receive path also takes the ln_api_mutex, in lnet_nid2peerni_locked():

/*
* Get a peer_ni for the given nid, create it if necessary. Takes a
* hold on the peer_ni.
*/
struct lnet_peer_ni *
lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt)
{
struct lnet_peer_ni *lpni = NULL;
int rc;
if (the_lnet.ln_state != LNET_STATE_RUNNING)
return ERR_PTR(-ESHUTDOWN);
/*
* find if a peer_ni already exists.
* If so then just return that.
*/
lpni = lnet_find_peer_ni_locked(nid);
if (lpni)
return lpni;
/*
* Slow path:
* use the lnet_api_mutex to serialize the creation of the peer_ni
* and the creation/deletion of the local ni/net. When a local ni is
* created, if there exists a set of peer_nis on that network,
* they need to be traversed and updated. When a local NI is
* deleted, which could result in a network being deleted, then
* all peer nis on that network need to be removed as well.
*
* Creation through traffic should also be serialized with
* creation through DLC.
*/
lnet_net_unlock(cpt);
mutex_lock(&the_lnet.ln_api_mutex);
And here's LNetNIFini():

int
LNetNIFini()
{
mutex_lock(&the_lnet.ln_api_mutex);
LASSERT(the_lnet.ln_refcount > 0);
if (the_lnet.ln_refcount != 1) {
the_lnet.ln_refcount--;
} else {
LASSERT(!the_lnet.ln_niinit_self);
lnet_fault_fini();
lnet_router_debugfs_init();
lnet_peer_discovery_stop();
lnet_push_target_fini();
lnet_monitor_thr_stop();
lnet_ping_target_fini();
/* Teardown fns that use my own API functions BEFORE here */
the_lnet.ln_refcount = 0;
lnet_acceptor_stop();
lnet_destroy_routes();
lnet_shutdown_lndnets(); <<< the_lnet.ln_state = LNET_STATE_STOPPING; happens here
lnet_unprepare();
}
mutex_unlock(&the_lnet.ln_api_mutex);
return 0;
}
EXPORT_SYMBOL(LNetNIFini);
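To see why this hangs rather than just racing, here is a toy pthread model of the interleaving as I read the two functions above. Nothing in it is LNet code; the names (running, npeers, api_mutex) are stand-ins for the_lnet.ln_state, net->ibn_npeers and the_lnet.ln_api_mutex:

/* Toy model of the shutdown deadlock; pthreads stand in for LNet. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Atomic int running = 1;  /* the_lnet.ln_state analogue */
static _Atomic int npeers  = 1;  /* net->ibn_npeers analogue   */

static void *rx_thread(void *arg)
{
	(void)arg;
	/* lnet_nid2peerni_locked() entry check */
	if (!running) {
		npeers = 0;             /* rx aborts; peer can disconnect */
		return NULL;
	}
	sleep(1);                       /* force the bad interleaving */
	pthread_mutex_lock(&api_mutex); /* blocks: shutdown holds it  */
	npeers = 0;                     /* never reached              */
	pthread_mutex_unlock(&api_mutex);
	return NULL;
}

int main(void)
{
	pthread_t rx;

	pthread_create(&rx, NULL, rx_thread, NULL);
	pthread_mutex_lock(&api_mutex);  /* LNetNIFini()      */
	while (npeers != 0) {            /* kiblnd_shutdown() */
		printf("waiting for peers to disconnect\n");
		sleep(1);                /* loops forever     */
	}
	running = 0;  /* lnet_shutdown_lndnets() would flip the state
		       * here -- too late, rx is parked on the mutex */
	pthread_mutex_unlock(&api_mutex);
	pthread_join(rx, NULL);
	return 0;
}

Compiled with gcc -pthread, this prints "waiting for peers to disconnect" forever. Hoisting running = 0; above pthread_mutex_lock() in main() usually lets rx_thread() take its early-return path and the loop terminate; the residual race (rx sampling running just before the flip) mirrors the small window discussed below.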
We can see there is a decent sized window where the deadlock can be hit. It is easy to reproduce for me:

[root@snx11922n000 ~]# pdsh -g lustre modprobe lnet; lctl net up ; lctl list_nids ; lctl ping 10.12.0.50@o2ib40 ; lctl net down ; lustre_rmmod

Sometimes the command needs to be repeated a couple of times. I believe this regression was introduced by:

commit fa8b4e6357c53ea457ef6624b0b19bece0b0fdde
Author: Amir Shehata <amir.shehata@intel.com>
Date: Thu May 26 15:42:39 2016 -0700
LU-7734 lnet: peer/peer_ni handling adjustments
|
| Comments |
| Comment by Chris Horn [ 26/Apr/19 ] |
|
I reported this issue internally at Cray, but I don't have the cycles to work on it right now. If someone else is able to work out a patch that'd be great, and I'll be sure to update this ticket if Cray begins work on this issue. |
| Comment by Chris Horn [ 29/Apr/19 ] |
|
The specific instance I found was with iblnd. Maybe it's specific to that.

LNetNIFini()->lnet_shutdown_lndnets()->lnet_shutdown_lndnet()->lnet_shutdown_lndni()->lnet_clear_zombies_nis_locked()->kiblnd_shutdown() <<<< We're holding ln_api_mutex

In kiblnd_shutdown() we're waiting for all peers to disconnect:

while (atomic_read(&net->ibn_npeers) != 0) {
i++;
/* power of 2? */
CDEBUG(((i & (-i)) == i) ? D_WARNING : D_NET,
"%s: waiting for %d peers to disconnect\n",
libcfs_nid2str(ni->ni_nid),
atomic_read(&net->ibn_npeers));
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(cfs_time_seconds(1));
}
But the ib_cm thread is stuck trying to acquire the ln_api_mutex:

Apr 25 19:48:15 snx11922n002 kernel: INFO: task kworker/0:0:16461 blocked for more than 120 seconds.
Apr 25 19:48:15 snx11922n002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 25 19:48:15 snx11922n002 kernel: kworker/0:0 D ffff880172949fa0 0 16461 2 0x00000000
Apr 25 19:48:15 snx11922n002 kernel: Workqueue: ib_cm cm_work_handler [ib_cm]
Apr 25 19:48:15 snx11922n002 kernel: Call Trace:
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff816b5329>] schedule_preempt_disabled+0x29/0x70
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff816b30d7>] __mutex_lock_slowpath+0xc7/0x1d0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff816b24bf>] mutex_lock+0x1f/0x2f
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0b2ed91>] lnet_nid2peerni_locked+0x71/0x150 [lnet]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0b1ee34>] lnet_parse+0x794/0x1260 [lnet]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0a42163>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0a4267f>] kiblnd_handle_early_rxs+0x7f/0x120 [ko2iblnd]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0a43576>] kiblnd_connreq_done+0x286/0x6c0 [ko2iblnd]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc0a46327>] kiblnd_cm_callback+0x11e7/0x2390 [ko2iblnd]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810aa05a>] ? __queue_delayed_work+0xaa/0x1a0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810aa3c1>] ? try_to_grab_pending+0xb1/0x160
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc05013e0>] cma_ib_handler+0xc0/0x290 [rdma_cm]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc04ed5ab>] cm_process_work+0x2b/0x130 [ib_cm]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffffc04ef943>] cm_work_handler+0xaa3/0x12db [ib_cm]
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810abe2f>] process_one_work+0x17f/0x440
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810acaf6>] worker_thread+0x126/0x3c0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810ac9d0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810b4031>] kthread+0xd1/0xe0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff816c155d>] ret_from_fork+0x5d/0xb0
Apr 25 19:48:15 snx11922n002 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
Apr 25 19:49:10 snx11922n002 kernel: LNet: 15887:0:(o2iblnd.c:3032:kiblnd_shutdown()) 10.12.0.50@o2ib40: waiting for 1 peers to disconnect

Another time I saw kiblnd_sd threads stuck:

Apr 26 00:00:06 snx11922n002 kernel: INFO: task kiblnd_sd_02_00:32191 blocked for more than 120 seconds.
Apr 26 00:00:06 snx11922n002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 26 00:00:06 snx11922n002 kernel: kiblnd_sd_02_00 D ffff881fce3adee0 0 32191 2 0x00000000
Apr 26 00:00:06 snx11922n002 kernel: Call Trace:
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810cc938>] ? __enqueue_entity+0x78/0x80
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff816b5329>] schedule_preempt_disabled+0x29/0x70
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff816b30d7>] __mutex_lock_slowpath+0xc7/0x1d0
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff816b24bf>] mutex_lock+0x1f/0x2f
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffffc0b799d1>] lnet_nid2peerni_locked+0x71/0x150 [lnet]
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffffc0b66ba1>] lnet_parse+0x791/0x11e0 [lnet]
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffffc090b153>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffffc091228c>] kiblnd_scheduler+0xf3c/0x1180 [ko2iblnd]
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810cb0c5>] ? sched_clock_cpu+0x85/0xc0
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810c7c80>] ? wake_up_state+0x20/0x20
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffffc0911350>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810b4031>] kthread+0xd1/0xe0
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff816c155d>] ret_from_fork+0x5d/0xb0
Apr 26 00:00:06 snx11922n002 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
|
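Reading the two traces together, the circular wait seems to be (my reconstruction, not tool output):

/*
 * LNetNIFini()                          holds ln_api_mutex throughout
 *   -> lnet_shutdown_lndnets()
 *     -> kiblnd_shutdown()              spins until ibn_npeers == 0
 *
 * ibn_npeers only drops when peers disconnect, but the threads that
 * would drive the disconnects forward (the ib_cm worker and the
 * kiblnd_sd_* schedulers in the traces above) are all inside
 * kiblnd_handle_rx() -> lnet_parse() -> lnet_nid2peerni_locked(),
 * blocked on that same ln_api_mutex.  Nobody can make progress.
 */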
| Comment by Chris Horn [ 29/Jul/19 ] |
|
Started hitting this internally a bit more often. Bumping priority. |
| Comment by Chris Horn [ 12/Sep/19 ] |
|
Maybe this would work?

diff --git a/lnet/lnet/api-ni.c b/lnet/lnet/api-ni.c
index d50c9395e2..f4d224d39f 100644
--- a/lnet/lnet/api-ni.c
+++ b/lnet/lnet/api-ni.c
@@ -2155,11 +2155,10 @@ lnet_shutdown_lndnets(void)
/* NB called holding the global mutex */
/* All quiet on the API front */
- LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING);
+ LASSERT(the_lnet.ln_state == LNET_STATE_STOPPING);
LASSERT(the_lnet.ln_refcount == 0);
lnet_net_lock(LNET_LOCK_EX);
- the_lnet.ln_state = LNET_STATE_STOPPING;
while (!list_empty(&the_lnet.ln_nets)) {
/*
@@ -2746,6 +2745,10 @@ EXPORT_SYMBOL(LNetNIInit);
int
LNetNIFini()
{
+ lnet_net_lock(LNET_LOCK_EX);
+ the_lnet.ln_state = LNET_STATE_STOPPING;
+ lnet_net_unlock(LNET_LOCK_EX);
+
mutex_lock(&the_lnet.ln_api_mutex);
LASSERT(the_lnet.ln_refcount > 0);
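If I follow the intent: lnet_nid2peerni_locked() already bails out with -ESHUTDOWN when ln_state has left LNET_STATE_RUNNING (the entry check quoted in the description), so flipping the state before the shutdown thread takes ln_api_mutex lets receive threads fail fast instead of parking on the mutex for the whole teardown. A sketch of the intended interleaving (illustrative only, not the landed patch):

/*
 *  rx thread                            shutdown thread
 *  ---------                            ---------------
 *                                       LNetNIFini()
 *                                         ln_state = LNET_STATE_STOPPING
 *  lnet_nid2peerni_locked()
 *    ln_state != LNET_STATE_RUNNING
 *    -> return ERR_PTR(-ESHUTDOWN)
 *                                         mutex_lock(&ln_api_mutex)
 *                                         ... teardown runs ...
 *
 *  A thread that sampled ln_state just before it flipped can still
 *  block on the mutex, but the window shrinks from the entire
 *  teardown sequence to a few instructions.
 */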
|
| Comment by Chris Horn [ 06/Aug/20 ] |
|
There's another potential deadlock between shutdown and the discovery thread. lnet_peer_data_present() takes the ln_api_mutex and only then checks ln_state, but the shutdown thread could already be holding the ln_api_mutex.

static int lnet_peer_data_present(struct lnet_peer *lp)
...
mutex_lock(&the_lnet.ln_api_mutex);
if (the_lnet.ln_state != LNET_STATE_RUNNING) {
rc = -ESHUTDOWN;
goto out;
}
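This looks like the same shape as the receive-path deadlock, with the discovery thread in the blocked role. Assuming lnet_peer_discovery_stop() waits for the discovery thread to exit (consistent with where it sits in the LNetNIFini() listing in the description), the interleaving would be:

/*
 *  discovery thread                     shutdown thread
 *  ----------------                     ---------------
 *                                       LNetNIFini()
 *                                         mutex_lock(&ln_api_mutex)
 *  lnet_peer_data_present()
 *    mutex_lock(&ln_api_mutex) <blocks>
 *                                         lnet_peer_discovery_stop()
 *                                           waits for the discovery
 *                                           thread, which never gets
 *                                           the mutex -> deadlock
 */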
|
| Comment by Gerrit Updater [ 16/Sep/20 ] |
|
Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39933 |
| Comment by Gerrit Updater [ 25/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39933/ |
| Comment by Peter Jones [ 25/Sep/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 07/Oct/20 ] |
|
Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40171 |
| Comment by Gerrit Updater [ 29/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40171/ |