LU-12233: Deadlock on LNet shutdown

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.6

    Description

      I reproduced this issue with master and with Cray's 2.12 branch. For completeness: my master was slightly modified so that I could configure LNet on Cray's hardware, and I also applied the fix from LU-11756.

      Here's the relevant git log. Commit '8cb7ccf54e' is on master.

      86ef522cac LU-11756 o2iblnd: kib_conn leak
      888adb9340 MRP-342 lnet: add config file support
      d661c584c6 Revert "LU-11838 lnet: change lnet_ipaddr_enumerate() to use for_each_netdev()"
      4c681cf4ee Revert "LU-11838 o2iblnd: get IP address more directly."
      f4fe014620 Revert "LU-6399 lnet: socket cleanup"
      8cb7ccf54e LU-11986 lnet: properly cleanup lnet debugfs files
      

      LNetNIFini() takes the ln_api_mutex and then shuts down LNet. It doesn't release the mutex until all teardown functions have returned.

      The message receive path also takes the ln_api_mutex in lnet_nid2peerni_locked().
      kgnilnd_check_fma_rx->lnet_parse->lnet_nid2peerni_locked
      kiblnd_handle_rx->lnet_parse->lnet_nid2peerni_locked
      ksocknal_process_receive->lnet_parse->lnet_nid2peerni_locked

      /*
       * Get a peer_ni for the given nid, create it if necessary. Takes a
       * hold on the peer_ni.
       */
      struct lnet_peer_ni *
      lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt)
      {
              struct lnet_peer_ni *lpni = NULL;
              int rc;
      
              if (the_lnet.ln_state != LNET_STATE_RUNNING)
                      return ERR_PTR(-ESHUTDOWN);
      
              /*
               * find if a peer_ni already exists.
               * If so then just return that.
               */
              lpni = lnet_find_peer_ni_locked(nid);
              if (lpni)
                      return lpni;
      
              /*
               * Slow path:
               * use the lnet_api_mutex to serialize the creation of the peer_ni
               * and the creation/deletion of the local ni/net. When a local ni is
               * created, if there exists a set of peer_nis on that network,
               * they need to be traversed and updated. When a local NI is
               * deleted, which could result in a network being deleted, then
               * all peer nis on that network need to be removed as well.
               *
               * Creation through traffic should also be serialized with
               * creation through DLC.
               */
              lnet_net_unlock(cpt);
              mutex_lock(&the_lnet.ln_api_mutex);
              /* ... excerpt ends here; remainder of the function omitted ... */
      
      int
      LNetNIFini()
      {
              mutex_lock(&the_lnet.ln_api_mutex);
      
              LASSERT(the_lnet.ln_refcount > 0);
      
              if (the_lnet.ln_refcount != 1) {
                      the_lnet.ln_refcount--;
              } else {
                      LASSERT(!the_lnet.ln_niinit_self);
      
                      lnet_fault_fini();
      
                      lnet_router_debugfs_fini();
                      lnet_peer_discovery_stop();
                      lnet_push_target_fini();
                      lnet_monitor_thr_stop();
                      lnet_ping_target_fini();
      
                      /* Teardown fns that use my own API functions BEFORE here */
                      the_lnet.ln_refcount = 0;
      
                      lnet_acceptor_stop();
                      lnet_destroy_routes();
                      lnet_shutdown_lndnets(); /* <<< the_lnet.ln_state = LNET_STATE_STOPPING happens in here */
                      lnet_unprepare();
              }
      
              mutex_unlock(&the_lnet.ln_api_mutex);
              return 0;
      }
      EXPORT_SYMBOL(LNetNIFini);

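      ln_state is not set to LNET_STATE_STOPPING until lnet_shutdown_lndnets() runs, so a receive thread that enters lnet_nid2peerni_locked() any time after LNetNIFini() has taken the ln_api_mutex still passes the ln_state check and then blocks in mutex_lock(). If teardown then waits on that thread, neither side can make progress. We can see there is a decent-sized window where the deadlock can be hit.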
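      The window can be modelled in user space. The following is a minimal sketch, not LNet code: it assumes POSIX threads on Linux, and model_lnet, model_rx_slow_path() and model_shutdown() are hypothetical stand-ins for the_lnet, the slow path of lnet_nid2peerni_locked() and LNetNIFini(). It further assumes that teardown waits for the receive thread, which is modelled with pthread_join(). A timed lock replaces mutex_lock() so that the stall is reported as -ETIMEDOUT instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the_lnet.ln_state and the_lnet.ln_api_mutex. */
enum model_state { MODEL_RUNNING, MODEL_STOPPING };

struct model_lnet {
	pthread_mutex_t api_mutex;
	enum model_state state;
};

/*
 * Models the slow path of lnet_nid2peerni_locked(): check the state first,
 * then take the API mutex.
 * Returns 0 if the mutex was acquired, -ESHUTDOWN if the state check
 * rejected the call, -ETIMEDOUT if the caller stalled on the mutex.
 */
static int model_rx_slow_path(struct model_lnet *ln, int timeout_ms)
{
	struct timespec deadline;
	int rc;

	if (ln->state != MODEL_RUNNING)
		return -ESHUTDOWN;

	/* Absolute deadline, as pthread_mutex_timedlock() expects. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += (long)timeout_ms * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	rc = pthread_mutex_timedlock(&ln->api_mutex, &deadline);
	if (rc != 0)
		return -rc;	/* -ETIMEDOUT when teardown still holds it */

	pthread_mutex_unlock(&ln->api_mutex);
	return 0;
}

struct rx_args {
	struct model_lnet *ln;
	int timeout_ms;
	int rc;
};

static void *rx_thread(void *p)
{
	struct rx_args *a = p;

	a->rc = model_rx_slow_path(a->ln, a->timeout_ms);
	return NULL;
}

/*
 * Models LNetNIFini(): the mutex is held across the whole teardown, and
 * teardown waits for the receive thread (assumption, modelled by join).
 * set_stopping_first flips the state before the receiver runs, purely
 * for contrast. Returns what the receive thread observed.
 */
static int model_shutdown(bool set_stopping_first)
{
	struct model_lnet ln = {
		.api_mutex = PTHREAD_MUTEX_INITIALIZER,
		.state = MODEL_RUNNING,
	};
	struct rx_args args = { .ln = &ln, .timeout_ms = 50, .rc = 0 };
	pthread_t tid;
	int rc;

	pthread_mutex_lock(&ln.api_mutex);
	if (set_stopping_first)
		ln.state = MODEL_STOPPING;

	rc = pthread_create(&tid, NULL, rx_thread, &args);
	if (rc != 0) {
		pthread_mutex_unlock(&ln.api_mutex);
		return -rc;
	}
	pthread_join(tid, NULL);	/* teardown waits for the receiver */

	ln.state = MODEL_STOPPING;	/* the late state change */
	pthread_mutex_unlock(&ln.api_mutex);
	return args.rc;
}
```

      Under these assumptions, model_shutdown(false) returns -ETIMEDOUT: the receiver passed the state check and stalled on the mutex, which in the kernel would be an indefinite block. model_shutdown(true) returns -ESHUTDOWN because the receiver is turned away by the state check. The second case is only there to show what the state check guards; it is not meant to describe the fix. Build with cc -pthread.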

      It is easy for me to reproduce:

      [root@snx11922n000 ~]# pdsh -g lustre modprobe lnet; lctl net up ; lctl list_nids ; lctl ping 10.12.0.50@o2ib40 ; lctl net down ; lustre_rmmod

      Sometimes the command needs to be repeated a couple of times.

      I believe this regression was introduced by:

      commit fa8b4e6357c53ea457ef6624b0b19bece0b0fdde
      Author: Amir Shehata <amir.shehata@intel.com>
      Date:   Thu May 26 15:42:39 2016 -0700
      
          LU-7734 lnet: peer/peer_ni handling adjustments
      

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: hornc Chris Horn