Lustre / LU-13362

Race between message finalize and discovery reply processing

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Labels: None
    • Severity: 3

    Description

      Two threads are stuck trying to acquire lnet_net_lock(1):

      crash_x86_64> p ((struct cfs_percpt_lock *)0xffff8808299ed880)->pcl_locks[1]
      $11 = (spinlock_t *) 0xffff88076447c640
      
      *hornc@cflosbld09 200312170801 $ grep -B 4 -A 8 ffff88076447c640 bt.all
      PID: 8971   TASK: ffff88076b3c4f00  CPU: 1   COMMAND: "kiblnd_sd_00_00"
          [exception RIP: queued_spin_lock_slowpath+283]
          RIP: ffffffff810a573b  RSP: ffffc90000ea7a80  RFLAGS: 00000202
          RAX: 0000000000300101  RBX: ffff8808299ed880  RCX: 0000000000000001
          RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff88076447c640
          RBP: ffffc90000ea7a80   R8: 0000000000000101   R9: ffffffffa05d1729
          R10: 0000000000000000  R11: ffffffffa05ebb48  R12: 0000000000000008
          R13: ffff88082be96600  R14: ffff88082be96600  R15: ffff88071da3e568
          CS: 0010  SS: 0018
       #0 [ffffc90000ea7a88] _raw_spin_lock at ffffffff816b5411
       #1 [ffffc90000ea7a98] cfs_percpt_lock at ffffffffa053c546 [libcfs]
       #2 [ffffc90000ea7ac0] lnet_handle_send at ffffffffa05a3bd2 [lnet]
       #3 [ffffc90000ea7b28] lnet_select_pathway at ffffffffa05a6e18 [lnet]
      --
      PID: 8983   TASK: ffff8807167a3080  CPU: 11  COMMAND: "lnet_discovery"
          [exception RIP: queued_spin_lock_slowpath+184]
          RIP: ffffffff810a56d8  RSP: ffffc900286e3e48  RFLAGS: 00000202
          RAX: 0000000000000000  RBX: ffff8808299ed880  RCX: ffff88085fae2a80
          RDX: 0000000000300101  RSI: 0000000000000101  RDI: ffff88076447c640
          RBP: ffffc900286e3e48   R8: 0000000000300000   R9: 0000000000000000
          R10: ffffc900286e3e90  R11: 0000000000000f2f  R12: 0000000000000008
          R13: ffff880828f021d0  R14: 0000000000000240  R15: 000500280a0c0035
          CS: 0010  SS: 0018
       #0 [ffffc900286e3e50] _raw_spin_lock at ffffffff816b5411
       #1 [ffffc900286e3e60] cfs_percpt_lock at ffffffffa053c546 [libcfs]
       #2 [ffffc900286e3e88] lnet_peer_discovery at ffffffffa05bc031 [lnet]
       #3 [ffffc900286e3f08] kthread at ffffffff8107e807
      *hornc@cflosbld09 200312170801 $
      

      A debug patch tells us which thread is holding this lock:

      crash_x86_64> p *((struct cfs_percpt_lock *)0xffff8808299ed880)->pcl_locks_info[1]
      $4 = {
        pcl_fn_ptr = 0xffffffffa05a69b7 <lnet_select_pathway+103>,
        pcl_thr_ptr = 0xffff88082cd6f040
      }
      
      crash_x86_64> task 0xffff88082cd6f040 | grep PID
      PID: 8965   TASK: ffff88082cd6f040  CPU: 10  COMMAND: "kgnilnd_sd_06"
      crash_x86_64> bt 8965
      PID: 8965   TASK: ffff88082cd6f040  CPU: 10  COMMAND: "kgnilnd_sd_06"
          [exception RIP: queued_spin_lock_slowpath+180]
          RIP: ffffffff810a56d4  RSP: ffffc90000d73ad8  RFLAGS: 00000202
          RAX: 0000000000000000  RBX: ffff880828f02100  RCX: ffff88085faa2a80
          RDX: 00000000002c0101  RSI: 0000000000000101  RDI: ffff880828f02184
          RBP: ffffc90000d73ad8   R8: 00000000002c0000   R9: 0000000000000000
          R10: 0000000000000000  R11: 0000000000000000  R12: ffff88071e12d568
          R13: ffff880828f02100  R14: 0000000000000001  R15: ffffffffffffffff
          CS: 0010  SS: 0018
       #0 [ffffc90000d73ae0] _raw_spin_lock at ffffffff816b5411
       #1 [ffffc90000d73af0] lnet_peer_gw_discovery at ffffffffa05bd329 [lnet]
       #2 [ffffc90000d73b08] lnet_initiate_peer_discovery at ffffffffa05a31c3 [lnet]
       #3 [ffffc90000d73b40] lnet_select_pathway at ffffffffa05a6a4c [lnet]
       #4 [ffffc90000d73c40] lnet_send at ffffffffa05a7af5 [lnet]
       #5 [ffffc90000d73c60] lnet_finalize at ffffffffa05999fa [lnet]
       #6 [ffffc90000d73cc0] kgnilnd_tx_done at ffffffffa064cba4 [kgnilnd]
       #7 [ffffc90000d73d20] kgnilnd_check_fma_send_cq at ffffffffa0655884 [kgnilnd]
       #8 [ffffc90000d73e78] kgnilnd_scheduler at ffffffffa065a6ca [kgnilnd]
       #9 [ffffc90000d73f08] kthread at ffffffff8107e807
      #10 [ffffc90000d73f50] ret_from_fork at ffffffff8180024a
      crash_x86_64>
      

      This thread is trying to acquire lp_lock:

      bool
      lnet_peer_gw_discovery(struct lnet_peer *lp)
      {
              bool rc = false;
      
              spin_lock(&lp->lp_lock); <<<< Thread here
      

      The peer in question:

      crash_x86_64> struct -o lnet_peer | grep lp_lock
        [132] spinlock_t lp_lock;
      crash_x86_64> eval ffff880828f02184 - 132
      hexadecimal: ffff880828f02100
          decimal: 18446612167360782592  (-131906348769024)
            octal: 1777774200405074020400
           binary: 1111111111111111100010000000100000101000111100000010000100000000
      crash_x86_64>
      

      From the backtrace info in the dump we can see a few threads with this address on their stacks.

      One is a kgnilnd_sd thread:

      PID: 8960   TASK: ffff8807644aa0c0  CPU: 13  COMMAND: "kgnilnd_sd_01"
      

      This thread is finalizing a message. It holds lpni_lock and is stuck acquiring lp_lock in lnet_return_rx_credits_locked().

      lnet_return_rx_credits_locked(struct lnet_msg *msg)
      {
      ...
                      lp = rxpeerni->lpni_peer_net->lpn_peer;
      
                      /* give back peer router credits */
                      msg->msg_peerrtrcredit = 0;
      
                      spin_lock(&rxpeerni->lpni_lock);
                      spin_lock(&lp->lp_lock);  <<<<<<<<<< Thread is here
      

      Meanwhile, a kiblnd_sd thread is processing a discovery reply:

      crash_x86_64> bt 8977
      PID: 8977   TASK: ffff88076550afc0  CPU: 15  COMMAND: "kiblnd_sd_03_00"
          [exception RIP: queued_spin_lock_slowpath+281]
          RIP: ffffffff810a5739  RSP: ffffc90000f13ae8  RFLAGS: 00000202
          RAX: 0000000000000101  RBX: ffff88082cfe3200  RCX: 0000000000000001
          RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff88082cfe32a4
          RBP: ffffc90000f13ae8   R8: 0000000000000101   R9: 000000000000000f
          R10: 0000000000000001  R11: ffff8806e12b7eea  R12: ffff880828f02100
          R13: ffff880828cb2209  R14: ffff880828f02184  R15: ffff880828cb2200
          CS: 0010  SS: 0018
       #0 [ffffc90000f13af0] _raw_spin_lock at ffffffff816b5411
       #1 [ffffc90000f13b00] lnet_peer_ni_clr_non_mr_pref_nid at ffffffffa05b81eb [lnet]
       #2 [ffffc90000f13b20] lnet_peer_clr_non_mr_pref_nids at ffffffffa05b82db [lnet]
       #3 [ffffc90000f13b40] lnet_discovery_event_handler at ffffffffa05be2cf [lnet]
      

      It is holding lp_lock.

      lnet_discovery_event_reply(struct lnet_peer *lp, struct lnet_event *ev)
      {
      ...
              spin_lock(&lp->lp_lock);
      ...
                              CDEBUG(D_NET, "peer %s(%p) is MR capable\n",
                                     libcfs_nid2str(lp->lp_primary_nid), lp);
                              lp->lp_state |= LNET_PEER_MULTI_RAIL;
                              lnet_peer_clr_non_mr_pref_nids(lp);
      
      void
      lnet_peer_clr_non_mr_pref_nids(struct lnet_peer *lp)
      {
              struct lnet_peer_ni *lpni = NULL;
      
              while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL)
                      lnet_peer_ni_clr_non_mr_pref_nid(lpni);
      }
      
      lnet_peer_ni_clr_non_mr_pref_nid(struct lnet_peer_ni *lpni)
      {
      ...
              spin_lock(&lpni->lpni_lock); <<<< Thread is here
      

      Thread A (discovery reply):  spin_lock(lp_lock)
      Thread B (message finalize): spin_lock(lpni_lock)
      Thread A (discovery reply):  spin_lock(lpni_lock)  <- blocks on B
      Thread B (message finalize): spin_lock(lp_lock)    <- blocks on A
      Deadlock.

      Because these two threads are deadlocked, PID 8965 (which holds lnet_net_lock(1)) cannot make progress, which in turn blocks every other thread trying to acquire lnet_net_lock(1).

      The same issue exists with lnet_post_routed_recv_locked().

      Attachments

        Activity

          [LU-13362] Race between message finalize and discovery reply processing
          pjones Peter Jones added a comment -

          Landed for 2.14


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37937/
          Subject: LU-13362 lnet: Disc reply race with finalize and routed recv
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c700b4a410ab5542391d006ce541023ecf9b7a5d


          Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/37937
          Subject: LU-13362 lnet: Disc reply race with finalize and routed recv
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 1453f3fdf7c3d6bba99a3b464b0a5b19997db1f5


          People

            Assignee: hornc Chris Horn
            Reporter: hornc Chris Horn
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: