Lustre / LU-12568

LNetError: 28086:0:(lib-move.c:2862:lnet_detach_rsp_tracker()) ASSERTION( rspt->rspt_cpt == cpt ) failed


    Description

      There is a use-after-free in the LNet response tracking code.

      If an MD is unlinked while its refcount is non-zero, its lnet_libhandle is invalidated so that all future lookups of the MD fail, but the MD itself stays allocated until the last reference is dropped.

      /* must be called with lnet_res_lock held */
      void
      lnet_md_unlink(struct lnet_libmd *md)
      {
              if ((md->md_flags & LNET_MD_FLAG_ZOMBIE) == 0) {
                      /* first unlink attempt... */
                      struct lnet_me *me = md->md_me;
      
                      md->md_flags |= LNET_MD_FLAG_ZOMBIE;
      
                      /* Disassociate from ME (if any), and unlink it if it was created
                       * with LNET_UNLINK */
                      if (me != NULL) {
                              /* detach MD from portal */
                              lnet_ptl_detach_md(me, md);
                              if (me->me_unlink == LNET_UNLINK)
                                      lnet_me_unlink(me);
                      }
      
                      /* ensure all future handle lookups fail */
                      lnet_res_lh_invalidate(&md->md_lh);
              }
      
              if (md->md_refcount != 0) {
                      CDEBUG(D_NET, "Queueing unlink of md %p\n", md);
                      return;
              }
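
The state this leaves behind can be illustrated with a small user-space model. This is only a sketch: `model_md`, `model_handle2md()`, `model_md_unlink()` and the `table` array are hypothetical stand-ins invented for illustration, not LNet APIs, and the model ignores locking and the ME handling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MD_FLAG_ZOMBIE 0x1u	/* stand-in for LNET_MD_FLAG_ZOMBIE */
#define TABLE_SIZE 4

struct model_md {
	uint64_t cookie;	/* stand-in for the handle cookie */
	unsigned int flags;
	int refcount;
	bool in_table;		/* true while handle lookups can find this MD */
};

static struct model_md *table[TABLE_SIZE];

/* Hypothetical lookup: returns NULL once the handle has been invalidated. */
static struct model_md *model_handle2md(uint64_t cookie)
{
	for (size_t i = 0; i < TABLE_SIZE; i++)
		if (table[i] && table[i]->in_table &&
		    table[i]->cookie == cookie)
			return table[i];
	return NULL;
}

/*
 * Same shape as the excerpt: invalidate the handle on the first unlink
 * attempt, then defer the actual release while references remain.
 * Returns true if the MD could be released immediately.
 */
static bool model_md_unlink(struct model_md *md)
{
	assert(md != NULL);

	if ((md->flags & MD_FLAG_ZOMBIE) == 0) {
		md->flags |= MD_FLAG_ZOMBIE;
		md->in_table = false;	/* all future lookups fail */
	}

	return md->refcount == 0;
}
```

In this model, an MD with `refcount == 1` is still live after `model_md_unlink()`, yet `model_handle2md()` can no longer find it. Any code that treats a failed lookup as "the MD is gone" is therefore wrong for such a zombie MD.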
      

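      If a response tracker (rspt) is attached to such an MD, the loop in lnet_finalize_expired_responses() can free the rspt before it has been detached from the MD. Because the handle was invalidated, lnet_handle2md() returns NULL, so the loop treats the MD as gone and frees the rspt, while md->md_rspt_ptr on the still-live MD continues to point at it.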

      static void
      lnet_finalize_expired_responses(bool force)
      {
      <snip>
                             if (ktime_compare(ktime_get(), rspt->rspt_deadline) >= 0 ||
                                  force) {
                                      struct lnet_peer_ni *lpni;
                                      lnet_nid_t nid;
      
                                      md = lnet_handle2md(&rspt->rspt_mdh);
                                      if (!md) {
                                              LNetInvalidateMDHandle(&rspt->rspt_mdh);
                                              lnet_res_unlock(i);
                                              list_del_init(&rspt->rspt_on_list);
                                              lnet_rspt_free(rspt, i);
                                              continue;
                                      }
      

      When the final operation on the MD completes, the MD is detached from the lnet_msg and the response tracker is detached from the MD. Because md->md_rspt_ptr still refers to the freed rspt, rspt->rspt_cpt is read from freed memory and the assertion can trip:

      lnet_finalize()->lnet_msg_detach_md()->lnet_detach_rsp_tracker()
      
      void
      lnet_detach_rsp_tracker(struct lnet_libmd *md, int cpt)
      {
              struct lnet_rsp_tracker *rspt;
      
              /*
               * msg has a refcount on the MD so the MD is not going away.
               * The rspt queue for the cpt is protected by
               * the lnet_net_lock(cpt). cpt is the cpt of the MD cookie.
               */
              if (!md->md_rspt_ptr)
                      return;
      
              rspt = md->md_rspt_ptr;
              md->md_rspt_ptr = NULL;
      
              /* debug code */
              LASSERT(rspt->rspt_cpt == cpt);
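
The whole sequence can be replayed in a small user-space sketch. All names here (`sketch_md`, `sketch_rspt`, `sketch_unlink()`, `sketch_expire()`, `sketch_detach_hits_stale_rspt()`, `sketch_run_sequence()`) are hypothetical and exist only for illustration. Instead of actually freeing memory, the sketch sets a `freed` flag so that the stale pointer can be observed without undefined behaviour. In the real code the memory may have been reused, so `rspt_cpt` can hold any value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_rspt {
	int cpt;
	bool freed;	/* set instead of calling free(), so state stays inspectable */
};

struct sketch_md {
	int refcount;
	bool handle_valid;
	struct sketch_rspt *rspt_ptr;	/* stand-in for md_rspt_ptr */
};

/* Step 1: unlink while a message still holds a reference on the MD. */
static void sketch_unlink(struct sketch_md *md)
{
	md->handle_valid = false;
}

/*
 * Step 2: expiry pass. The lookup by handle fails, so the tracker is
 * released without clearing the MD's back-pointer.
 */
static void sketch_expire(struct sketch_md *md, struct sketch_rspt *rspt)
{
	assert(md != NULL && rspt != NULL);

	struct sketch_md *found = md->handle_valid ? md : NULL;

	if (!found) {
		rspt->freed = true;	/* models lnet_rspt_free() */
		return;
	}
	/* The path taken when the lookup succeeds is not modelled here. */
}

/*
 * Step 3: final message completion. Returns true if the detach path
 * would dereference a tracker that was already released.
 */
static bool sketch_detach_hits_stale_rspt(struct sketch_md *md)
{
	struct sketch_rspt *rspt = md->rspt_ptr;

	if (!rspt)
		return false;

	md->rspt_ptr = NULL;
	return rspt->freed;
}

/* Replays steps 1 to 3 in the order described in this report. */
static bool sketch_run_sequence(void)
{
	struct sketch_rspt rspt = { .cpt = 3, .freed = false };
	struct sketch_md md = {
		.refcount = 1, .handle_valid = true, .rspt_ptr = &rspt,
	};

	sketch_unlink(&md);
	sketch_expire(&md, &rspt);
	return sketch_detach_hits_stale_rspt(&md);
}
```

`sketch_run_sequence()` returns true, which corresponds to the point where the real detach path reads `rspt->rspt_cpt` from a freed tracker. If `sketch_unlink()` is skipped, the lookup in `sketch_expire()` succeeds and the tracker is not released on that path in this model.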
      

            People

              hornc Chris Horn