Details
Type: Bug
Resolution: Fixed
Priority: Critical
Description
With the landing of LU-12678, ptlrpc holds a pointer to an lnet_me object without holding a reference on it (lnet_me has no reference count).
The scenario is:
The LNet monitor_thread finds an expired response and starts to kill the MD. Once the MD has no references, the MD teardown also kills its ME entry. ptlrpc, however, still holds a pointer to that ME object and tries to kill the ME itself.
void lnet_md_unlink(struct lnet_libmd *md)
{
        if ((md->md_flags & LNET_MD_FLAG_ZOMBIE) == 0) {
                /* first unlink attempt... */
                struct lnet_me *me = md->md_me;

                md->md_flags |= LNET_MD_FLAG_ZOMBIE;

                /* Disassociate from ME (if any), and unlink it if it was created
                 * with LNET_UNLINK */
                if (me != NULL) {
                        /* detach MD from portal */
                        lnet_ptl_detach_md(me, md);
                        if (me->me_unlink == LNET_UNLINK)
                                lnet_me_unlink(me);
                }

                /* ensure all future handle lookups fail */
                lnet_res_lh_invalidate(&md->md_lh);
        }

        if (md->md_refcount != 0) {
                CDEBUG(D_NET, "Queueing unlink of md %p\n", md);
                return;
        }
So lnet_me is not protected by the MD reference: lnet_me_unlink() runs on the first unlink attempt, before md_refcount is checked.
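To illustrate the lifetime problem, here is a minimal single-threaded sketch of one possible way to let two parties hold the same ME safely. It is not the LNet code and not necessarily the fix that landed for this ticket: struct toy_me, its refcount and unlinked fields, and all toy_me_* functions are hypothetical names invented for the example, and real code would also need the appropriate LNet resource locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ME; NOT the real struct lnet_me. */
struct toy_me {
	int   refcount;    /* assumed counter, added for this sketch */
	bool  unlinked;    /* already removed from its match list */
	bool *freed_flag;  /* demo only: records that the object was freed */
};

/* Allocate an ME owned by one reference (the match list / MD side). */
struct toy_me *toy_me_alloc(bool *freed_flag)
{
	struct toy_me *me = malloc(sizeof(*me));

	if (me == NULL)
		return NULL;
	me->refcount = 1;
	me->unlinked = false;
	me->freed_flag = freed_flag;
	*freed_flag = false;
	return me;
}

/* An extra holder (ptlrpc in the report) pins the object. */
void toy_me_get(struct toy_me *me)
{
	me->refcount++;
}

/* Drop one reference; the last put frees the object. */
void toy_me_put(struct toy_me *me)
{
	assert(me->refcount > 0);
	if (--me->refcount == 0) {
		*me->freed_flag = true;
		free(me);
	}
}

/* Unlink drops the owner reference exactly once; repeats are no-ops. */
void toy_me_unlink(struct toy_me *me)
{
	if (me->unlinked)
		return;
	me->unlinked = true;
	toy_me_put(me);
}
```

With this shape, an unlink triggered by MD teardown leaves the memory valid until the second holder calls toy_me_put(), and a second unlink attempt from that holder does nothing. Without the counter, the first unlink would free the object and the second would touch freed memory.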