Details
Type: Bug
Resolution: Fixed
Priority: Critical
Description
Since LU-12678 landed, ptlrpc holds a pointer to an lnet_me object without holding a reference on it (lnet_me has no reference count).
The scenario is:
The LNet monitor thread finds an expired response and starts to kill the MD. Once the MD has no references, it goes on to kill the ME entry, but ptlrpc still holds a pointer to the ME object and tries to kill the ME itself.
void
lnet_md_unlink(struct lnet_libmd *md)
{
	if ((md->md_flags & LNET_MD_FLAG_ZOMBIE) == 0) {
		/* first unlink attempt... */
		struct lnet_me *me = md->md_me;

		md->md_flags |= LNET_MD_FLAG_ZOMBIE;

		/* Disassociate from ME (if any), and unlink it if it was created
		 * with LNET_UNLINK */
		if (me != NULL) {
			/* detach MD from portal */
			lnet_ptl_detach_md(me, md);
			if (me->me_unlink == LNET_UNLINK)
				lnet_me_unlink(me);
		}

		/* ensure all future handle lookups fail */
		lnet_res_lh_invalidate(&md->md_lh);
	}

	if (md->md_refcount != 0) {
		CDEBUG(D_NET, "Queueing unlink of md %p\n", md);
		return;
	}
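So the lnet_me is not protected by the MD reference: the ME is unlinked on the first unlink attempt, before md_refcount is checked, so it can go away while the MD is still referenced.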
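The ordering problem can be reproduced in a small standalone model. This is only a sketch, not LNet code: the toy_* types and functions are hypothetical stand-ins for lnet_me, lnet_libmd, lnet_me_unlink() and lnet_md_unlink(). It is assumed here that detaching clears the MD's ME pointer, and the model sets a flag where the real code would free memory, so the demo stays well-defined.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MD_FLAG_ZOMBIE 0x1

/* Hypothetical stand-in for struct lnet_me. */
struct toy_me {
	bool unlinked;    /* set instead of freeing the object */
	bool auto_unlink; /* models me_unlink == LNET_UNLINK */
};

/* Hypothetical stand-in for struct lnet_libmd. */
struct toy_md {
	unsigned int flags;
	int refcount;
	struct toy_me *me;
	bool freed;       /* set instead of freeing the object */
};

static void toy_me_unlink(struct toy_me *me)
{
	me->unlinked = true;
}

/* Same ordering as the excerpt: the ME is handled on the first
 * unlink attempt, before the MD refcount is looked at. */
static void toy_md_unlink(struct toy_md *md)
{
	if ((md->flags & TOY_MD_FLAG_ZOMBIE) == 0) {
		struct toy_me *me = md->me;

		md->flags |= TOY_MD_FLAG_ZOMBIE;
		if (me != NULL) {
			md->me = NULL; /* assumed effect of detaching */
			if (me->auto_unlink)
				toy_me_unlink(me);
		}
	}

	if (md->refcount != 0)
		return; /* unlink is queued, the MD stays alive */

	md->freed = true;
}

/* Returns true when the ME is gone while the MD is still alive,
 * which is the window in which an outside ME pointer dangles. */
static bool toy_me_gone_while_md_alive(void)
{
	struct toy_me me = { .unlinked = false, .auto_unlink = true };
	struct toy_md md = { .flags = 0, .refcount = 1, .me = &me, .freed = false };

	toy_md_unlink(&md);
	return me.unlinked && !md.freed;
}
```

In this model, any caller that kept its own pointer to the toy_me (as ptlrpc does with the ME) would be left with a stale pointer after the first toy_md_unlink() call, even though the MD reference it relies on has not been dropped yet.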