Details
Type: Bug
Resolution: Fixed
Priority: Minor
Lustre 2.13.0, Lustre 2.12.2
None
Severity: 3
Description
This leads to false-positive response "timeouts". In the excerpt below, the response tracker is attached to the ping message and the reply arrives, but the MD is never unlinked, so the response tracker is never detached and later expires.
00000400:00000200:26.0:1560633919.659270:0:2482:0:(router.c:1045:lnet_ping_router_locked()) Check: 12345-93@gni4
00000400:00000200:26.0:1560633919.659290:0:2482:0:(lib-move.c:4844:LNetGet()) LNetGet msg ffff8807832cdc00 -> 12345-93@gni4
00000400:00000200:26.0:1560633919.659291:0:2482:0:(lib-msg.c:364:lnet_msg_attach_md()) attached md ffff88078a861f68 to msg ffff8807832cdc00
00000400:00000200:26.0:1560633919.659293:0:2482:0:(lib-move.c:4505:lnet_attach_rsp_tracker()) Add rspt ffff88078b5ef000 to md ffff88078a861f68 dl 1560633969s ne false
00000400:00000200:6.0:1560633919.659345:0:2479:0:(lib-msg.c:775:lnet_msg_detach_md()) ffff88078a861f68 ref 0 fl 2 thr -1 opt 10 off 0 size 0 len 272 msg ffff8807832cdc00 unlink false
00000400:00000200:6.0:1560633919.659466:0:2479:0:(lib-move.c:3890:lnet_parse_reply()) 60@gni4: Reply msg ffff88078c803800 from 12345-93@gni4 of length 80/80 into md 0x65931
00000400:00000200:6.0:1560633919.659467:0:2479:0:(lib-msg.c:364:lnet_msg_attach_md()) attached md ffff88078a861f68 to msg ffff88078c803800
00000400:00000200:6.0:1560633919.659474:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
00000400:00000200:6.0:1560633919.659475:0:2479:0:(lib-msg.c:775:lnet_msg_detach_md()) ffff88078a861f68 ref 0 fl 2 thr -1 opt 10 off 0 size 0 len 272 msg ffff88078c803800 unlink false
00000400:00000200:6.0:1560633919.659507:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
00000400:00000200:6.0:1560633919.659578:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
00000400:00000200:6.0:1560633919.659619:0:2479:0:(router.c:120:lnet_notify_locked()) Old news
00000400:00000100:26.0:1560633977.003290:0:2482:0:(lib-move.c:2781:lnet_finalize_expired_responses()) Response timed out: md = ffff88078a861f68: nid = 93@gni4
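The lifecycle in the excerpt can be replayed with a small model. This is a minimal Python sketch, not LNet code: the names `MD`, `RspTracker`, `attach_rsp_tracker`, `msg_detach_md`, `expired`, `replay_ping` and the `detach_on_reply` flag are hypothetical. It assumes (a) the tracker is released only when the MD is unlinked, and (b) an MD whose threshold is -1 (the "thr -1" field in the log, treated here as "infinite") is never unlinked by message completion, which matches the "unlink false" lines. Under those assumptions, the logged sequence ends with a tracker that outlives its deadline even though the reply arrived.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: an "infinite" MD threshold is encoded as -1, which is what the
# "thr -1" field in the debug log shows for the ping MD.
MD_THRESH_INF = -1


@dataclass
class RspTracker:
    deadline: float  # absolute expiry time in seconds ("dl ...s" in the log)


@dataclass
class MD:
    threshold: int
    rspt: Optional[RspTracker] = None
    unlinked: bool = False


def attach_rsp_tracker(md: MD, deadline: float) -> None:
    """Attach a response tracker when the GET is sent."""
    md.rspt = RspTracker(deadline)


def msg_detach_md(md: MD, is_reply: bool, detach_on_reply: bool = False) -> None:
    """Release a message's hold on the MD.

    A finite threshold is consumed by each operation and the MD is unlinked
    when it reaches zero. An infinite threshold is never consumed, so the MD
    stays linked ("unlink false"). The tracker is dropped only on unlink,
    unless the hypothetical detach_on_reply remedy is enabled.
    """
    if md.threshold != MD_THRESH_INF:
        md.threshold -= 1
        if md.threshold == 0:
            md.unlinked = True
    if md.unlinked or (detach_on_reply and is_reply):
        md.rspt = None


def expired(mds: List[MD], now: float) -> List[MD]:
    """Return MDs whose tracker is still attached after its deadline."""
    return [md for md in mds if md.rspt is not None and now > md.rspt.deadline]


def replay_ping(detach_on_reply: bool) -> List[MD]:
    """Replay the logged sequence for one router ping."""
    md = MD(threshold=MD_THRESH_INF)
    attach_rsp_tracker(md, deadline=1560633969.0)  # "Add rspt ... dl 1560633969s"
    msg_detach_md(md, is_reply=False, detach_on_reply=detach_on_reply)  # GET sent
    msg_detach_md(md, is_reply=True, detach_on_reply=detach_on_reply)  # reply received
    return expired([md], now=1560633977.003290)  # expiry check from the log
```

In this model, `replay_ping(False)` returns one MD, mirroring the "Response timed out" line, while `replay_ping(True)` returns an empty list. A finite threshold such as 2 (one GET, one REPLY) is consumed by the two completions, so that MD is unlinked and never trips the check. Detaching the tracker once the reply is matched is only one possible remedy, shown for illustration; it is not necessarily how the fix for this ticket was implemented.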
Issue Links
is related to:
- LU-12568 LNetError: 28086:0:(lib-move.c:2862:lnet_detach_rsp_tracker()) ASSERTION( rspt->rspt_cpt == cpt ) failed (Resolved)
- LU-12906 LBUG ASSERTION( rspt->rspt_cpt == cpt ) failed (Resolved)
- LU-12907 LNet routers: LNetError: 14141:0:(lib-msg.c:894:lnet_finalize()) ASSERTION( !(((current_thread_info()->preempt_count) & ((((1UL << (10))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 10))))) (Resolved)