00000400:00000200:21.0F:1644346380.762084:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.8@o2ib100, cpt = 1 00000400:00000200:21.0:1644346380.762093:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.8@o2ib100: 0 00000400:00000200:21.0:1644346380.762094:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346380.762096:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346380.762098:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.8@o2ib100 NID 172.19.2.8@o2ib100: 0. pending discovery 00000400:00000200:26.0F:1644346380.762101:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346380.762111:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x36060 00000400:00000200:26.0:1644346380.762120:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:26.0:1644346380.762124:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:26.0:1644346380.762129:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:26.0:1644346380.762136:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346380.762140:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:26.0:1644346380.762145:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346380.762148:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 
00000400:00000200:26.0:1644346380.762150:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.8@o2ib100 00000400:00000200:26.0:1644346380.762152:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x34260 rc 0 00000800:00000200:17.2F:1644346383.769656:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (20)++ 00000800:00000200:18.0F:1644346383.769724:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:18.0:1644346383.769740:0:170491:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.55@o2ib100 00000400:00000200:18.0:1644346383.769746:0:170491:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:18.0:1644346383.769756:0:170491:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a845c40 00000400:00000200:18.0:1644346383.769763:0:170491:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 8960 00000400:00000200:18.0:1644346383.769768:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346383.769772:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:18.0:1644346383.769776:0:170491:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:18.0:1644346383.769788:0:170491:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (22)++ 00000800:00000200:31.0F:1644346383.769789:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:18.0:1644346383.769792:0:170491:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (23)-- 
00000800:00000200:18.0:1644346383.769797:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000100:00000200:0.0F:1644346383.769867:0:170701:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207662144 00010000:00000200:0.0:1644346383.769887:0:170701:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000595cfaac x1718520207662144/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:234/0 lens 224/224 e 0 to 0 dl 1644346444 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:0.0:1644346383.769902:0:170701:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207662144, offset 224 00000400:00000200:0.0:1644346383.769908:0:170701:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:0.0:1644346383.769928:0:170701:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:0.0:1644346383.769930:0:170701:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:0.0:1644346383.769932:0:170701:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:0.0:1644346383.769933:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:0.0:1644346383.769934:0:170701:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:0.0:1644346383.769936:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:0.0:1644346383.769937:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:0.0:1644346383.769942:0:170701:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 
192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:0.0:1644346383.769944:0:170701:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:0.0:1644346383.769948:0:170701:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:0.0:1644346383.769950:0:170701:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 00000800:00000200:0.0:1644346383.769951:0:170701:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 00000800:00000200:0.0:1644346383.769957:0:170701:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.2:1644346383.769967:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:17.0F:1644346383.769982:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:17.0:1644346383.769990:0:170490:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000400:00000200:17.0:1644346383.770006:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:17.0:1644346383.770012:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000800:00000400:63.0F:1644346383.770019:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.134@o2ib100: 693658 seconds 00000400:00000200:63.0:1644346383.770026:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:17.0:1644346383.770028:0:170490:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000aec52716 00000400:00000200:63.0:1644346383.770031:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.134@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:63.0:1644346383.770034:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000800:00000200:17.0:1644346383.770034:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.0:1644346383.770037:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:63.0:1644346383.770038:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.134@o2ib100: 1 00000400:00000200:26.0:1644346383.770119:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346383.770122:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000000110a640) state 0x4860 00000400:00000200:26.0:1644346383.770124:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000008b9ce072 00000400:00000200:26.0:1644346383.770125:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346383.770128:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.134@o2ib100:-110 00000400:00000200:26.0:1644346383.770129:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000000110a640) state 0x6060 rc -110 00000400:00000200:26.0:1644346383.770131:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.134@o2ib100: -110 00000400:00000200:26.0:1644346383.770132:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.134@o2ib100 00000400:00000200:26.0:1644346383.770134:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000002372b454 not committed for send or receive 00000400:00000200:26.0:1644346383.770135:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a9d9fca1 00000100:00000200:26.0:1644346383.770138:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@000000006dc49c8b x1723495557508928/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 0 dl 1644346425 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:26.0:1644346383.770147:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000032dd374a not committed for send or receive 00000400:00000200:26.0:1644346383.770147:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b86463ba 00000100:00000200:26.0:1644346383.770149:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000aa049739 x1723495557508992/t0(0) o38->lflood-MDT0000-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346425 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:26.0:1644346383.770153:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000d26ab01a not committed for send or receive 00000400:00000200:26.0:1644346383.770153:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000008acf46e9 00000100:00000200:26.0:1644346383.770155:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000c0212436 x1723495557509056/t0(0) o38->lflood-MDT0001-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346425 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0F:1644346383.770216:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000efdc62cd 00000100:00000200:8.0:1644346383.770226:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000aa049739 x1723495557508992/t0(0) 
o38->lflood-MDT0000-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346383.770250:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000aa049739 x1723495557508992/t0(0) o38->lflood-MDT0000-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346383.770259:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000757ae77f 00000100:00000200:8.0:1644346383.770261:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000c0212436 x1723495557509056/t0(0) o38->lflood-MDT0001-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346383.770263:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000c0212436 x1723495557509056/t0(0) o38->lflood-MDT0001-lwp-OST0001@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346383.770267:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000418bf410 00000100:00000200:8.0:1644346383.770269:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@000000006dc49c8b x1723495557508928/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346383.770271:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@000000006dc49c8b x1723495557508928/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346425 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346383.770342:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509120, portal 25 
00000100:00000200:8.0:1644346383.770344:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 26, xid 1723495557509120, offset 0 00000400:00000200:8.0:1644346383.770347:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:8.0:1644346383.770353:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: 0 00000400:00000200:8.0:1644346383.770354:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346383.770354:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346383.770355:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:8.0:1644346383.770356:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000004d232f20 delayed. 172.19.1.133@o2ib100 pending discovery 00000400:00000200:26.0:1644346383.770357:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346383.770362:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(000000001de69c6b) state 0x6060 00000100:00000200:8.0:1644346383.770362:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509184, portal 10 00000100:00000200:8.0:1644346383.770363:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509184, offset 0 00000400:00000200:8.0:1644346383.770366:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:26.0:1644346383.770367:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.133@o2ib100 00000400:00000200:8.0:1644346383.770368:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: -114 00000400:00000200:8.0:1644346383.770369:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:8.0:1644346383.770369:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346383.770370:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346383.770371:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.133@o2ib100 local destination 00000400:00000200:8.0:1644346383.770372:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000033787a36 delayed. 172.19.1.133@o2ib100 pending discovery 00000400:00000200:26.0:1644346383.770374:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.133@o2ib100 00000100:00000200:8.0:1644346383.770375:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509248, portal 10 00000400:00000200:26.0:1644346383.770376:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.133@o2ib100(172.19.1.133@o2ib100:172.19.1.133@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346383.770378:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.133@o2ib100 00000100:00000200:8.0:1644346383.770378:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509248, offset 0 00000400:00000200:8.0:1644346383.770380:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000800:00000200:26.0:1644346383.770381:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9497bbd] -> 172.19.1.133@o2ib100 (2) version: 0 00000400:00000200:8.0:1644346383.770382:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: -114 00000800:00000200:26.0:1644346383.770383:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9497bbd] -> 172.19.1.133@o2ib100 (2) version: 0 
00000400:00000200:8.0:1644346383.770383:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346383.770383:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:26.0:1644346383.770384:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.133@o2ib100 00000400:00000200:8.0:1644346383.770385:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346383.770386:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(000000001de69c6b) state 0x4260 rc 0 00000400:00000200:8.0:1644346383.770387:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000032565a6a delayed. 172.19.1.133@o2ib100 pending discovery 00000800:00000400:63.0:1644346386.778063:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693661 seconds 00000800:00000400:63.0:1644346386.778065:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693661 seconds 00000400:00000200:63.0:1644346386.778068:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778069:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778076:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346386.778077:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.1@o2ib100: 1 00000400:00000200:63.0:1644346386.778082:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778083:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:63.0:1644346386.778085:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778086:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346386.778088:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346386.778090:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 9 seconds 00000800:00000400:63.0:1644346386.778091:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 1 seconds 00000400:00000200:63.0:1644346386.778091:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778092:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778093:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346386.778094:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000400:00000200:63.0:1644346386.778095:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778096:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346386.778097:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.2@o2ib100: 1 00000800:00000400:63.0:1644346386.778099:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 0 seconds 00000400:00000200:63.0:1644346386.778100:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:63.0:1644346386.778101:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778101:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778102:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346386.778103:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346386.778104:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693661 seconds 00000800:00000400:63.0:1644346386.778108:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693661 seconds 00000400:00000200:63.0:1644346386.778109:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778110:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778111:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346386.778111:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.4@o2ib100: 1 00000400:00000200:63.0:1644346386.778113:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778114:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778114:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778115:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent 
unsuccessfully:-110 00000400:00020000:63.0:1644346386.778116:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346386.778117:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 88 seconds 00000400:00000200:63.0:1644346386.778118:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778119:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778120:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778120:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346386.778121:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346386.778122:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693661 seconds 00000800:00000400:63.0:1644346386.778123:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693661 seconds 00000400:00000200:63.0:1644346386.778124:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346386.778124:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:63.0:1644346386.778125:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346386.778125:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346386.778126:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.6@o2ib100: 1 
00000400:00000200:63.0:1644346386.778127:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346386.778128:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346386.778128:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x34860 00000400:00000200:63.0:1644346386.778129:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346386.778129:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346386.778130:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346386.778143:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000966c56af 00000400:00000200:26.0:1644346386.778145:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346386.778147:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.1@o2ib100:-110 00000400:00000200:26.0:1644346386.778148:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x36060 rc -110 00000400:00000200:26.0:1644346386.778149:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.1@o2ib100: -110 00000400:00000200:26.0:1644346386.778151:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.1@o2ib100 00000400:00000200:26.0:1644346386.778152:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x34860 00000400:00000200:26.0:1644346386.778153:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000344909f0 00000400:00000200:26.0:1644346386.778154:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346386.778155:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.2@o2ib100:-110 00000400:00000200:26.0:1644346386.778156:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x36060 rc -110 00000400:00000200:26.0:1644346386.778156:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.2@o2ib100: -110 00000400:00000200:26.0:1644346386.778157:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.2@o2ib100 00000400:00000200:26.0:1644346386.778158:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000009954ed50) state 0x34860 00000400:00000200:26.0:1644346386.778159:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000030cf9031 00000400:00000200:26.0:1644346386.778159:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346386.778160:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.4@o2ib100:-110 00000400:00000200:26.0:1644346386.778160:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000009954ed50) state 0x36060 rc -110 00000400:00000200:26.0:1644346386.778161:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.4@o2ib100: -110 00000400:00000200:26.0:1644346386.778161:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.4@o2ib100 00000400:00000200:26.0:1644346386.778165:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(00000000c5c06062) state 0x34860 00000400:00000200:26.0:1644346386.778166:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c3704f3b 00000400:00000200:26.0:1644346386.778166:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346386.778167:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.6@o2ib100:-110 00000400:00000200:26.0:1644346386.778167:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(00000000c5c06062) state 0x36060 rc -110 00000400:00000200:26.0:1644346386.778168:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.6@o2ib100: -110 00000400:00000200:26.0:1644346386.778169:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.6@o2ib100 00000400:00000200:21.0:1644346386.906090:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.2@o2ib100, cpt = 1 00000400:00000200:21.0:1644346386.906098:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.2@o2ib100: 0 00000400:00000200:21.0:1644346386.906100:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346386.906101:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346386.906104:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.2@o2ib100 NID 172.19.2.2@o2ib100: 0. 
pending discovery 00000400:00000200:21.0:1644346386.906110:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a37ad97c 00000400:00000200:21.0:1644346386.906113:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346386.906115:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346386.906123:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:21.0:1644346386.906127:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:21.0:1644346386.906131:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:21.0:1644346386.906138:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346386.906142:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:21.0:1644346386.906147:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346386.906148:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000200:21.0:1644346386.906150:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346386.906153:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x36060 00000400:00000200:21.0:1644346386.906153:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000e3f098fd 00000400:00000200:21.0:1644346386.906154:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 
00000400:00000200:21.0:1644346386.906156:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346386.906162:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:26.0:1644346386.906165:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:21.0:1644346386.906169:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:21.0:1644346386.906171:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:26.0:1644346386.906173:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:21.0:1644346386.906177:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000400:00000200:26.0:1644346386.906178:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000800:00000200:21.0:1644346386.906180:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000400:00000200:26.0:1644346386.906184:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346386.906188:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346386.906189:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:21.0:1644346386.906191:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 
00000400:00000200:21.0:1644346386.906192:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c66d5622 00000800:00000200:26.0:1644346386.906193:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346386.906193:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000800:00000200:26.0:1644346386.906196:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346386.906198:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:26.0:1644346386.906199:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.2@o2ib100 00000400:00000200:26.0:1644346386.906201:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x34260 rc 0 00000400:00000200:21.0:1644346386.906203:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:21.0:1644346386.906205:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:21.0:1644346386.906208:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:21.0:1644346386.906213:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346386.906216:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:21.0:1644346386.906219:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346386.906221:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] 
-> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346386.906222:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006fd4441a 00000400:00000200:21.0:1644346386.906223:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346386.906225:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346386.906229:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:21.0:1644346386.906230:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:21.0:1644346386.906233:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:21.0:1644346386.906237:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346386.906240:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:21.0:1644346386.906242:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346386.906244:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346386.906246:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b76935cb 00000400:00000200:21.0:1644346386.906247:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346386.906248:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346386.906252:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 
00000400:00000200:21.0:1644346386.906254:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:21.0:1644346386.906256:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:21.0:1644346386.906261:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346386.906263:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:21.0:1644346386.906265:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346386.906267:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346386.906269:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b90bda72 00000400:00000200:21.0:1644346386.906270:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346386.906274:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346386.906277:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:21.0:1644346386.906279:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:21.0:1644346386.906281:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:21.0:1644346386.906286:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346386.906289:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:21.0:1644346386.906291:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346386.906307:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000400:63.0:1644346387.802072:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693662 seconds 00000800:00000400:63.0:1644346387.802074:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693662 seconds 00000400:00000200:63.0:1644346387.802076:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802077:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802082:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346387.802083:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.7@o2ib100: 1 00000400:00000200:63.0:1644346387.802086:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802087:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802088:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346387.802089:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346387.802090:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI 
(172.19.2.7@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346387.802092:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 35 seconds 00000800:00000400:63.0:1644346387.802093:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 158 seconds 00000400:00000200:63.0:1644346387.802093:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802095:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802095:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346387.802096:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.8@o2ib100: 1 00000400:00000200:63.0:1644346387.802097:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802098:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802099:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346387.802099:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346387.802100:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346387.802102:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693662 seconds 00000800:00000400:63.0:1644346387.802102:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693662 seconds 00000400:00000200:63.0:1644346387.802103:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) 
health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802121:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802122:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346387.802123:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.9@o2ib100: 1 00000400:00000200:63.0:1644346387.802123:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802124:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802125:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346387.802125:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346387.802126:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346387.802127:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 31 seconds 00000400:00000200:63.0:1644346387.802128:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802130:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802131:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346387.802131:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00020000:63.0:1644346387.802132:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346387.802132:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000400:63.0:1644346387.802133:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 112 seconds 00000400:00000200:63.0:1644346387.802134:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802135:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346387.802135:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x34860 00000400:00000200:63.0:1644346387.802136:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346387.802136:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346387.802137:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346387.802137:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004004410f 00000400:00000200:26.0:1644346387.802138:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000800:00000400:63.0:1644346387.802139:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693662 seconds 00000800:00000400:63.0:1644346387.802139:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693662 seconds 00000400:00000200:63.0:1644346387.802140:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:26.0:1644346387.802140:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.7@o2ib100:-110 00000400:00000200:63.0:1644346387.802141:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346387.802151:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346387.802151:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.12@o2ib100: 1 00000400:00000200:63.0:1644346387.802152:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346387.802153:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346387.802153:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x36060 rc -110 00000400:00000200:26.0:1644346387.802154:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.7@o2ib100: -110 00000400:00000200:26.0:1644346387.802155:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.7@o2ib100 00000400:00000200:63.0:1644346387.802156:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346387.802156:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x34860 00000400:00000200:63.0:1644346387.802157:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346387.802157:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346387.802157:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f3937d34 00000400:00000200:26.0:1644346387.802158:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346387.802159:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.8@o2ib100:-110 00000400:00000200:26.0:1644346387.802159:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x36060 rc -110 00000400:00000200:26.0:1644346387.802160:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.8@o2ib100: -110 00000400:00000200:26.0:1644346387.802162:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.8@o2ib100 00000400:00000200:26.0:1644346387.802163:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x34860 00000400:00000200:26.0:1644346387.802164:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f617b232 00000400:00000200:26.0:1644346387.802164:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346387.802165:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.9@o2ib100:-110 00000400:00000200:26.0:1644346387.802166:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x36060 rc -110 00000400:00000200:26.0:1644346387.802166:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.9@o2ib100: -110 00000400:00000200:26.0:1644346387.802167:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.9@o2ib100 00000400:00000200:26.0:1644346387.802168:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x34860 00000400:00000200:26.0:1644346387.802168:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000088b879bf 00000400:00000200:26.0:1644346387.802169:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346387.802169:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.12@o2ib100:-110 00000400:00000200:26.0:1644346387.802170:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x36060 rc -110 00000400:00000200:26.0:1644346387.802171:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.12@o2ib100: -110 00000400:00000200:26.0:1644346387.802171:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.12@o2ib100 00000400:00000200:21.0:1644346387.930094:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c1c01860 00000400:00000200:21.0:1644346387.930097:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930101:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930109:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:21.0:1644346387.930112:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:21.0:1644346387.930116:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:21.0:1644346387.930122:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346387.930126:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:21.0:1644346387.930131:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930134:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346387.930136:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006ecec57c 00000400:00000200:21.0:1644346387.930137:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930139:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930143:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 
00000400:00000200:21.0:1644346387.930145:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:21.0:1644346387.930148:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:21.0:1644346387.930153:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346387.930156:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:21.0:1644346387.930159:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930161:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346387.930163:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007d7d62d7 00000400:00000200:21.0:1644346387.930164:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930171:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930175:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:21.0:1644346387.930177:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:21.0:1644346387.930179:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:21.0:1644346387.930184:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346387.930187:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:21.0:1644346387.930189:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930191:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346387.930193:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b7e9b1b6 00000400:00000200:21.0:1644346387.930193:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930195:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930199:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:21.0:1644346387.930201:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:21.0:1644346387.930203:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:21.0:1644346387.930208:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346387.930211:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:21.0:1644346387.930213:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930215:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 
00000400:00000200:21.0:1644346387.930217:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001746cb78 00000400:00000200:21.0:1644346387.930218:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930219:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930223:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:21.0:1644346387.930224:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:21.0:1644346387.930226:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:21.0:1644346387.930231:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346387.930234:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:21.0:1644346387.930236:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930238:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346387.930240:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000fae22045 00000400:00000200:21.0:1644346387.930240:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346387.930244:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346387.930248:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 
00000400:00000200:21.0:1644346387.930250:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:21.0:1644346387.930252:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:21.0:1644346387.930257:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346387.930260:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:21.0:1644346387.930262:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346387.930264:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:48.2F:1644346391.513984:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (20)++ 00000800:00000200:31.0:1644346391.514070:0:170488:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (21)++ 00000800:00000200:31.0:1644346391.514083:0:170488:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.54@o2ib100 00000400:00000200:31.0:1644346391.514090:0:170488:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:31.0:1644346391.514100:0:170488:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a846240 00000400:00000200:31.0:1644346391.514107:0:170488:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 9184 00000400:00000200:31.0:1644346391.514113:0:170488:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, 
hstatus = 0 00000400:00000200:31.0:1644346391.514117:0:170488:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:31.0:1644346391.514120:0:170488:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:18.0:1644346391.514131:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:31.0:1644346391.514132:0:170488:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[0000000068a22ef3] (22)++ 00000800:00000200:31.0:1644346391.514136:0:170488:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[0000000068a22ef3] (22)-- 00000800:00000200:31.0:1644346391.514137:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000100:00000200:0.0:1644346391.514218:0:170701:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207663680 00010000:00000200:0.0:1644346391.514227:0:170701:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000008bfcd833 x1718520207663680/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:242/0 lens 224/224 e 0 to 0 dl 1644346452 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:0.0:1644346391.514234:0:170701:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207663680, offset 224 00000400:00000200:0.0:1644346391.514236:0:170701:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:0.0:1644346391.514240:0:170701:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:0.0:1644346391.514242:0:170701:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:0.0:1644346391.514244:0:170701:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 
00000400:00000200:0.0:1644346391.514245:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:0.0:1644346391.514245:0:170701:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:0.0:1644346391.514247:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:0.0:1644346391.514248:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:0.0:1644346391.514252:0:170701:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:0.0:1644346391.514257:0:170701:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:0.0:1644346391.514261:0:170701:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055d51421] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:0.0:1644346391.514262:0:170701:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[0000000068a22ef3] (20)++ 00000800:00000200:0.0:1644346391.514264:0:170701:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[0000000068a22ef3] (21)++ 00000800:00000200:0.0:1644346391.514269:0:170701:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[0000000068a22ef3] (22)-- 00000800:00000200:48.2:1644346391.514341:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (21)++ 00000800:00000200:17.0:1644346391.514417:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (22)++ 00000800:00000200:17.0:1644346391.514437:0:170490:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[0000000068a22ef3] (23)-- 00000400:00000200:17.0:1644346391.514438:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 
00000400:00000200:17.0:1644346391.514441:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:17.0:1644346391.514443:0:170490:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009e1a4eca 00000800:00000200:17.0:1644346391.514448:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:17.0:1644346391.514449:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000400:00000200:21.0:1644346393.050086:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.10@o2ib100, cpt = 1 00000400:00000200:21.0:1644346393.050095:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.10@o2ib100: 0 00000400:00000200:21.0:1644346393.050096:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346393.050098:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346393.050101:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.10@o2ib100 NID 172.19.2.10@o2ib100: 0. 
pending discovery 00000400:00000200:26.0:1644346393.050144:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346393.050151:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(000000009fe0e532) state 0x36060 00000400:00000200:26.0:1644346393.050161:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:26.0:1644346393.050166:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:26.0:1644346393.050170:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:26.0:1644346393.050176:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346393.050181:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:26.0:1644346393.050186:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346393.050189:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346393.050192:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.10@o2ib100 00000400:00000200:26.0:1644346393.050194:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(000000009fe0e532) state 0x34260 rc 0 00000800:00000200:17.2:1644346396.569815:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (20)++ 00000800:00000200:18.0:1644346396.569881:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:18.0:1644346396.569893:0:170491:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 172.19.1.55@o2ib100 
00000400:00000200:18.0:1644346396.569900:0:170491:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:18.0:1644346396.569911:0:170491:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a846800 00000400:00000200:18.0:1644346396.569918:0:170491:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 9408 00000400:00000200:18.0:1644346396.569924:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346396.569928:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:18.0:1644346396.569931:0:170491:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:18.0:1644346396.569942:0:170491:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (22)++ 00000800:00000200:31.0:1644346396.569943:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:18.0:1644346396.569946:0:170491:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (23)-- 00000800:00000200:18.0:1644346396.569948:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000100:00000200:0.0:1644346396.570049:0:170701:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207665152 00010000:00000200:0.0:1644346396.570059:0:170701:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000007781a136 x1718520207665152/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:247/0 lens 224/224 e 0 to 0 dl 1644346457 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:0.0:1644346396.570066:0:170701:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 
1718520207665152, offset 224 00000400:00000200:0.0:1644346396.570069:0:170701:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:0.0:1644346396.570073:0:170701:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:0.0:1644346396.570075:0:170701:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:0.0:1644346396.570076:0:170701:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:0.0:1644346396.570077:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:0.0:1644346396.570078:0:170701:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:0.0:1644346396.570080:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:0.0:1644346396.570081:0:170701:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:0.0:1644346396.570085:0:170701:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:0.0:1644346396.570088:0:170701:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:0.0:1644346396.570091:0:170701:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:0.0:1644346396.570093:0:170701:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 00000800:00000200:0.0:1644346396.570094:0:170701:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 
00000800:00000200:0.0:1644346396.570099:0:170701:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.2:1644346396.570172:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:17.0:1644346396.570193:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:17.0:1644346396.570199:0:170490:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000400:00000200:17.0:1644346396.570202:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:17.0:1644346396.570208:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000400:00000200:17.0:1644346396.570211:0:170490:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000065908944 00000800:00000200:17.0:1644346396.570217:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.0:1644346396.570220:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:21.0:1644346397.146100:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.55@o2ib100, cpt = 1 00000400:00000200:21.0:1644346397.146104:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.55@o2ib100: 0 00000400:00000200:21.0:1644346397.146105:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346397.146106:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346397.146107:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.55@o2ib100 NID 172.19.1.55@o2ib100: 0. 
pending discovery 00000400:00000200:21.0:1644346397.146109:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.5@o2ib100, cpt = 1 00000400:00000200:21.0:1644346397.146110:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.5@o2ib100: 0 00000400:00000200:21.0:1644346397.146110:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346397.146110:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346397.146111:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.5@o2ib100 NID 172.19.2.5@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346397.146148:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346397.146151:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x36056 00000400:00000200:26.0:1644346397.146158:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.55@o2ib100 00000400:00000200:26.0:1644346397.146161:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.55@o2ib100 local destination 00000400:00000200:26.0:1644346397.146163:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346397.146166:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.55@o2ib100(172.19.1.55@o2ib100:172.19.1.55@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346397.146170:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:26.0:1644346397.146173:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:26.0:1644346397.146174:0:170492:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 
00000800:00000200:26.0:1644346397.146176:0:170492:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 00000800:00000200:26.0:1644346397.146180:0:170492:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000400:00000200:26.0:1644346397.146181:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346397.146182:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34256 rc 0 00000400:00000200:26.0:1644346397.146183:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x36060 00000400:00000200:26.0:1644346397.146185:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:26.0:1644346397.146186:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:26.0:1644346397.146187:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:26.0:1644346397.146189:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346397.146190:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:26.0:1644346397.146191:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346397.146192:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346397.146193:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.5@o2ib100 00000400:00000200:26.0:1644346397.146194:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x34260 rc 0 
00000800:00000200:17.2:1644346397.146229:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:31.0:1644346397.146289:0:170488:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:31.0:1644346397.146308:0:170488:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000400:00000200:31.0:1644346397.146310:0:170488:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:31.0:1644346397.146312:0:170488:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: GET: OK 00000400:00000200:31.0:1644346397.146314:0:170488:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:31.0:1644346397.146315:0:170488:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.55@o2ib100: 0 00000800:00000200:31.0:1644346397.146318:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:31.0:1644346397.146320:0:170488:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:31.0:1644346397.146323:0:170488:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.55@o2ib100 00000400:00000200:31.0:1644346397.146325:0:170488:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 172.19.1.55@o2ib100 : REPLY - for me 00000400:00000200:31.0:1644346397.146329:0:170488:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.138@o2ib100: Reply from 12345-172.19.1.55@o2ib100 of length 64/64 into md 0x70fbdd 00000800:00000200:17.0:1644346397.146329:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000400:00000200:31.0:1644346397.146332:0:170488:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:31.0:1644346397.146333:0:170488:0:(lib-msg.c:860:lnet_health_check()) health check: 
172.19.1.138@o2ib100->172.19.1.55@o2ib100: REPLY: OK 00000400:00000200:31.0:1644346397.146334:0:170488:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f617b232 00000400:00000200:31.0:1644346397.146335:0:170488:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:31.0:1644346397.146337:0:170488:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.55@o2ib100 has discovery disabled 00000400:00000200:31.0:1644346397.146338:0:170488:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100(000000008b200298) not MR: DD disabled remotely 00000400:00000200:31.0:1644346397.146338:0:170488:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100 data present 0. state = 0x34256 00000400:00000200:31.0:1644346397.146340:0:170488:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.55@o2ib100:3 00000800:00000200:31.0:1644346397.146344:0:170488:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (21)++ 00000400:00000200:26.0:1644346397.146346:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346397.146347:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x340d6 00000800:00000200:31.0:1644346397.146349:0:170488:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:31.0:1644346397.146350:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:26.0:1644346397.146350:0:170492:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.55@o2ib100 (000000008b200298): 0 00000400:00000200:26.0:1644346397.146351:0:170492:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.55@o2ib100(000000008b200298): 0. 
state = 0x34156 00000400:00000200:26.0:1644346397.146352:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34156 rc 1 00000400:00000200:26.0:1644346397.146353:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34156 00000400:00000200:26.0:1644346397.146354:0:170492:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346397.146355:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x30116 rc 0 00000400:00000200:26.0:1644346397.146356:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.55@o2ib100 00000800:00000400:63.0:1644346398.810076:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693673 seconds 00000400:00000200:63.0:1644346398.810080:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346398.810082:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346398.810084:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346398.810085:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346398.810088:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346399.194095:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006fd4441a 00000400:00000200:21.0:1644346399.194098:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346399.194102:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 
00000400:00000200:21.0:1644346399.194110:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:21.0:1644346399.194114:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:21.0:1644346399.194119:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:21.0:1644346399.194125:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346399.194129:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:21.0:1644346399.194134:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346399.194140:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000400:63.0:1644346399.770083:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 101 seconds 00000800:00000400:63.0:1644346399.770087:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 14 seconds 00000400:00000200:63.0:1644346399.770090:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770098:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770101:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346399.770104:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00020000:63.0:1644346399.770107:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000400:00000200:63.0:1644346399.770110:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770113:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770116:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346399.770118:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.2@o2ib100: 1 00000800:00000400:63.0:1644346399.770125:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 4 seconds 00000800:00000400:63.0:1644346399.770127:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 22 seconds 00000400:00000200:63.0:1644346399.770129:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770132:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770133:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346399.770135:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346399.770137:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110 00000400:00000200:63.0:1644346399.770139:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770140:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 
00000400:00000200:63.0:1644346399.770142:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.3@o2ib100: 1 00000800:00000400:63.0:1644346399.770145:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693674 seconds 00000400:00000200:63.0:1644346399.770147:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770150:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770151:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346399.770153:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346399.770155:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346399.770158:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 13 seconds 00000800:00000400:63.0:1644346399.770160:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 7 seconds 00000400:00000200:63.0:1644346399.770161:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770163:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770165:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346399.770166:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.5@o2ib100: 1 00000400:00000200:63.0:1644346399.770168:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:63.0:1644346399.770172:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770174:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346399.770174:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:63.0:1644346399.770175:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346399.770177:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346399.770180:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693674 seconds 00000400:00000200:26.0:1644346399.770180:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x34860 00000400:00000200:63.0:1644346399.770182:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346399.770184:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c3704f3b 00000400:00000200:63.0:1644346399.770185:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770186:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346399.770186:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:63.0:1644346399.770188:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346399.770190:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 
00000400:00000200:26.0:1644346399.770191:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.2@o2ib100:-110 00000800:00000400:63.0:1644346399.770193:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693674 seconds 00000400:00000200:63.0:1644346399.770194:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346399.770210:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346399.770210:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x36060 rc -110 00000400:00000200:26.0:1644346399.770211:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.2@o2ib100: -110 00000400:00000200:26.0:1644346399.770212:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.2@o2ib100 00000400:00000200:63.0:1644346399.770213:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346399.770213:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x34860 00000400:00000200:63.0:1644346399.770214:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346399.770214:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004db571d4 00000400:00020000:63.0:1644346399.770215:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346399.770215:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000800:00000400:63.0:1644346399.770217:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 47 seconds 
00000400:00000200:63.0:1644346399.770217:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346399.770217:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.3@o2ib100:-110 00000400:00000200:26.0:1644346399.770217:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x36060 rc -110 00000400:00000200:26.0:1644346399.770218:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.3@o2ib100: -110 00000400:00000200:63.0:1644346399.770219:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346399.770219:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.3@o2ib100 00000400:00000200:63.0:1644346399.770220:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346399.770220:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x34860 00000400:00000200:63.0:1644346399.770221:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346399.770221:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f3937d34 00000400:00000200:26.0:1644346399.770221:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00020000:63.0:1644346399.770222:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346399.770222:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.5@o2ib100:-110 00000800:00000400:63.0:1644346399.770223:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693674 seconds 
00000400:00000200:26.0:1644346399.770223:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x36060 rc -110 00000400:00000200:63.0:1644346399.770224:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346399.770224:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.5@o2ib100: -110 00000400:00000200:26.0:1644346399.770224:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.5@o2ib100 00000400:00000200:63.0:1644346399.770225:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346399.770226:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346399.770226:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346399.770227:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346400.218096:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.3@o2ib100, cpt = 1 00000400:00000200:21.0:1644346400.218100:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.3@o2ib100: 0 00000400:00000200:21.0:1644346400.218100:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346400.218101:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346400.218101:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.3@o2ib100 NID 172.19.2.3@o2ib100: 0. 
pending discovery 00000400:00000200:21.0:1644346400.218104:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b7e9b1b6 00000400:00000200:21.0:1644346400.218105:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218106:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218110:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:21.0:1644346400.218112:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:21.0:1644346400.218113:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:21.0:1644346400.218116:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218117:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:21.0:1644346400.218119:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218120:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218121:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001746cb78 00000400:00000200:21.0:1644346400.218121:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218122:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218124:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 
00000400:00000200:21.0:1644346400.218125:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:21.0:1644346400.218126:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:21.0:1644346400.218127:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218128:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:21.0:1644346400.218130:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218130:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218131:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000fae22045 00000400:00000200:21.0:1644346400.218131:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218132:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218134:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:21.0:1644346400.218135:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:21.0:1644346400.218138:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:21.0:1644346400.218140:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346400.218141:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000400:00000200:26.0:1644346400.218142:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000200:21.0:1644346400.218142:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218143:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218143:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a37ad97c 00000400:00000200:21.0:1644346400.218144:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:26.0:1644346400.218145:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x36060 00000400:00000200:21.0:1644346400.218145:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218148:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:21.0:1644346400.218148:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:21.0:1644346400.218149:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:21.0:1644346400.218151:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218152:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:21.0:1644346400.218153:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) 
version: 0 00000800:00000200:21.0:1644346400.218154:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346400.218163:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 00000400:00000200:21.0:1644346400.218164:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000e3f098fd 00000400:00000200:21.0:1644346400.218164:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218165:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:26.0:1644346400.218166:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:26.0:1644346400.218168:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:26.0:1644346400.218170:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346400.218172:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000400:00000200:21.0:1644346400.218173:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000800:00000200:26.0:1644346400.218174:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218175:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:21.0:1644346400.218175:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000800:00000200:26.0:1644346400.218176:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 
172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346400.218177:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.3@o2ib100 00000400:00000200:26.0:1644346400.218178:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x34260 rc 0 00000400:00000200:21.0:1644346400.218179:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218180:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:21.0:1644346400.218181:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218181:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218182:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c66d5622 00000400:00000200:21.0:1644346400.218182:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218183:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218185:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:21.0:1644346400.218185:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:21.0:1644346400.218188:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:21.0:1644346400.218190:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346400.218191:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:21.0:1644346400.218192:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218193:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218193:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b76935cb 00000400:00000200:21.0:1644346400.218194:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218194:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218196:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 00000400:00000200:21.0:1644346400.218197:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:21.0:1644346400.218197:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:21.0:1644346400.218199:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218200:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:21.0:1644346400.218201:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218201:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346400.218202:0:170493:0:(lib-md.c:69:lnet_md_unlink()) 
Unlinking md 00000000b90bda72 00000400:00000200:21.0:1644346400.218202:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346400.218203:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346400.218204:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:21.0:1644346400.218205:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:21.0:1644346400.218206:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:21.0:1644346400.218207:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346400.218208:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:21.0:1644346400.218209:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346400.218210:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000400:63.0:1644346400.794101:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 44 seconds 00000400:00000200:63.0:1644346400.794105:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346400.794113:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346400.794116:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 
00000400:00000200:63.0:1644346400.794119:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346400.794122:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346400.794127:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 125 seconds 00000400:00000200:63.0:1644346400.794129:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346400.794132:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346400.794133:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346400.794135:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346400.794137:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346400.794140:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693675 seconds 00000400:00000200:63.0:1644346400.794142:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346400.794144:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346400.794146:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346400.794148:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00020000:63.0:1644346400.794150:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346401.242087:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c1c01860 00000400:00000200:21.0:1644346401.242091:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346401.242095:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346401.242103:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:21.0:1644346401.242106:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:21.0:1644346401.242113:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:21.0:1644346401.242120:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346401.242124:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:21.0:1644346401.242128:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346401.242131:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346401.242133:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006ecec57c 00000400:00000200:21.0:1644346401.242134:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346401.242136:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 
00000400:00000200:21.0:1644346401.242140:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:21.0:1644346401.242143:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:21.0:1644346401.242145:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:21.0:1644346401.242151:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346401.242154:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:21.0:1644346401.242156:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346401.242159:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346401.242160:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007d7d62d7 00000400:00000200:21.0:1644346401.242161:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346401.242163:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346401.242166:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:21.0:1644346401.242169:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:21.0:1644346401.242171:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:21.0:1644346401.242176:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346401.242178:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:21.0:1644346401.242181:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346401.242183:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:17.2:1644346401.753747:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (20)++ 00000800:00000200:20.0F:1644346401.753811:0:170489:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:20.0:1644346401.753827:0:170489:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.55@o2ib100 00000400:00000200:20.0:1644346401.753834:0:170489:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000800:00000200:17.0:1644346401.753834:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000400:00000200:20.0:1644346401.753844:0:170489:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a846dc0 00000400:00000200:20.0:1644346401.753851:0:170489:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 9632 00000400:00000200:20.0:1644346401.753857:0:170489:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:20.0:1644346401.753861:0:170489:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:20.0:1644346401.753864:0:170489:0:(events.c:313:request_in_callback()) 
event type 2, status 0, service ost 00000800:00000200:20.0:1644346401.753876:0:170489:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (21)++ 00000800:00000200:20.0:1644346401.753879:0:170489:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:20.0:1644346401.753881:0:170489:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000100:00000200:43.0F:1644346401.753946:0:3173235:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207666624 00010000:00000200:43.0:1644346401.753956:0:3173235:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000591e5921 x1718520207666624/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:252/0 lens 224/224 e 0 to 0 dl 1644346462 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346401.753962:0:3173235:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207666624, offset 224 00000400:00000200:43.0:1644346401.753982:0:3173235:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346401.753986:0:3173235:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346401.753987:0:3173235:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:43.0:1644346401.753989:0:3173235:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346401.753990:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346401.753991:0:3173235:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346401.753999:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 
00000400:00000200:43.0:1644346401.754000:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346401.754004:0:3173235:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346401.754006:0:3173235:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:43.0:1644346401.754009:0:3173235:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055d51421] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346401.754011:0:3173235:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[0000000068a22ef3] (20)++ 00000800:00000200:43.0:1644346401.754012:0:3173235:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[0000000068a22ef3] (21)++ 00000800:00000200:43.0:1644346401.754017:0:3173235:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[0000000068a22ef3] (22)-- 00000800:00000200:48.2:1644346401.754045:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (21)++ 00000800:00000200:18.0:1644346401.754110:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (22)++ 00000800:00000200:18.0:1644346401.754121:0:170491:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[0000000068a22ef3] (23)-- 00000400:00000200:18.0:1644346401.754124:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346401.754130:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:18.0:1644346401.754133:0:170491:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000002bcc65f 00000800:00000200:18.0:1644346401.754139:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 
00000800:00000200:18.0:1644346401.754142:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000400:00000200:21.0:1644346402.266018:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2ib100, cpt = 1 00000400:00000200:21.0:1644346402.266026:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.54@o2ib100: 0 00000400:00000200:21.0:1644346402.266027:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346402.266029:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346402.266031:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.54@o2ib100 NID 172.19.1.54@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346402.266065:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346402.266068:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x36056 00000400:00000200:26.0:1644346402.266074:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.54@o2ib100 00000400:00000200:26.0:1644346402.266076:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.54@o2ib100 local destination 00000400:00000200:26.0:1644346402.266081:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.54@o2ib100 00000400:00000200:26.0:1644346402.266083:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.54@o2ib100(172.19.1.54@o2ib100:172.19.1.54@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346402.266085:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:26.0:1644346402.266088:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055d51421] -> 172.19.1.54@o2ib100 (3) version: 12 
00000800:00000200:26.0:1644346402.266089:0:170492:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[0000000068a22ef3] (20)++ 00000800:00000200:26.0:1644346402.266090:0:170492:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[0000000068a22ef3] (21)++ 00000800:00000200:26.0:1644346402.266094:0:170492:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[0000000068a22ef3] (22)-- 00000400:00000200:26.0:1644346402.266095:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.54@o2ib100 00000400:00000200:26.0:1644346402.266096:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x34256 rc 0 00000800:00000200:48.2:1644346402.266105:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (21)++ 00000800:00000200:17.0:1644346402.266157:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (22)++ 00000800:00000200:17.0:1644346402.266163:0:170490:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[0000000068a22ef3] (23)-- 00000400:00000200:17.0:1644346402.266176:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:17.0:1644346402.266179:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: GET: OK 00000400:00000200:17.0:1644346402.266180:0:170490:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:17.0:1644346402.266181:0:170490:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.54@o2ib100: 0 00000800:00000200:17.0:1644346402.266184:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:17.0:1644346402.266185:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000800:00000200:48.2:1644346402.266220:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (20)++ 00000800:00000200:18.0:1644346402.266238:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] 
(21)++ 00000800:00000200:18.0:1644346402.266244:0:170491:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[3] from 172.19.1.54@o2ib100 00000400:00000200:18.0:1644346402.266247:0:170491:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 172.19.1.54@o2ib100 : REPLY - for me 00000400:00000200:18.0:1644346402.266253:0:170491:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.138@o2ib100: Reply from 12345-172.19.1.54@o2ib100 of length 64/64 into md 0x70fc55 00000400:00000200:18.0:1644346402.266256:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346402.266257:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: REPLY: OK 00000400:00000200:18.0:1644346402.266258:0:170491:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c3704f3b 00000400:00000200:18.0:1644346402.266259:0:170491:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:18.0:1644346402.266260:0:170491:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.54@o2ib100 has discovery disabled 00000400:00000200:18.0:1644346402.266261:0:170491:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100(000000004cb39078) not MR: DD disabled remotely 00000400:00000200:18.0:1644346402.266262:0:170491:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100 data present 0. state = 0x34256 00000400:00000200:18.0:1644346402.266264:0:170491:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. 
Processing reply for gw: 172.19.1.54@o2ib100:3 00000800:00000200:18.0:1644346402.266267:0:170491:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[0000000068a22ef3] (22)++ 00000400:00000200:26.0:1644346402.266269:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000200:18.0:1644346402.266269:0:170491:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[0000000068a22ef3] (23)-- 00000400:00000200:26.0:1644346402.266270:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x340d6 00000800:00000200:18.0:1644346402.266270:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:18.0:1644346402.266271:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000400:00000200:26.0:1644346402.266273:0:170492:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.54@o2ib100 (000000004cb39078): 0 00000400:00000200:26.0:1644346402.266274:0:170492:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.54@o2ib100(000000004cb39078): 0. state = 0x34156 00000400:00000200:26.0:1644346402.266275:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x34156 rc 1 00000400:00000200:26.0:1644346402.266276:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x34156 00000400:00000200:26.0:1644346402.266277:0:170492:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.54@o2ib100 00000400:00000200:26.0:1644346402.266277:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(000000004cb39078) state 0x30116 rc 0 00000400:00000200:26.0:1644346402.266278:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.54@o2ib100 00000800:00000200:48.2:1644346406.809788:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (20)++ 00000800:00000200:17.0:1644346406.809855:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (21)++ 00000800:00000200:17.0:1644346406.809867:0:170490:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.54@o2ib100 00000400:00000200:17.0:1644346406.809874:0:170490:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:17.0:1644346406.809884:0:170490:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a847400 00000400:00000200:17.0:1644346406.809892:0:170490:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 9856 00000400:00000200:17.0:1644346406.809916:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:17.0:1644346406.809917:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:17.0:1644346406.809919:0:170490:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:20.0:1644346406.809922:0:170489:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:17.0:1644346406.809928:0:170490:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[0000000068a22ef3] (21)++ 00000800:00000200:17.0:1644346406.809930:0:170490:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[0000000068a22ef3] (22)-- 00000800:00000200:17.0:1644346406.809931:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000100:00000200:43.0:1644346406.810018:0:3173235:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207668224 
00010000:00000200:43.0:1644346406.810033:0:3173235:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000005c3d585c x1718520207668224/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:257/0 lens 224/224 e 0 to 0 dl 1644346467 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346406.810054:0:3173235:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207668224, offset 224 00000400:00000200:43.0:1644346406.810059:0:3173235:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346406.810066:0:3173235:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346406.810070:0:3173235:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:43.0:1644346406.810073:0:3173235:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346406.810075:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346406.810077:0:3173235:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346406.810083:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346406.810085:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346406.810094:0:3173235:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346406.810101:0:3173235:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 
00000800:00000200:43.0:1644346406.810108:0:3173235:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346406.810111:0:3173235:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 00000800:00000200:43.0:1644346406.810113:0:3173235:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 00000800:00000200:43.0:1644346406.810131:0:3173235:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.2:1644346406.810143:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:18.0:1644346406.810197:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:18.0:1644346406.810204:0:170491:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000800:00000200:31.0:1644346406.810217:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (23)-- 00000400:00000200:18.0:1644346406.810218:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346406.810221:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000400:00000200:18.0:1644346406.810223:0:170491:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000032a5ffd7 00000800:00000200:18.0:1644346406.810227:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:21.0:1644346407.386086:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.1@o2ib100, cpt = 1 00000400:00000200:21.0:1644346407.386092:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.1@o2ib100: 0 00000400:00000200:21.0:1644346407.386094:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:21.0:1644346407.386095:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386098:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.1@o2ib100 NID 172.19.2.1@o2ib100: 0. pending discovery 00000400:00000200:21.0:1644346407.386100:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.4@o2ib100, cpt = 1 00000400:00000200:26.0:1644346407.386103:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:21.0:1644346407.386104:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.4@o2ib100: 0 00000400:00000200:21.0:1644346407.386105:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346407.386106:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386108:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.4@o2ib100 NID 172.19.2.4@o2ib100: 0. pending discovery 00000400:00000200:21.0:1644346407.386110:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.6@o2ib100, cpt = 1 00000400:00000200:26.0:1644346407.386115:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x36060 00000400:00000200:21.0:1644346407.386115:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.6@o2ib100: 0 00000400:00000200:21.0:1644346407.386116:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346407.386117:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386119:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.6@o2ib100 NID 172.19.2.6@o2ib100: 0. 
pending discovery 00000400:00000200:21.0:1644346407.386121:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.7@o2ib100, cpt = 1 00000400:00000200:21.0:1644346407.386124:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.7@o2ib100: 0 00000400:00000200:21.0:1644346407.386125:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346407.386126:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386128:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.7@o2ib100 NID 172.19.2.7@o2ib100: 0. pending discovery 00000400:00000200:21.0:1644346407.386130:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.9@o2ib100, cpt = 1 00000400:00000200:21.0:1644346407.386133:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.9@o2ib100: 0 00000400:00000200:26.0:1644346407.386134:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:21.0:1644346407.386134:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346407.386135:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386137:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.9@o2ib100 NID 172.19.2.9@o2ib100: 0. 
pending discovery 00000400:00000200:21.0:1644346407.386140:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.12@o2ib100, cpt = 1 00000400:00000200:21.0:1644346407.386143:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.12@o2ib100: 0 00000400:00000200:21.0:1644346407.386144:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346407.386145:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346407.386147:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.12@o2ib100 NID 172.19.2.12@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346407.386151:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:26.0:1644346407.386156:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:26.0:1644346407.386162:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386167:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:26.0:1644346407.386172:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346407.386175:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386178:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.1@o2ib100 00000400:00000200:26.0:1644346407.386180:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x34260 rc 0 00000400:00000200:26.0:1644346407.386183:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 
172.19.2.4@o2ib100(000000009954ed50) state 0x36060 00000400:00000200:26.0:1644346407.386186:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:26.0:1644346407.386193:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:26.0:1644346407.386195:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:26.0:1644346407.386201:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386204:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:26.0:1644346407.386207:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346407.386209:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386211:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.4@o2ib100 00000400:00000200:26.0:1644346407.386213:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000009954ed50) state 0x34260 rc 0 00000400:00000200:26.0:1644346407.386228:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(00000000c5c06062) state 0x36060 00000400:00000200:26.0:1644346407.386230:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:26.0:1644346407.386231:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:26.0:1644346407.386232:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 
00000400:00000200:26.0:1644346407.386234:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386235:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:26.0:1644346407.386236:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346407.386237:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386238:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.6@o2ib100 00000400:00000200:26.0:1644346407.386238:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(00000000c5c06062) state 0x34260 rc 0 00000400:00000200:26.0:1644346407.386239:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x36060 00000400:00000200:26.0:1644346407.386241:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:26.0:1644346407.386242:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:26.0:1644346407.386243:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:26.0:1644346407.386244:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386245:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:26.0:1644346407.386246:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 
0 00000800:00000200:26.0:1644346407.386247:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386248:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.7@o2ib100 00000400:00000200:26.0:1644346407.386248:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x34260 rc 0 00000400:00000200:26.0:1644346407.386249:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x36060 00000400:00000200:26.0:1644346407.386252:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:26.0:1644346407.386253:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:26.0:1644346407.386254:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:26.0:1644346407.386256:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386257:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:26.0:1644346407.386258:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346407.386258:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386259:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.9@o2ib100 00000400:00000200:26.0:1644346407.386260:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x34260 rc 0 00000400:00000200:26.0:1644346407.386261:0:170492:0:(peer.c:3388:lnet_peer_discovery()) 
peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x36060 00000400:00000200:26.0:1644346407.386262:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:26.0:1644346407.386263:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:26.0:1644346407.386264:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:26.0:1644346407.386266:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346407.386267:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:26.0:1644346407.386268:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346407.386269:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346407.386270:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.12@o2ib100 00000400:00000200:26.0:1644346407.386270:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x34260 rc 0 00000800:00000400:63.0:1644346408.794080:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.135@o2ib100: 1 seconds 00000400:00000200:63.0:1644346408.794084:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346408.794089:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.135@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346408.794092:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 
00000400:00000200:63.0:1644346408.794095:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.135@o2ib100: 1 00000400:00000200:26.0:1644346408.794151:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346408.794156:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x4860 00000400:00000200:26.0:1644346408.794161:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dc6203d0 00000400:00000200:26.0:1644346408.794163:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346408.794167:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.135@o2ib100:-110 00000400:00000200:26.0:1644346408.794169:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x6060 rc -110 00000400:00000200:26.0:1644346408.794172:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.135@o2ib100: -110 00000400:00000200:26.0:1644346408.794175:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.135@o2ib100 00000400:00000200:26.0:1644346408.794177:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000008fd7c267 not committed for send or receive 00000400:00000200:26.0:1644346408.794180:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000073ea0491 00000100:00000200:26.0:1644346408.794185:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000a47225e0 x1723495557508800/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346413 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:26.0:1644346408.794210:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000002fad5e42 not committed for send or receive 00000400:00000200:26.0:1644346408.794211:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000040b9b21 00000100:00000200:26.0:1644346408.794215:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000acc94f2f x1723495557508864/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346413 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346408.794270:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dbd50fbd 00000100:00000200:8.0:1644346408.794278:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000a47225e0 x1723495557508800/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346413 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346408.794289:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000a47225e0 x1723495557508800/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346413 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346408.794305:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006161234d 
00000100:00000200:8.0:1644346408.794308:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000acc94f2f x1723495557508864/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346413 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346408.794315:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000acc94f2f x1723495557508864/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346413 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346408.794397:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509312, portal 10 00000100:00000200:8.0:1644346408.794401:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509312, offset 0 00000400:00000200:8.0:1644346408.794406:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000400:00000200:8.0:1644346408.794415:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: 0 00000400:00000200:8.0:1644346408.794417:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346408.794418:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:26.0:1644346408.794419:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:8.0:1644346408.794420:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery 00000400:00000200:8.0:1644346408.794423:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000085587921 delayed. 
172.19.1.136@o2ib100 pending discovery 00000400:00000200:26.0:1644346408.794427:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x6060 00000400:00000200:26.0:1644346408.794432:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.136@o2ib100 00000400:00000200:26.0:1644346408.794434:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.136@o2ib100 local destination 00000100:00000200:8.0:1644346408.794434:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509376, portal 10 00000400:00000200:26.0:1644346408.794436:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.136@o2ib100 00000100:00000200:8.0:1644346408.794437:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509376, offset 0 00000400:00000200:26.0:1644346408.794438:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.136@o2ib100(172.19.1.136@o2ib100:172.19.1.136@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346408.794439:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.136@o2ib100 00000400:00000200:8.0:1644346408.794441:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000800:00000200:26.0:1644346408.794442:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d8f15834] -> 172.19.1.136@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346408.794443:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d8f15834] -> 172.19.1.136@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346408.794443:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.136@o2ib100 00000400:00000200:26.0:1644346408.794445:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x4260 rc 0 
00000400:00000200:8.0:1644346408.794445:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: -114 00000400:00000200:8.0:1644346408.794446:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346408.794447:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346408.794453:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery 00000400:00000200:8.0:1644346408.794456:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000000adcf2e7 delayed. 172.19.1.136@o2ib100 pending discovery 00000800:00000200:48.2:1644346408.921969:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (20)++ 00000800:00000200:20.0:1644346408.922031:0:170489:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (21)++ 00000800:00000200:20.0:1644346408.922043:0:170489:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.54@o2ib100 00000400:00000200:20.0:1644346408.922050:0:170489:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:20.0:1644346408.922058:0:170489:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a847a80 00000400:00000200:20.0:1644346408.922064:0:170489:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 10080 00000400:00000200:20.0:1644346408.922069:0:170489:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:20.0:1644346408.922073:0:170489:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:20.0:1644346408.922076:0:170489:0:(events.c:313:request_in_callback()) event 
type 2, status 0, service ost 00000800:00000200:20.0:1644346408.922085:0:170489:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[0000000068a22ef3] (22)++ 00000800:00000200:20.0:1644346408.922089:0:170489:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[0000000068a22ef3] (23)-- 00000800:00000200:20.0:1644346408.922090:0:170489:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 00000800:00000200:17.0:1644346408.922092:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000100:00000200:43.0:1644346408.922160:0:3173235:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207669888 00010000:00000200:43.0:1644346408.922173:0:3173235:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000d83f6d4b x1718520207669888/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:259/0 lens 224/224 e 0 to 0 dl 1644346469 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346408.922185:0:3173235:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207669888, offset 224 00000400:00000200:43.0:1644346408.922190:0:3173235:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346408.922197:0:3173235:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346408.922200:0:3173235:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:43.0:1644346408.922203:0:3173235:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346408.922205:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346408.922207:0:3173235:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 
00000400:00000200:43.0:1644346408.922210:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346408.922212:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346408.922220:0:3173235:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346408.922225:0:3173235:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:43.0:1644346408.922230:0:3173235:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055d51421] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346408.922233:0:3173235:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[0000000068a22ef3] (20)++ 00000800:00000200:43.0:1644346408.922235:0:3173235:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[0000000068a22ef3] (21)++ 00000800:00000200:43.0:1644346408.922249:0:3173235:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[0000000068a22ef3] (22)-- 00000800:00000200:48.2:1644346408.922257:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[0000000068a22ef3] (21)++ 00000800:00000200:31.0:1644346408.922269:0:170488:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[0000000068a22ef3] (22)++ 00000800:00000200:31.0:1644346408.922276:0:170488:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[0000000068a22ef3] (23)-- 00000400:00000200:31.0:1644346408.922278:0:170488:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:31.0:1644346408.922280:0:170488:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000800:00000200:18.0:1644346408.922281:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (22)-- 
00000400:00000200:31.0:1644346408.922282:0:170488:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000085b1a3b4 00000800:00000200:31.0:1644346408.922286:0:170488:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[0000000068a22ef3] (21)-- 00000800:00000400:63.0:1644346409.818081:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.136@o2ib100: 2 seconds 00000400:00000200:63.0:1644346409.818085:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346409.818090:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.136@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346409.818093:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346409.818095:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.136@o2ib100: 1 00000400:00000200:26.0:1644346409.818106:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346409.818111:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x4860 00000400:00000200:26.0:1644346409.818116:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dc6203d0 00000400:00000200:26.0:1644346409.818118:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346409.818122:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.136@o2ib100:-110 00000400:00000200:26.0:1644346409.818124:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x6060 rc -110 00000400:00000200:26.0:1644346409.818126:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.136@o2ib100: -110 00000400:00000200:26.0:1644346409.818129:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.136@o2ib100 00000400:00000200:26.0:1644346409.818131:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000085587921 not committed for send or receive 00000400:00000200:26.0:1644346409.818133:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dbd50fbd 00000100:00000200:26.0:1644346409.818137:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000aa049739 x1723495557509312/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346463 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:26.0:1644346409.818151:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000000adcf2e7 not committed for send or receive 00000400:00000200:26.0:1644346409.818152:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a9d9fca1 00000100:00000200:26.0:1644346409.818155:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000b9cd26c4 x1723495557509376/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346463 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346409.818162:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006161234d 00000100:00000200:8.0:1644346409.818169:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000aa049739 x1723495557509312/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346463 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346409.818180:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000aa049739 x1723495557509312/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346463 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346409.818201:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b86463ba 
00000100:00000200:8.0:1644346409.818204:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000b9cd26c4 x1723495557509376/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346463 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346409.818210:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000b9cd26c4 x1723495557509376/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346463 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346409.818240:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509440, portal 10 00000100:00000200:8.0:1644346409.818242:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509440, offset 0 00000400:00000200:8.0:1644346409.818244:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:8.0:1644346409.818250:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: 0 00000400:00000200:8.0:1644346409.818250:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346409.818251:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346409.818252:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. pending discovery 00000400:00000200:8.0:1644346409.818253:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000009029807b delayed. 
172.19.1.135@o2ib100 pending discovery 00000400:00000200:26.0:1644346409.818256:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000100:00000200:8.0:1644346409.818256:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509504, portal 10 00000100:00000200:8.0:1644346409.818257:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509504, offset 0 00000400:00000200:8.0:1644346409.818258:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:26.0:1644346409.818260:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x6060 00000400:00000200:8.0:1644346409.818264:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: -114 00000400:00000200:8.0:1644346409.818265:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346409.818266:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346409.818267:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346409.818268:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.135@o2ib100 00000400:00000200:8.0:1644346409.818269:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000083ab8747 delayed. 
172.19.1.135@o2ib100 pending discovery 00000400:00000200:26.0:1644346409.818273:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.135@o2ib100 local destination 00000400:00000200:26.0:1644346409.818277:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.135@o2ib100 00000400:00000200:26.0:1644346409.818283:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.135@o2ib100(172.19.1.135@o2ib100:172.19.1.135@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346409.818288:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.135@o2ib100 00000800:00000200:26.0:1644346409.818292:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000066fa9661] -> 172.19.1.135@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346409.818295:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000066fa9661] -> 172.19.1.135@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346409.818297:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.135@o2ib100 00000400:00000200:26.0:1644346409.818299:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x4260 rc 0 00000400:00000200:21.0:1644346410.458085:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.8@o2ib100, cpt = 1 00000400:00000200:21.0:1644346410.458093:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.8@o2ib100: 0 00000400:00000200:21.0:1644346410.458094:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346410.458095:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346410.458102:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.8@o2ib100 NID 172.19.2.8@o2ib100: 0. 
pending discovery 00000400:00000200:26.0:1644346410.458142:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346410.458147:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x36060 00000400:00000200:26.0:1644346410.458156:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:26.0:1644346410.458160:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:26.0:1644346410.458164:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:26.0:1644346410.458170:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346410.458174:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:26.0:1644346410.458178:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346410.458181:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346410.458183:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.8@o2ib100 00000400:00000200:26.0:1644346410.458185:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000031d8f5d) state 0x34260 rc 0 00000800:00000400:63.0:1644346411.802099:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693686 seconds 00000800:00000400:63.0:1644346411.802101:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693686 seconds 00000400:00000200:63.0:1644346411.802104:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) 
health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346411.802106:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346411.802112:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346411.802113:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.1@o2ib100: 1 00000400:00000200:63.0:1644346411.802118:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346411.802119:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346411.802120:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346411.802121:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346411.802122:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00020000:63.0:1644346411.802123:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346411.802125:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x34860 00000800:00000400:63.0:1644346411.802126:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 34 seconds 00000400:00000200:63.0:1644346411.802127:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346411.802128:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346411.802128:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 
000000004004410f 00000400:00000200:63.0:1644346411.802129:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346411.802129:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:63.0:1644346411.802130:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346411.802131:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346411.802131:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.1@o2ib100:-110 00000400:00000200:26.0:1644346411.802132:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000031931eda) state 0x36060 rc -110 00000800:00000400:63.0:1644346411.802133:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 25 seconds 00000800:00000400:63.0:1644346411.802134:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 16 seconds 00000400:00000200:26.0:1644346411.802134:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.1@o2ib100: -110 00000400:00000200:63.0:1644346411.802135:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346411.802135:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.1@o2ib100 00000400:00000200:63.0:1644346411.802136:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346411.802139:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346411.802139:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346411.802140:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110 00000400:00000200:63.0:1644346411.802141:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346411.802142:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346411.802143:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346411.802144:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.3@o2ib100: 1 00000800:00000400:63.0:1644346411.802147:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693686 seconds 00000800:00000400:63.0:1644346411.802147:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693686 seconds 00000400:00000200:63.0:1644346411.802148:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346411.802148:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346411.802149:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x34860 00000400:00000200:63.0:1644346411.802150:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 
172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346411.802150:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f3937d34 00000400:00000200:26.0:1644346411.802150:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:63.0:1644346411.802151:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346411.802151:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.4@o2ib100: 1 00000400:00000200:26.0:1644346411.802152:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.3@o2ib100:-110 00000400:00000200:26.0:1644346411.802152:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000a8d5cd1f) state 0x36060 rc -110 00000400:00000200:63.0:1644346411.802153:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346411.802153:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.3@o2ib100: -110 00000400:00000200:26.0:1644346411.802154:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.3@o2ib100 00000400:00000200:26.0:1644346411.802155:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000009954ed50) state 0x34860 00000400:00000200:63.0:1644346411.802156:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346411.802156:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000030cf9031 00000400:00000200:63.0:1644346411.802157:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346411.802157:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346411.802157:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346411.802158:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.4@o2ib100:-110 00000400:00000200:26.0:1644346411.802158:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000009954ed50) state 0x36060 rc -110 00000400:00020000:63.0:1644346411.802159:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346411.802159:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.4@o2ib100: -110 00000400:00000200:26.0:1644346411.802160:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.4@o2ib100 00000400:00000200:21.0:1644346412.506093:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000e3f098fd 00000400:00000200:21.0:1644346412.506095:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346412.506097:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346412.506101:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:21.0:1644346412.506104:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:21.0:1644346412.506106:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:21.0:1644346412.506108:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346412.506110:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:21.0:1644346412.506113:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346412.506115:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346412.506116:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b76935cb 00000400:00000200:21.0:1644346412.506116:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346412.506117:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346412.506119:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 
00000400:00000200:21.0:1644346412.506120:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:21.0:1644346412.506121:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:21.0:1644346412.506123:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346412.506124:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:21.0:1644346412.506125:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346412.506126:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346412.506127:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b90bda72 00000400:00000200:21.0:1644346412.506127:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346412.506128:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346412.506130:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:21.0:1644346412.506130:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:21.0:1644346412.506131:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:21.0:1644346412.506133:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346412.506134:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:21.0:1644346412.506135:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346412.506139:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346412.506139:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006fd4441a 00000400:00000200:21.0:1644346412.506140:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346412.506140:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346412.506142:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:21.0:1644346412.506143:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:21.0:1644346412.506144:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:21.0:1644346412.506145:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346412.506146:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:21.0:1644346412.506147:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346412.506148:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 
00000800:00000400:63.0:1644346412.826072:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 20 seconds 00000400:00000200:63.0:1644346412.826076:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826081:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826084:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346412.826086:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346412.826089:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346412.826094:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693687 seconds 00000800:00000400:63.0:1644346412.826096:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693687 seconds 00000400:00000200:63.0:1644346412.826098:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826101:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826104:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346412.826106:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.6@o2ib100: 1 00000400:00000200:63.0:1644346412.826112:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826115:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: 
GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346412.826115:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:63.0:1644346412.826116:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346412.826118:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346412.826121:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346412.826121:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(00000000c5c06062) state 0x34860 00000800:00000400:63.0:1644346412.826124:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693687 seconds 00000400:00000200:26.0:1644346412.826125:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000966c56af 00000800:00000400:63.0:1644346412.826126:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693687 seconds 00000400:00000200:63.0:1644346412.826127:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346412.826127:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:63.0:1644346412.826130:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346412.826131:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.6@o2ib100:-110 00000400:00000200:63.0:1644346412.826132:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346412.826134:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.7@o2ib100: 1 00000400:00000200:26.0:1644346412.826134:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 
172.19.2.6@o2ib100(00000000c5c06062) state 0x36060 rc -110 00000400:00000200:63.0:1644346412.826137:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:26.0:1644346412.826138:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.6@o2ib100: -110 00000400:00000200:63.0:1644346412.826140:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:26.0:1644346412.826141:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.6@o2ib100 00000400:00000200:63.0:1644346412.826143:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346412.826144:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x34860 00000400:00000200:26.0:1644346412.826145:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000279a57f2 00000400:00000200:26.0:1644346412.826146:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346412.826148:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.7@o2ib100:-110 00000400:00000200:26.0:1644346412.826164:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(0000000088a898f7) state 0x36060 rc -110 00000400:00000200:63.0:1644346412.826165:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346412.826165:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.7@o2ib100: -110 00000400:00000200:26.0:1644346412.826165:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.7@o2ib100 00000400:00020000:63.0:1644346412.826167:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346412.826168:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 60 seconds 00000400:00000200:63.0:1644346412.826169:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826170:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826171:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346412.826172:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346412.826172:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346412.826174:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693687 seconds 00000800:00000400:63.0:1644346412.826174:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693687 seconds 00000400:00000200:63.0:1644346412.826175:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826176:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826177:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346412.826177:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.9@o2ib100: 1 00000400:00000200:63.0:1644346412.826180:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health 
check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826181:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826181:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346412.826181:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:63.0:1644346412.826182:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346412.826182:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x34860 00000400:00020000:63.0:1644346412.826183:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346412.826183:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000850883a9 00000400:00000200:26.0:1644346412.826184:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000800:00000400:63.0:1644346412.826185:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 56 seconds 00000400:00000200:63.0:1644346412.826185:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826187:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826187:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346412.826187:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.9@o2ib100:-110 00000400:00000200:63.0:1644346412.826188:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00000200:26.0:1644346412.826188:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000005b998561) state 0x36060 rc -110 00000400:00000200:26.0:1644346412.826189:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.9@o2ib100: -110 00000400:00000200:26.0:1644346412.826189:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.9@o2ib100 00000400:00020000:63.0:1644346412.826190:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346412.826192:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 137 seconds 00000400:00000200:63.0:1644346412.826192:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826193:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826194:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346412.826195:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346412.826197:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346412.826199:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693687 seconds 00000800:00000400:63.0:1644346412.826199:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693687 seconds 00000400:00000200:63.0:1644346412.826200:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826201:0:170482:0:(lib-msg.c:860:lnet_health_check()) health 
check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826202:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346412.826202:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.12@o2ib100: 1 00000400:00000200:63.0:1644346412.826204:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346412.826206:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346412.826206:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:26.0:1644346412.826206:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:63.0:1644346412.826207:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:26.0:1644346412.826207:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x34860 00000400:00020000:63.0:1644346412.826208:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:26.0:1644346412.826208:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000083495193 00000400:00000200:26.0:1644346412.826209:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346412.826210:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.12@o2ib100:-110 00000400:00000200:26.0:1644346412.826210:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(00000000e223ed87) state 0x36060 rc -110 00000400:00000200:26.0:1644346412.826211:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.12@o2ib100: -110 
00000400:00000200:26.0:1644346412.826212:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.12@o2ib100 00000400:00000200:21.0:1644346413.530106:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c1c01860 00000400:00000200:21.0:1644346413.530110:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530114:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530121:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:21.0:1644346413.530125:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:21.0:1644346413.530129:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:21.0:1644346413.530135:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530139:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:21.0:1644346413.530143:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530146:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530148:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006ecec57c 00000400:00000200:21.0:1644346413.530149:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530151:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 
00000400:00000200:21.0:1644346413.530155:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:21.0:1644346413.530157:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:21.0:1644346413.530160:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:21.0:1644346413.530165:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530168:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:21.0:1644346413.530171:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530177:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530178:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007d7d62d7 00000400:00000200:21.0:1644346413.530179:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530181:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530185:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:21.0:1644346413.530187:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:21.0:1644346413.530189:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:21.0:1644346413.530194:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530197:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:21.0:1644346413.530199:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530201:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530203:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b7e9b1b6 00000400:00000200:21.0:1644346413.530204:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530205:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530209:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:21.0:1644346413.530211:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:21.0:1644346413.530213:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:21.0:1644346413.530218:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530221:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:21.0:1644346413.530224:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530226:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni 
[0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530228:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001746cb78 00000400:00000200:21.0:1644346413.530228:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530230:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530234:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:21.0:1644346413.530236:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:21.0:1644346413.530252:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:21.0:1644346413.530254:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530255:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:21.0:1644346413.530256:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530258:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530259:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000fae22045 00000400:00000200:21.0:1644346413.530259:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530260:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530262:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 
00000400:00000200:21.0:1644346413.530263:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:21.0:1644346413.530263:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:21.0:1644346413.530265:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530266:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:21.0:1644346413.530267:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530267:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530268:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a37ad97c 00000400:00000200:21.0:1644346413.530268:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530269:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530270:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:21.0:1644346413.530271:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:21.0:1644346413.530272:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:21.0:1644346413.530273:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 
00000800:00000200:21.0:1644346413.530274:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:21.0:1644346413.530275:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530276:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346413.530276:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c66d5622 00000400:00000200:21.0:1644346413.530277:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346413.530277:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346413.530279:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:21.0:1644346413.530280:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:21.0:1644346413.530280:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:21.0:1644346413.530282:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346413.530283:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:21.0:1644346413.530284:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346413.530287:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 
00000800:00000200:17.2:1644346413.977828:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (20)++ 00000800:00000200:20.0:1644346413.977892:0:170489:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:20.0:1644346413.977904:0:170489:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 172.19.1.55@o2ib100 00000400:00000200:20.0:1644346413.977911:0:170489:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000800:00000200:17.0:1644346413.977911:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000400:00000200:20.0:1644346413.977920:0:170489:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a848040 00000400:00000200:20.0:1644346413.977927:0:170489:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x411011 [8] + 10304 00000400:00000200:20.0:1644346413.977932:0:170489:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:20.0:1644346413.977936:0:170489:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:20.0:1644346413.977939:0:170489:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:20.0:1644346413.977956:0:170489:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (21)++ 00000800:00000200:20.0:1644346413.977960:0:170489:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:20.0:1644346413.977961:0:170489:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000100:00000200:43.0:1644346413.978034:0:3173235:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207671360 
00010000:00000200:43.0:1644346413.978047:0:3173235:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000a7cb357e x1718520207671360/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:264/0 lens 224/224 e 0 to 0 dl 1644346474 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346413.978060:0:3173235:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207671360, offset 224 00000400:00000200:43.0:1644346413.978065:0:3173235:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346413.978071:0:3173235:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.138@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346413.978079:0:3173235:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.138@o2ib100 for route restriction 00000400:00000200:43.0:1644346413.978082:0:3173235:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.138@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346413.978099:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346413.978099:0:3173235:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346413.978101:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346413.978102:0:3173235:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346413.978106:0:3173235:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346413.978108:0:3173235:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 
00000800:00000200:43.0:1644346413.978111:0:3173235:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346413.978112:0:3173235:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 00000800:00000200:43.0:1644346413.978113:0:3173235:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 00000800:00000200:43.0:1644346413.978117:0:3173235:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.2:1644346413.978127:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:18.0:1644346413.978182:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:18.0:1644346413.978193:0:170491:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000400:00000200:18.0:1644346413.978195:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346413.978201:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000400:00000200:18.0:1644346413.978204:0:170491:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b3942e9e 00000800:00000200:18.0:1644346413.978210:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:18.0:1644346413.978213:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:21.0:1644346416.602072:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.2@o2ib100, cpt = 1 00000400:00000200:21.0:1644346416.602077:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.2@o2ib100: 0 00000400:00000200:21.0:1644346416.602078:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:21.0:1644346416.602079:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346416.602080:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.2@o2ib100 NID 172.19.2.2@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346416.602083:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346416.602086:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x36060 00000400:00000200:26.0:1644346416.602092:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:26.0:1644346416.602095:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:26.0:1644346416.602098:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:26.0:1644346416.602100:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346416.602102:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:26.0:1644346416.602105:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346416.602106:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346416.602107:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.2@o2ib100 00000400:00000200:26.0:1644346416.602108:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001205a4c1) state 0x34260 rc 0 00000800:00000400:63.0:1644346421.786083:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 
172.19.1.135@o2ib100: 14 seconds 00000400:00000200:63.0:1644346421.786089:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346421.786094:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.135@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346421.786098:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346421.786101:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.135@o2ib100: 1 00000400:00000200:26.0:1644346421.786115:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:26.0:1644346421.786121:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x4860 00000400:00000200:26.0:1644346421.786127:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dc6203d0 00000400:00000200:26.0:1644346421.786129:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:26.0:1644346421.786133:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.135@o2ib100:-110 00000400:00000200:26.0:1644346421.786136:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(00000000670d4d09) state 0x6060 rc -110 00000400:00000200:26.0:1644346421.786139:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.135@o2ib100: -110 00000400:00000200:26.0:1644346421.786142:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.135@o2ib100 00000400:00000200:26.0:1644346421.786145:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000009029807b not committed for send or receive 00000400:00000200:26.0:1644346421.786150:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006161234d 00000100:00000200:26.0:1644346421.786155:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000def6375c x1723495557509440/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346464 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:26.0:1644346421.786170:0:170492:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000083ab8747 not committed for send or receive 00000400:00000200:26.0:1644346421.786172:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000076ceea99 00000100:00000200:26.0:1644346421.786175:0:170492:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000987294ca x1723495557509504/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346464 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346421.786229:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000053c493d4 00000100:00000200:8.0:1644346421.786248:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000987294ca x1723495557509504/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346464 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346421.786257:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000987294ca x1723495557509504/t0(0) o38->lflood-MDT0003-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346464 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:8.0:1644346421.786266:0:170496:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b86463ba 
00000100:00000200:8.0:1644346421.786268:0:170496:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000def6375c x1723495557509440/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346464 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346421.786270:0:170496:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000def6375c x1723495557509440/t0(0) o38->lflood-MDT0002-lwp-OST0001@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346464 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:8.0:1644346421.786343:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509568, portal 10 00000100:00000200:8.0:1644346421.786346:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509568, offset 0 00000400:00000200:8.0:1644346421.786348:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000400:00000200:8.0:1644346421.786354:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: 0 00000400:00000200:8.0:1644346421.786355:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346421.786356:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346421.786357:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery 00000400:00000200:8.0:1644346421.786358:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000f9873cd8 delayed. 
172.19.1.136@o2ib100 pending discovery 00000400:00000200:26.0:1644346421.786361:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000100:00000200:8.0:1644346421.786362:0:170496:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495557509632, portal 10 00000100:00000200:8.0:1644346421.786364:0:170496:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495557509632, offset 0 00000400:00000200:26.0:1644346421.786365:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x6060 00000400:00000200:8.0:1644346421.786366:0:170496:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000400:00000200:8.0:1644346421.786369:0:170496:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: -114 00000400:00000200:8.0:1644346421.786369:0:170496:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:8.0:1644346421.786370:0:170496:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:8.0:1644346421.786370:0:170496:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346421.786374:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.136@o2ib100 00000400:00000200:8.0:1644346421.786385:0:170496:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000caa8e3e5 delayed. 
172.19.1.136@o2ib100 pending discovery 00000400:00000200:26.0:1644346421.786386:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.136@o2ib100 local destination 00000400:00000200:26.0:1644346421.786391:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.136@o2ib100 00000400:00000200:26.0:1644346421.786397:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.136@o2ib100(172.19.1.136@o2ib100:172.19.1.136@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346421.786402:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.136@o2ib100 00000800:00000200:26.0:1644346421.786407:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d8f15834] -> 172.19.1.136@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346421.786410:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d8f15834] -> 172.19.1.136@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346421.786412:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.136@o2ib100 00000400:00000200:26.0:1644346421.786414:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000e24e7935) state 0x4260 rc 0 00000800:00000400:63.0:1644346424.794075:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693699 seconds 00000400:00000200:63.0:1644346424.794081:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794086:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794089:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794092:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent 
unsuccessfully:-110 00000400:00020000:63.0:1644346424.794096:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346424.794102:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 29 seconds 00000400:00000200:63.0:1644346424.794104:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794107:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794109:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794111:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794113:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346424.794116:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 38 seconds 00000400:00000200:63.0:1644346424.794122:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794125:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794126:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794128:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794130:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110 
00000800:00000400:63.0:1644346424.794133:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693699 seconds 00000400:00000200:63.0:1644346424.794134:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794137:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794138:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794140:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794142:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346424.794144:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 32 seconds 00000400:00000200:63.0:1644346424.794146:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794149:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794150:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794164:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794165:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346424.794166:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693699 seconds 00000400:00000200:63.0:1644346424.794167:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) 
health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794168:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794168:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794169:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794170:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000800:00000400:63.0:1644346424.794171:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693699 seconds 00000400:00000200:63.0:1644346424.794172:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:63.0:1644346424.794173:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:63.0:1644346424.794173:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346424.794174:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346424.794175:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346425.818095:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001746cb78 00000400:00000200:21.0:1644346425.818099:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000800:00000400:63.0:1644346425.818103:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 73 seconds 00000400:00000200:21.0:1644346425.818103:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 
172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:63.0:1644346425.818110:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:21.0:1644346425.818112:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:63.0:1644346425.818115:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:21.0:1644346425.818116:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:63.0:1644346425.818118:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346425.818121:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:21.0:1644346425.818121:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00020000:63.0:1644346425.818125:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346425.818127:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000400:63.0:1644346425.818129:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693700 seconds 00000400:00000200:63.0:1644346425.818132:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000800:00000200:21.0:1644346425.818132:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000400:00000200:63.0:1644346425.818135:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:63.0:1644346425.818137:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000800:00000200:21.0:1644346425.818137:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:63.0:1644346425.818139:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346425.818141:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000800:00000200:21.0:1644346425.818141:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000084975cc7] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346425.818144:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000fae22045 00000800:00000400:63.0:1644346425.818145:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 69 seconds 00000400:00000200:21.0:1644346425.818145:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:63.0:1644346425.818147:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:21.0:1644346425.818147:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery ping unlinked 00000400:00000200:63.0:1644346425.818150:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:21.0:1644346425.818152:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:63.0:1644346425.818153:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:63.0:1644346425.818155:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00000200:21.0:1644346425.818155:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:21.0:1644346425.818159:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:21.0:1644346425.818164:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000400:00020000:63.0:1644346425.818168:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000800:00000200:21.0:1644346425.818168:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000400:63.0:1644346425.818171:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 9 seconds 00000800:00000400:63.0:1644346425.818173:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 150 seconds 00000400:00000200:63.0:1644346425.818175:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000800:00000200:21.0:1644346425.818189:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818190:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000050bc1d78] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:63.0:1644346425.818191:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:21.0:1644346425.818191:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a37ad97c 00000400:00000200:21.0:1644346425.818191:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 
00000400:00000200:63.0:1644346425.818192:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:21.0:1644346425.818192:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:63.0:1644346425.818193:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:63.0:1644346425.818194:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000400:00000200:63.0:1644346425.818194:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:21.0:1644346425.818195:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:63.0:1644346425.818196:0:170482:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:63.0:1644346425.818197:0:170482:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.11@o2ib100: 1 00000400:00000200:21.0:1644346425.818197:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000800:00000400:63.0:1644346425.818204:0:170482:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693700 seconds 00000400:00000200:21.0:1644346425.818204:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:63.0:1644346425.818205:0:170482:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:21.0:1644346425.818206:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000400:00000200:63.0:1644346425.818207:0:170482:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.2.12@o2ib100: 
GET: NETWORK_TIMEOUT 00000800:00000200:21.0:1644346425.818207:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000400:00000200:63.0:1644346425.818209:0:170482:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000800:00000200:21.0:1644346425.818209:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818209:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006ae19065] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:63.0:1644346425.818210:0:170482:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:21.0:1644346425.818210:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c66d5622 00000400:00020000:63.0:1644346425.818211:0:170482:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:21.0:1644346425.818211:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346425.818212:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346425.818214:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:21.0:1644346425.818215:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:21.0:1644346425.818216:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:21.0:1644346425.818217:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346425.818218:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) 
sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:21.0:1644346425.818220:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818220:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346425.818221:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000e3f098fd 00000400:00000200:21.0:1644346425.818221:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346425.818222:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346425.818224:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:21.0:1644346425.818224:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:21.0:1644346425.818225:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:21.0:1644346425.818227:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346425.818228:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:21.0:1644346425.818229:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818230:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000673851dd] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346425.818230:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b76935cb 
00000400:00000200:21.0:1644346425.818231:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346425.818231:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346425.818233:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 00000400:00000200:21.0:1644346425.818234:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:21.0:1644346425.818238:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:21.0:1644346425.818240:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346425.818241:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:21.0:1644346425.818242:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818242:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f57ddac4] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346425.818243:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b90bda72 00000400:00000200:21.0:1644346425.818243:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346425.818244:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:26.0:1644346425.818246:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:21.0:1644346425.818246:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 
00000400:00000200:21.0:1644346425.818247:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:21.0:1644346425.818248:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:21.0:1644346425.818249:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346425.818250:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000400:00000200:26.0:1644346425.818263:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000005dbb10e4) state 0x34860 00000800:00000200:21.0:1644346425.818263:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818264:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000000c3ff7ee] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346425.818264:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006fd4441a 00000400:00000200:21.0:1644346425.818265:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346425.818265:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 00000400:00000200:26.0:1644346425.818266:0:170492:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f0eeeb8c 00000400:00000200:26.0:1644346425.818268:0:170492:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:21.0:1644346425.818269:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:26.0:1644346425.818270:0:170492:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.11@o2ib100:-110 
00000400:00000200:26.0:1644346425.818271:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000005dbb10e4) state 0x36060 rc -110 00000400:00000200:21.0:1644346425.818271:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:21.0:1644346425.818272:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:21.0:1644346425.818273:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346425.818274:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000400:00000200:26.0:1644346425.818275:0:170492:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.11@o2ib100: -110 00000400:00000200:26.0:1644346425.818276:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.11@o2ib100 00000800:00000200:21.0:1644346425.818278:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346425.818279:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005251bd3f] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346426.842091:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.11@o2ib100, cpt = 1 00000400:00000200:21.0:1644346426.842099:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.11@o2ib100: 0 00000400:00000200:21.0:1644346426.842101:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346426.842103:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346426.842105:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.11@o2ib100 NID 172.19.2.11@o2ib100: 0. pending discovery 00000400:00000200:26.0:1644346426.842112:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:21.0:1644346426.842115:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c1c01860 00000400:00000200:26.0:1644346426.842117:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000005dbb10e4) state 0x36060 00000400:00000200:21.0:1644346426.842118:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346426.842121:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:26.0:1644346426.842127:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:26.0:1644346426.842133:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:21.0:1644346426.842133:0:170493:0:(lib-move.c:5016:LNetGet()) 
LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:26.0:1644346426.842138:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:26.0:1644346426.842147:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000400:00000200:21.0:1644346426.842152:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000800:00000200:26.0:1644346426.842153:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000400:00000200:21.0:1644346426.842157:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000800:00000200:26.0:1644346426.842159:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346426.842163:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346426.842164:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000400:00000200:26.0:1644346426.842165:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.11@o2ib100 00000400:00000200:26.0:1644346426.842167:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000005dbb10e4) state 0x34260 rc 0 00000800:00000200:21.0:1644346426.842170:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:21.0:1644346426.842175:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 
00000800:00000200:21.0:1644346426.842178:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a282465c] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346426.842180:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006ecec57c 00000400:00000200:21.0:1644346426.842181:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346426.842183:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346426.842188:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:21.0:1644346426.842190:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:21.0:1644346426.842193:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:21.0:1644346426.842198:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346426.842201:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:21.0:1644346426.842204:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346426.842206:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a013326d] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346426.842208:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007d7d62d7 00000400:00000200:21.0:1644346426.842209:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346426.842211:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 
00000400:00000200:21.0:1644346426.842214:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:21.0:1644346426.842216:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:21.0:1644346426.842218:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:21.0:1644346426.842223:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346426.842226:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:21.0:1644346426.842228:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346426.842230:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000028023f80] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346426.842235:0:170493:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b7e9b1b6 00000400:00000200:21.0:1644346426.842236:0:170493:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:21.0:1644346426.842237:0:170493:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:21.0:1644346426.842241:0:170493:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:21.0:1644346426.842243:0:170493:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:21.0:1644346426.842245:0:170493:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:21.0:1644346426.842250:0:170493:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:21.0:1644346426.842253:0:170493:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:21.0:1644346426.842256:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:21.0:1644346426.842258:0:170493:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000063275d68] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:21.0:1644346427.866105:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.55@o2ib100, cpt = 1 00000400:00000200:21.0:1644346427.866124:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.55@o2ib100: 0 00000400:00000200:21.0:1644346427.866124:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346427.866125:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346427.866126:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.55@o2ib100 NID 172.19.1.55@o2ib100: 0. pending discovery 00000400:00000200:21.0:1644346427.866127:0:170493:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.5@o2ib100, cpt = 1 00000400:00000200:26.0:1644346427.866128:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:21.0:1644346427.866129:0:170493:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.5@o2ib100: 0 00000400:00000200:21.0:1644346427.866129:0:170493:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:21.0:1644346427.866129:0:170493:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:21.0:1644346427.866130:0:170493:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.5@o2ib100 NID 172.19.2.5@o2ib100: 0. 
pending discovery 00000400:00000200:26.0:1644346427.866132:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x36056 00000400:00000200:26.0:1644346427.866137:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.55@o2ib100 00000400:00000200:26.0:1644346427.866139:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.55@o2ib100 local destination 00000400:00000200:26.0:1644346427.866140:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346427.866143:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.1.55@o2ib100(172.19.1.55@o2ib100:172.19.1.55@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346427.866145:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:26.0:1644346427.866147:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000cd903220] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:26.0:1644346427.866148:0:170492:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[00000000c99f1f45] (20)++ 00000800:00000200:26.0:1644346427.866149:0:170492:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[00000000c99f1f45] (21)++ 00000800:00000200:26.0:1644346427.866155:0:170492:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[00000000c99f1f45] (22)-- 00000400:00000200:26.0:1644346427.866155:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346427.866156:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34256 rc 0 00000400:00000200:26.0:1644346427.866157:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x36060 00000400:00000200:26.0:1644346427.866160:0:170492:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 
00000400:00000200:26.0:1644346427.866161:0:170492:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:26.0:1644346427.866162:0:170492:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:26.0:1644346427.866163:0:170492:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:26.0:1644346427.866164:0:170492:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:26.0:1644346427.866166:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:26.0:1644346427.866166:0:170492:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000675c9457] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:26.0:1644346427.866167:0:170492:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.5@o2ib100 00000400:00000200:26.0:1644346427.866168:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(00000000cd43bc4b) state 0x34260 rc 0 00000800:00000200:17.2:1644346427.866205:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (21)++ 00000800:00000200:17.0:1644346427.866237:0:170490:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (22)++ 00000800:00000200:17.0:1644346427.866241:0:170490:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[00000000c99f1f45] (23)-- 00000400:00000200:17.0:1644346427.866243:0:170490:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:17.0:1644346427.866245:0:170490:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: GET: OK 00000400:00000200:17.0:1644346427.866247:0:170490:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 
00000400:00000200:17.0:1644346427.866248:0:170490:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.55@o2ib100: 0 00000800:00000200:17.0:1644346427.866251:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:17.0:1644346427.866253:0:170490:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000800:00000200:17.2:1644346427.866297:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[00000000c99f1f45] (20)++ 00000800:00000200:18.0:1644346427.866350:0:170491:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[00000000c99f1f45] (21)++ 00000800:00000200:18.0:1644346427.866368:0:170491:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.55@o2ib100 00000400:00000200:18.0:1644346427.866372:0:170491:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.138@o2ib100(172.19.1.138@o2ib100) <- 172.19.1.55@o2ib100 : REPLY - for me 00000400:00000200:18.0:1644346427.866376:0:170491:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.138@o2ib100: Reply from 12345-172.19.1.55@o2ib100 of length 64/64 into md 0x70fdad 00000400:00000200:18.0:1644346427.866379:0:170491:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:18.0:1644346427.866385:0:170491:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.138@o2ib100->172.19.1.55@o2ib100: REPLY: OK 00000400:00000200:18.0:1644346427.866386:0:170491:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000850883a9 00000400:00000200:18.0:1644346427.866387:0:170491:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:18.0:1644346427.866389:0:170491:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.55@o2ib100 has discovery disabled 00000400:00000200:18.0:1644346427.866390:0:170491:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100(000000008b200298) not MR: DD disabled remotely 
00000400:00000200:18.0:1644346427.866391:0:170491:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100 data present 0. state = 0x34256 00000400:00000200:18.0:1644346427.866393:0:170491:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.55@o2ib100:3 00000800:00000200:18.0:1644346427.866397:0:170491:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[00000000c99f1f45] (22)++ 00000400:00000200:26.0:1644346427.866398:0:170492:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000200:18.0:1644346427.866399:0:170491:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[00000000c99f1f45] (23)-- 00000800:00000200:18.0:1644346427.866399:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (22)-- 00000800:00000200:18.0:1644346427.866401:0:170491:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[00000000c99f1f45] (21)-- 00000400:00000200:26.0:1644346427.866402:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x340d6 00000400:00000200:26.0:1644346427.866405:0:170492:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.55@o2ib100 (000000008b200298): 0 00000400:00000200:26.0:1644346427.866406:0:170492:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.55@o2ib100(000000008b200298): 0. 
state = 0x34156 00000400:00000200:26.0:1644346427.866407:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34156 rc 1 00000400:00000200:26.0:1644346427.866408:0:170492:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x34156 00000400:00000200:26.0:1644346427.866409:0:170492:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.55@o2ib100 00000400:00000200:26.0:1644346427.866410:0:170492:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000008b200298) state 0x30116 rc 0 00000400:00000200:26.0:1644346427.866411:0:170492:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.55@o2ib100
Debug log: 1879 lines, 1879 kept, 0 dropped, 0 bad.