00000800:00000400:5.0F:1644346376.167291:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.136@o2ib100: 56 seconds 00000400:00000200:5.0:1644346376.167299:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346376.167304:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.136@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346376.167308:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346376.167311:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.136@o2ib100: 1 00000400:00000200:6.0F:1644346376.167366:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346376.167371:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x4860 00000400:00000200:6.0:1644346376.167386:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dccbd790 00000400:00000200:6.0:1644346376.167388:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346376.167390:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.136@o2ib100:-110 00000400:00000200:6.0:1644346376.167391:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x6060 rc -110 00000400:00000200:6.0:1644346376.167392:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.136@o2ib100: -110 00000400:00000200:6.0:1644346376.167393:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.136@o2ib100 00000400:00000200:6.0:1644346376.167394:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000000f77802a not committed for send or receive 00000400:00000200:6.0:1644346376.167395:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009065dc54 00000100:00000200:6.0:1644346376.167398:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@000000009e435ddf x1723495536590016/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346418 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346376.167407:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000ffe24572 not committed for send or receive 00000400:00000200:6.0:1644346376.167408:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c3067162 00000100:00000200:6.0:1644346376.167409:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@000000002879ed4c x1723495536590080/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346418 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0F:1644346376.167458:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ec5fae42 00000100:00000200:4.0:1644346376.167464:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@000000002879ed4c x1723495536590080/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346418 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346376.167481:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@000000002879ed4c x1723495536590080/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346418 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346376.167490:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b107933f 
00000100:00000200:4.0:1644346376.167492:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@000000009e435ddf x1723495536590016/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346418 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346376.167494:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@000000009e435ddf x1723495536590016/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346418 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346376.167584:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590144, portal 10 00000100:00000200:4.0:1644346376.167586:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590144, offset 0 00000400:00000200:4.0:1644346376.167589:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:4.0:1644346376.167595:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: 0 00000400:00000200:4.0:1644346376.167595:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346376.167596:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346376.167597:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346376.167598:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000007b26eb65 delayed. 
172.19.1.135@o2ib100 pending discovery 00000400:00000200:6.0:1644346376.167600:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346376.167601:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x6060 00000100:00000200:4.0:1644346376.167604:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590208, portal 10 00000400:00000200:6.0:1644346376.167605:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.135@o2ib100 00000100:00000200:4.0:1644346376.167607:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590208, offset 0 00000400:00000200:4.0:1644346376.167608:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:6.0:1644346376.167609:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.135@o2ib100 local destination 00000400:00000200:6.0:1644346376.167611:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346376.167613:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.135@o2ib100(172.19.1.135@o2ib100:172.19.1.135@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346376.167615:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.135@o2ib100 00000400:00000200:4.0:1644346376.167615:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: -114 00000400:00000200:4.0:1644346376.167616:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346376.167616:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346376.167617:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. 
pending discovery 00000400:00000200:4.0:1644346376.167618:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000fcdc8872 delayed. 172.19.1.135@o2ib100 pending discovery 00000800:00000200:6.0:1644346376.167620:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346376.167622:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346376.167622:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346376.167623:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x4260 rc 0 00000800:00000400:5.0:1644346378.151289:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693650 seconds 00000400:00000200:5.0:1644346378.151293:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346378.151298:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346378.151302:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346378.151305:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346378.151309:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000400:00000200:27.0F:1644346378.535296:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009c215f09 00000400:00000200:27.0:1644346378.535303:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346378.535308:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 
recovery ping unlinked 00000400:00000200:27.0:1644346378.535316:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:27.0:1644346378.535320:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:27.0:1644346378.535325:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:27.0:1644346378.535331:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346378.535336:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:27.0:1644346378.535341:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346378.535344:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000400:5.0:1644346379.111286:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 89 seconds 00000400:00000200:5.0:1644346379.111290:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111295:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111299:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111301:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346379.111305:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 
00000800:00000400:5.0:1644346379.111311:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693651 seconds 00000400:00000200:5.0:1644346379.111313:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111317:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111319:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111321:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346379.111323:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346379.111327:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 31 seconds 00000800:00000400:5.0:1644346379.111328:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 48 seconds 00000400:00000200:5.0:1644346379.111330:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111333:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111336:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346379.111338:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.5@o2ib100: 1 00000400:00000200:5.0:1644346379.111344:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111347:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:5.0:1644346379.111348:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111353:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346379.111355:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346379.111358:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693651 seconds 00000400:00000200:5.0:1644346379.111360:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111363:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111364:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111366:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346379.111368:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346379.111371:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693651 seconds 00000400:00000200:5.0:1644346379.111373:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111376:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111377:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111379:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery 
message sent unsuccessfully:-110 00000400:00020000:5.0:1644346379.111381:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346379.111384:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693651 seconds 00000400:00000200:5.0:1644346379.111385:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346379.111388:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346379.111390:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346379.111391:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:6.0:1644346379.111393:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00020000:5.0:1644346379.111394:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346379.111399:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(0000000021a8d375) state 0x34860 00000400:00000200:6.0:1644346379.111403:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000061927726 00000400:00000200:6.0:1644346379.111405:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346379.111409:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.5@o2ib100:-110 00000400:00000200:6.0:1644346379.111411:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(0000000021a8d375) state 0x36060 rc -110 00000400:00000200:6.0:1644346379.111417:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.5@o2ib100: -110 
00000400:00000200:6.0:1644346379.111419:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.5@o2ib100 00000400:00000200:27.0:1644346379.559273:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000bcab80a7 00000400:00000200:27.0:1644346379.559277:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559293:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346379.559298:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:27.0:1644346379.559300:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:27.0:1644346379.559302:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:27.0:1644346379.559304:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559306:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:27.0:1644346379.559309:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559310:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346379.559311:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004045eb92 00000400:00000200:27.0:1644346379.559312:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559313:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 
00000400:00000200:27.0:1644346379.559314:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:27.0:1644346379.559315:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:27.0:1644346379.559316:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:27.0:1644346379.559318:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559319:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:27.0:1644346379.559320:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559321:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346379.559322:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000042525ce2 00000400:00000200:27.0:1644346379.559325:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559326:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346379.559328:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:27.0:1644346379.559329:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:27.0:1644346379.559330:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:27.0:1644346379.559331:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 
172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559332:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:27.0:1644346379.559333:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559334:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346379.559335:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c0d74cb9 00000400:00000200:27.0:1644346379.559335:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559336:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346379.559338:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:27.0:1644346379.559339:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:27.0:1644346379.559339:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:27.0:1644346379.559341:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559342:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:27.0:1644346379.559343:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559343:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 
00000400:00000200:27.0:1644346379.559344:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000015445d5e 00000400:00000200:27.0:1644346379.559345:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559345:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346379.559348:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:27.0:1644346379.559349:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:27.0:1644346379.559350:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:27.0:1644346379.559352:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559353:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:27.0:1644346379.559356:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559356:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346379.559357:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000920aee3e 00000400:00000200:27.0:1644346379.559361:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346379.559362:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346379.559365:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 
00000400:00000200:27.0:1644346379.559365:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:27.0:1644346379.559366:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:27.0:1644346379.559368:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346379.559369:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:27.0:1644346379.559370:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346379.559371:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000400:5.0:1644346380.135285:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 95 seconds 00000400:00000200:5.0:1644346380.135289:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346380.135295:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346380.135299:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346380.135302:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346380.135305:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346380.135311:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 260 seconds 
00000400:00000200:5.0:1644346380.135313:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346380.135317:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346380.135319:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346380.135323:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346380.135326:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346380.135329:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693652 seconds 00000400:00000200:5.0:1644346380.135331:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346380.135334:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346380.135335:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346380.135337:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346380.135339:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:27.0:1644346380.583305:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000044e51676 00000400:00000200:27.0:1644346380.583309:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346380.583314:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 
00000400:00000200:27.0:1644346380.583323:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:27.0:1644346380.583327:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:27.0:1644346380.583331:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:27.0:1644346380.583337:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346380.583341:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:27.0:1644346380.583346:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346380.583349:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346380.583352:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000057bab685 00000400:00000200:27.0:1644346380.583353:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346380.583355:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346380.583360:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:27.0:1644346380.583362:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:27.0:1644346380.583365:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:27.0:1644346380.583371:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346380.583374:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:27.0:1644346380.583379:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346380.583381:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346380.583383:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c6557ad2 00000400:00000200:27.0:1644346380.583388:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346380.583390:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346380.583394:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346380.583396:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:27.0:1644346380.583410:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:27.0:1644346380.583412:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346380.583413:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:27.0:1644346380.583414:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346380.583415:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni 
[00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:56.2F:1644346383.769797:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (20)++ 00000800:00000200:53.0F:1644346383.769866:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++ 00000800:00000200:53.0:1644346383.769888:0:170146:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.55@o2ib100 00000400:00000200:53.0:1644346383.769891:0:170146:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:53.0:1644346383.769895:0:170146:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a845c00 00000400:00000200:53.0:1644346383.769900:0:170146:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 7392 00000400:00000200:53.0:1644346383.769902:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:53.0:1644346383.769922:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:53.0:1644346383.769924:0:170146:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:53.0:1644346383.769931:0:170146:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (22)++ 00000800:00000200:53.0:1644346383.769933:0:170146:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] (23)-- 00000800:00000200:55.0F:1644346383.769934:0:170145:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (23)-- 00000800:00000200:53.0:1644346383.769952:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000100:00000200:43.0F:1644346383.770016:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207662080 
00010000:00000200:43.0:1644346383.770035:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000009be4d280 x1718520207662080/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:234/0 lens 224/224 e 0 to 0 dl 1644346444 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346383.770050:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207662080, offset 224 00000400:00000200:43.0:1644346383.770055:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346383.770072:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346383.770076:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346383.770080:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346383.770082:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346383.770084:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346383.770088:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346383.770090:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346383.770099:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346383.770103:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 
00000800:00000200:43.0:1644346383.770109:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346383.770112:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:43.0:1644346383.770115:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 00000800:00000200:43.0:1644346383.770141:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000800:00000200:30.2F:1644346383.770194:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000800:00000200:51.0F:1644346383.770283:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 00000800:00000200:51.0:1644346383.770292:0:170147:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000400:00000200:51.0:1644346383.770294:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:51.0:1644346383.770296:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:51.0:1644346383.770298:0:170147:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000089285504 00000800:00000200:51.0:1644346383.770301:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:51.0:1644346383.770302:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000400:00000200:27.0:1644346384.679273:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.8@o2ib100, cpt = 1 00000400:00000200:27.0:1644346384.679281:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.8@o2ib100: 0 00000400:00000200:27.0:1644346384.679283:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:27.0:1644346384.679284:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346384.679287:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.8@o2ib100 NID 172.19.2.8@o2ib100: 0. pending discovery 00000400:00000200:6.0:1644346384.679336:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346384.679343:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000670b5934) state 0x36060 00000400:00000200:6.0:1644346384.679354:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:6.0:1644346384.679359:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:6.0:1644346384.679364:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:6.0:1644346384.679370:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346384.679375:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:6.0:1644346384.679380:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346384.679383:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346384.679385:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.8@o2ib100 00000400:00000200:6.0:1644346384.679387:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.8@o2ib100(00000000670b5934) state 0x34260 rc 0 00000400:00000080:6.0:1644346386.259394:0:3524063:0:(module.c:207:libcfs_ioctl()) libcfs ioctl cmd 3221775678 
00000400:00000200:6.0:1644346386.259402:0:3524063:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.138@o2ib100 00000400:00000200:6.0:1644346386.259407:0:3524063:0:(peer.c:286:lnet_peer_alloc()) 00000000628dedb2 nid 172.19.1.138@o2ib100 00000400:00000200:6.0:1644346386.259409:0:3524063:0:(peer.c:221:lnet_peer_net_alloc()) 00000000cf56c444 net o2ib100 00000400:00000200:6.0:1644346386.259413:0:3524063:0:(peer.c:203:lnet_peer_ni_alloc()) 0000000081d7c651 nid 172.19.1.138@o2ib100 00000400:00000200:6.0:1644346386.259415:0:3524063:0:(peer.c:1312:lnet_peer_attach_peer_ni()) peer 172.19.1.138@o2ib100 NID 172.19.1.138@o2ib100 flags 0x0 00000400:00000200:6.0:1644346386.259416:0:3524063:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.138@o2ib100 local destination 00000400:00000200:6.0:1644346386.259419:0:3524063:0:(lib-move.c:1595:lnet_get_best_ni()) compare ni 172.19.1.137@o2ib100 [c:237, d:21, s:19] with best_ni not seleced [c:-2147483648, d:-1, s:0] 00000400:00000200:6.0:1644346386.259421:0:3524063:0:(lib-move.c:1638:lnet_get_best_ni()) selected best_ni 172.19.1.137@o2ib100 00000400:00000200:6.0:1644346386.259422:0:3524063:0:(lib-move.c:1846:lnet_set_non_mr_pref_nid()) Setting preferred local NID 172.19.1.137@o2ib100 on NMR peer 172.19.1.138@o2ib100 00000400:00000200:6.0:1644346386.259423:0:3524063:0:(peer.c:941:lnet_peer_ni_set_non_mr_pref_nid()) peer 172.19.1.138@o2ib100 nid 172.19.1.137@o2ib100: 0 00000400:00000200:6.0:1644346386.259425:0:3524063:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.138@o2ib100 00000400:00000200:6.0:1644346386.259427:0:3524063:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.138@o2ib100(172.19.1.138@o2ib100:172.19.1.138@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346386.259430:0:3524063:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.138@o2ib100 
00000800:00000200:6.0:1644346386.259438:0:3524063:0:(o2iblnd_cb.c:1607:kiblnd_launch_tx()) peer_ni[0000000038151d2b] -> 172.19.1.138@o2ib100 (1)++ 00000800:00000200:6.0:1644346386.259442:0:3524063:0:(o2iblnd_cb.c:1414:kiblnd_connect_peer()) peer_ni[0000000038151d2b] -> 172.19.1.138@o2ib100 (2)++ 00000800:00000200:6.0:1644346386.259478:0:3524063:0:(o2iblnd_cb.c:1338:kiblnd_resolve_addr_cap()) bound to port 1023 00000800:00000200:6.0:1644346386.259480:0:3524063:0:(o2iblnd_cb.c:1614:kiblnd_launch_tx()) peer_ni[0000000038151d2b] -> 172.19.1.138@o2ib100 (3)-- 00000800:00000200:12.0F:1644346386.259986:0:3523200:0:(o2iblnd_cb.c:3167:kiblnd_cm_callback()) 172.19.1.138@o2ib100 Addr resolved: 0 00000800:00000200:12.0:1644346386.260012:0:3523200:0:(o2iblnd_cb.c:3184:kiblnd_cm_callback()) 172.19.1.138@o2ib100: connection bound to san0:172.19.1.137:mlx5_0 00000400:00000200:6.0:1644346387.303285:0:3524063:0:(lib-md.c:65:lnet_md_unlink()) Queueing unlink of md 00000000a1d695ea 00000800:00000400:5.0:1644346388.135277:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.133@o2ib100: 2 seconds 00000400:00000200:5.0:1644346388.135281:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346388.135287:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.133@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346388.135290:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346388.135292:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.133@o2ib100: 1 00000800:00000400:5.0:1644346388.135302:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.135@o2ib100: 68 seconds 00000400:00000200:5.0:1644346388.135304:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:5.0:1644346388.135308:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.135@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346388.135310:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346388.135312:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.135@o2ib100: 1 00000400:00000200:6.0:1644346388.135348:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346388.135355:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x4860 00000400:00000200:6.0:1644346388.135359:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000035d3df28 00000400:00000200:6.0:1644346388.135361:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346388.135366:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.133@o2ib100:-110 00000400:00000200:6.0:1644346388.135368:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x6060 rc -110 00000400:00000200:6.0:1644346388.135371:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.133@o2ib100: -110 00000400:00000200:6.0:1644346388.135374:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.133@o2ib100 00000400:00000200:6.0:1644346388.135376:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000ede65857 not committed for send or receive 00000400:00000200:6.0:1644346388.135378:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000040338653 00000100:00000200:6.0:1644346388.135382:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000062c0a132 x1723495536589696/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 0 dl 1644346405 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346388.135409:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000029bf1577 not committed for send or receive 00000400:00000200:6.0:1644346388.135409:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000002fe0b4a4 00000100:00000200:6.0:1644346388.135411:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000ce85899f x1723495536589760/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346405 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346388.135414:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000f669b1bc not committed for send or receive 00000400:00000200:6.0:1644346388.135415:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000055195240 00000100:00000200:6.0:1644346388.135416:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000ce223206 x1723495536589824/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346405 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346388.135417:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000413f6aa6 00000400:00000200:6.0:1644346388.135419:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x4860 
00000400:00000200:6.0:1644346388.135420:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dccbd790 00000400:00000200:6.0:1644346388.135420:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000100:00000200:4.0:1644346388.135421:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000062c0a132 x1723495536589696/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346388.135422:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.135@o2ib100:-110 00000400:00000200:6.0:1644346388.135422:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x6060 rc -110 00000400:00000200:6.0:1644346388.135423:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.135@o2ib100: -110 00000400:00000200:6.0:1644346388.135423:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346388.135424:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000007b26eb65 not committed for send or receive 00000400:00000200:6.0:1644346388.135425:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ec5fae42 00000100:00000200:6.0:1644346388.135426:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000038566c6b x1723495536590144/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346431 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346388.135426:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000062c0a132 x1723495536589696/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346388.135430:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000fcdc8872 not committed for send or receive 00000400:00000200:6.0:1644346388.135431:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c035e627 00000100:00000200:6.0:1644346388.135432:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000feb52d0a x1723495536590208/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346431 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346388.135441:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007a905fcc 00000100:00000200:4.0:1644346388.135445:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000feb52d0a x1723495536590208/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346431 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346388.135448:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000feb52d0a x1723495536590208/t0(0) 
o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346431 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346388.135453:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b107933f 00000100:00000200:4.0:1644346388.135454:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000038566c6b x1723495536590144/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346431 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346388.135456:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000038566c6b x1723495536590144/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346431 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346388.135459:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000081bf2296 00000100:00000200:4.0:1644346388.135460:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000ce85899f x1723495536589760/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346388.135462:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000ce85899f x1723495536589760/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346388.135466:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000abbb2c53 00000100:00000200:4.0:1644346388.135467:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000ce223206 x1723495536589824/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 
00000100:00000200:4.0:1644346388.135469:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000ce223206 x1723495536589824/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346405 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346388.135485:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590272, portal 25 00000100:00000200:4.0:1644346388.135487:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 26, xid 1723495536590272, offset 0 00000400:00000200:4.0:1644346388.135489:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346388.135494:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: 0 00000400:00000200:4.0:1644346388.135495:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346388.135496:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346388.135496:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346388.135497:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000003999f1a9 delayed. 
172.19.1.134@o2ib100 pending discovery 00000400:00000200:6.0:1644346388.135499:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346388.135500:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x6060 00000100:00000200:4.0:1644346388.135501:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590336, portal 10 00000100:00000200:4.0:1644346388.135511:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590336, offset 0 00000400:00000200:6.0:1644346388.135512:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346388.135514:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000400:00000200:4.0:1644346388.135517:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: 0 00000400:00000200:4.0:1644346388.135519:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:6.0:1644346388.135520:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.134@o2ib100 local destination 00000400:00000200:6.0:1644346388.135523:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.134@o2ib100 00000400:00000200:6.0:1644346388.135525:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.134@o2ib100(172.19.1.134@o2ib100:172.19.1.134@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346388.135526:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346388.135526:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346388.135527:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. 
pending discovery 00000400:00000200:4.0:1644346388.135528:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000f159e7ee delayed. 172.19.1.136@o2ib100 pending discovery 00000100:00000200:4.0:1644346388.135529:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590400, portal 10 00000800:00000200:6.0:1644346388.135531:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000035a33187] -> 172.19.1.134@o2ib100 (2) version: 0 00000100:00000200:4.0:1644346388.135531:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590400, offset 0 00000800:00000200:6.0:1644346388.135532:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000035a33187] -> 172.19.1.134@o2ib100 (2) version: 0 00000400:00000200:4.0:1644346388.135532:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100 00000400:00000200:6.0:1644346388.135533:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.134@o2ib100 00000400:00000200:4.0:1644346388.135533:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: -114 00000400:00000200:6.0:1644346388.135534:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x4260 rc 0 00000400:00000200:4.0:1644346388.135534:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346388.135534:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346388.135535:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346388.135535:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000e6ca7a77 delayed. 
172.19.1.136@o2ib100 pending discovery 00000400:00000200:6.0:1644346388.135537:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x6060 00000100:00000200:4.0:1644346388.135537:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590464, portal 10 00000100:00000200:4.0:1644346388.135539:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590464, offset 0 00000400:00000200:6.0:1644346388.135540:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.136@o2ib100 00000400:00000200:4.0:1644346388.135541:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000400:00000200:6.0:1644346388.135542:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.136@o2ib100 local destination 00000400:00000200:6.0:1644346388.135544:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.136@o2ib100 00000400:00000200:6.0:1644346388.135546:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.136@o2ib100(172.19.1.136@o2ib100:172.19.1.136@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346388.135547:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.136@o2ib100 00000400:00000200:4.0:1644346388.135547:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: -114 00000400:00000200:4.0:1644346388.135547:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346388.135548:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346388.135548:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. 
pending discovery 00000400:00000200:4.0:1644346388.135549:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000b1e07f77 delayed. 172.19.1.134@o2ib100 pending discovery 00000800:00000200:6.0:1644346388.135551:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000e6cd4648] -> 172.19.1.136@o2ib100 (2) version: 0 00000100:00000200:4.0:1644346388.135551:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590528, portal 10 00000800:00000200:6.0:1644346388.135552:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000e6cd4648] -> 172.19.1.136@o2ib100 (2) version: 0 00000100:00000200:4.0:1644346388.135552:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590528, offset 0 00000400:00000200:6.0:1644346388.135553:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.136@o2ib100 00000400:00000200:6.0:1644346388.135554:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x4260 rc 0 00000400:00000200:4.0:1644346388.135554:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346388.135555:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: -114 00000400:00000200:4.0:1644346388.135555:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346388.135556:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346388.135557:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346388.135557:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000000710ee5f delayed. 
172.19.1.134@o2ib100 pending discovery 00000800:00000400:5.0:1644346389.159288:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.136@o2ib100: 69 seconds 00000400:00000200:5.0:1644346389.159291:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346389.159296:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.136@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346389.159299:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346389.159301:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.136@o2ib100: 1 00000400:00000200:6.0:1644346389.159311:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346389.159317:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x4860 00000400:00000200:6.0:1644346389.159321:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000035d3df28 00000400:00000200:6.0:1644346389.159323:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346389.159329:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.136@o2ib100:-110 00000400:00000200:6.0:1644346389.159332:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x6060 rc -110 00000400:00000200:6.0:1644346389.159334:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.136@o2ib100: -110 00000400:00000200:6.0:1644346389.159337:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.136@o2ib100 00000400:00000200:6.0:1644346389.159338:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000f159e7ee not committed for send or receive 00000400:00000200:6.0:1644346389.159340:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000007a905fcc 00000100:00000200:6.0:1644346389.159344:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000000583bf0 x1723495536590336/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346443 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346389.159356:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000e6ca7a77 not committed for send or receive 00000400:00000200:6.0:1644346389.159357:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006da18aa2 00000100:00000200:6.0:1644346389.159360:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000fedda9a1 x1723495536590400/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346443 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346389.159410:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b107933f 00000100:00000200:4.0:1644346389.159416:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000000583bf0 x1723495536590336/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346389.159426:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000000583bf0 x1723495536590336/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346389.159445:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000413f6aa6 
00000100:00000200:4.0:1644346389.159448:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000fedda9a1 x1723495536590400/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346389.159455:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000fedda9a1 x1723495536590400/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346389.159479:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590592, portal 10 00000100:00000200:4.0:1644346389.159483:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590592, offset 0 00000400:00000200:4.0:1644346389.159487:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:4.0:1644346389.159494:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: 0 00000400:00000200:4.0:1644346389.159496:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346389.159497:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346389.159499:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346389.159502:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000f958fa12 delayed. 
172.19.1.135@o2ib100 pending discovery 00000400:00000200:6.0:1644346389.159504:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346389.159507:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x6060 00000100:00000200:4.0:1644346389.159510:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590656, portal 10 00000400:00000200:6.0:1644346389.159514:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.135@o2ib100 00000100:00000200:4.0:1644346389.159516:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590656, offset 0 00000400:00000200:6.0:1644346389.159518:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.135@o2ib100 local destination 00000400:00000200:4.0:1644346389.159519:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:6.0:1644346389.159523:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346389.159528:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.135@o2ib100(172.19.1.135@o2ib100:172.19.1.135@o2ib100) : GET try# 0 00000400:00000200:4.0:1644346389.159532:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: -114 00000800:00000200:6.0:1644346389.159533:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.135@o2ib100 00000400:00000200:4.0:1644346389.159534:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346389.159535:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346389.159537:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. 
pending discovery 00000400:00000200:4.0:1644346389.159539:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000043d74501 delayed. 172.19.1.135@o2ib100 pending discovery 00000800:00000200:6.0:1644346389.159543:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346389.159546:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346389.159548:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346389.159550:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x4260 rc 0 00000800:00000400:5.0:1644346391.143292:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693663 seconds 00000400:00000200:5.0:1644346391.143297:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346391.143302:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346391.143306:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346391.143309:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346391.143312:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346391.143319:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 10 seconds 00000800:00000400:5.0:1644346391.143321:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 60 seconds 
00000400:00000200:5.0:1644346391.143326:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346391.143330:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346391.143331:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346391.143333:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346391.143336:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000400:00000200:5.0:1644346391.143338:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346391.143341:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346391.143343:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346391.143345:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.2@o2ib100: 1 00000800:00000400:5.0:1644346391.143353:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693663 seconds 00000400:00000200:5.0:1644346391.143355:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346391.143358:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346391.143360:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346391.143361:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 
00000400:00020000:5.0:1644346391.143363:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346391.143402:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346391.143408:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001da20d30) state 0x34860 00000400:00000200:6.0:1644346391.143412:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a4228d20 00000400:00000200:6.0:1644346391.143414:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346391.143419:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.2@o2ib100:-110 00000400:00000200:6.0:1644346391.143421:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001da20d30) state 0x36060 rc -110 00000400:00000200:6.0:1644346391.143424:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.2@o2ib100: -110 00000400:00000200:6.0:1644346391.143426:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.2@o2ib100 00000800:00000200:30.2:1644346391.513908:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (20)++ 00000800:00000200:55.0:1644346391.513972:0:170145:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (21)++ 00000800:00000200:55.0:1644346391.513985:0:170145:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 172.19.1.54@o2ib100 00000400:00000200:55.0:1644346391.513992:0:170145:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:55.0:1644346391.514000:0:170145:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a846200 00000400:00000200:55.0:1644346391.514007:0:170145:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 7616 00000400:00000200:55.0:1644346391.514011:0:170145:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:55.0:1644346391.514015:0:170145:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:55.0:1644346391.514018:0:170145:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:55.0:1644346391.514028:0:170145:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000002c12efa1] (22)++ 00000800:00000200:55.0:1644346391.514031:0:170145:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000002c12efa1] (23)-- 00000800:00000200:53.0:1644346391.514033:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (23)-- 00000800:00000200:55.0:1644346391.514037:0:170145:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000100:00000200:43.0:1644346391.514110:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207663616 
00010000:00000200:43.0:1644346391.514126:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000245484de x1718520207663616/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:242/0 lens 224/224 e 0 to 0 dl 1644346452 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346391.514138:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207663616, offset 224 00000400:00000200:43.0:1644346391.514144:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346391.514150:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346391.514154:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346391.514158:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346391.514160:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346391.514162:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346391.514166:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346391.514181:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346391.514185:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346391.514188:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 
00000800:00000200:43.0:1644346391.514191:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b495b0db] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346391.514193:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000003720f48b] (20)++ 00000800:00000200:43.0:1644346391.514194:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000003720f48b] (21)++ 00000800:00000200:43.0:1644346391.514211:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000003720f48b] (22)-- 00000800:00000200:56.2:1644346391.514265:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (21)++ 00000800:00000200:51.0:1644346391.514328:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (22)++ 00000800:00000200:51.0:1644346391.514339:0:170147:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000003720f48b] (23)-- 00000400:00000200:51.0:1644346391.514342:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:51.0:1644346391.514348:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000400:00000200:51.0:1644346391.514351:0:170147:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000000009a273 00000800:00000200:51.0:1644346391.514356:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346391.514359:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000400:00000200:27.0:1644346391.847291:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847299:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.54@o2ib100: 0 00000400:00000200:27.0:1644346391.847301:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:27.0:1644346391.847302:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847305:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.54@o2ib100 NID 172.19.1.54@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346391.847307:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.55@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847310:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.55@o2ib100: 0 00000400:00000200:27.0:1644346391.847311:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847312:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847314:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.55@o2ib100 NID 172.19.1.55@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346391.847317:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.1@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847319:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.1@o2ib100: 0 00000400:00000200:27.0:1644346391.847320:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847321:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847323:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.1@o2ib100 NID 172.19.2.1@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346391.847326:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.2@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847328:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.2@o2ib100: 0 00000400:00000200:27.0:1644346391.847329:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847334:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847336:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.2@o2ib100 NID 172.19.2.2@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346391.847339:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.4@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847341:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.4@o2ib100: 0 00000400:00000200:27.0:1644346391.847342:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847342:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847344:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.4@o2ib100 NID 172.19.2.4@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346391.847347:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.6@o2ib100, cpt = 1 00000400:00000200:6.0:1644346391.847347:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:27.0:1644346391.847350:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.6@o2ib100: 0 00000400:00000200:27.0:1644346391.847351:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847372:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847373:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.6@o2ib100 NID 172.19.2.6@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346391.847374:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.7@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847377:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.7@o2ib100: 0 00000400:00000200:6.0:1644346391.847377:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x36056 00000400:00000200:27.0:1644346391.847378:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847378:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847379:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.7@o2ib100 NID 172.19.2.7@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346391.847380:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.9@o2ib100, cpt = 1 00000400:00000200:27.0:1644346391.847381:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.9@o2ib100: 0 00000400:00000200:27.0:1644346391.847381:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847382:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847383:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.9@o2ib100 NID 172.19.2.9@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346391.847385:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.12@o2ib100, cpt = 1 00000400:00000200:6.0:1644346391.847385:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.54@o2ib100 00000400:00000200:27.0:1644346391.847386:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.12@o2ib100: 0 00000400:00000200:27.0:1644346391.847387:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346391.847387:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346391.847388:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.12@o2ib100 NID 172.19.2.12@o2ib100: 0. 
pending discovery 00000400:00000200:6.0:1644346391.847390:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.54@o2ib100 local destination 00000400:00000200:6.0:1644346391.847392:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346391.847395:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.54@o2ib100(172.19.1.54@o2ib100:172.19.1.54@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847396:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.54@o2ib100 00000400:00000200:27.0:1644346391.847398:0:170149:0:(lib-md.c:65:lnet_md_unlink()) Queueing unlink of md 00000000a86d8c8c 00000800:00000200:6.0:1644346391.847399:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000400:00000200:27.0:1644346391.847400:0:170149:0:(lib-move.c:3055:lnet_finalize_expired_responses()) Response timeout: md = 00000000a86d8c8c: nid = 172.19.2.8@o2ib100 00000800:00000200:6.0:1644346391.847400:0:170148:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:6.0:1644346391.847401:0:170148:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 00000400:00000200:27.0:1644346391.847404:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c0d74cb9 00000400:00000200:27.0:1644346391.847405:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000800:00000200:6.0:1644346391.847405:0:170148:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000400:00000200:6.0:1644346391.847406:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.54@o2ib100 00000400:00000200:27.0:1644346391.847407:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 
00000400:00000200:6.0:1644346391.847407:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34256 rc 0 00000400:00000200:6.0:1644346391.847410:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x36056 00000400:00000200:27.0:1644346391.847413:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:6.0:1644346391.847414:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.55@o2ib100 00000400:00000200:27.0:1644346391.847415:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:6.0:1644346391.847416:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.55@o2ib100 local destination 00000400:00000200:27.0:1644346391.847420:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:6.0:1644346391.847420:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.55@o2ib100 00000400:00000200:27.0:1644346391.847427:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000400:00000200:6.0:1644346391.847427:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.55@o2ib100(172.19.1.55@o2ib100:172.19.1.55@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346391.847430:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:6.0:1644346391.847430:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:6.0:1644346391.847432:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b495b0db] -> 172.19.1.55@o2ib100 (3) version: 12 
00000800:00000200:6.0:1644346391.847433:0:170148:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000003720f48b] (20)++ 00000800:00000200:27.0:1644346391.847434:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847434:0:170148:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000003720f48b] (21)++ 00000800:00000200:27.0:1644346391.847435:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847436:0:170148:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000003720f48b] (22)-- 00000400:00000200:27.0:1644346391.847437:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000920aee3e 00000400:00000200:6.0:1644346391.847437:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.55@o2ib100 00000400:00000200:27.0:1644346391.847438:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:6.0:1644346391.847438:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34256 rc 0 00000400:00000200:27.0:1644346391.847439:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:6.0:1644346391.847440:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x36060 00000400:00000200:27.0:1644346391.847444:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:6.0:1644346391.847445:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000800:00000200:56.2:1644346391.847446:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (21)++ 00000400:00000200:27.0:1644346391.847446:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 
00000400:00000200:6.0:1644346391.847447:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:27.0:1644346391.847448:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:6.0:1644346391.847450:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:27.0:1644346391.847452:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:30.2:1644346391.847454:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000800:00000200:27.0:1644346391.847454:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000400:00000200:6.0:1644346391.847455:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346391.847457:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346391.847458:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847458:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000400:00000200:27.0:1644346391.847460:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009c215f09 00000400:00000200:27.0:1644346391.847460:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346391.847461:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 
00000800:00000200:53.0:1644346391.847463:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (22)++ 00000800:00000200:6.0:1644346391.847464:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346391.847465:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000800:00000200:6.0:1644346391.847465:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847466:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.1@o2ib100 00000400:00000200:27.0:1644346391.847467:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:6.0:1644346391.847467:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x34260 rc 0 00000400:00000200:27.0:1644346391.847469:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:27.0:1644346391.847471:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346391.847472:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000400:00000200:6.0:1644346391.847473:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001da20d30) state 0x36060 00000800:00000200:27.0:1644346391.847475:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346391.847476:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 
00000400:00000200:6.0:1644346391.847477:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000800:00000200:53.0:1644346391.847478:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000003720f48b] (23)-- 00000400:00000200:6.0:1644346391.847478:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:53.0:1644346391.847479:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:6.0:1644346391.847480:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:53.0:1644346391.847483:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: GET: OK 00000400:00000200:53.0:1644346391.847484:0:170146:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:6.0:1644346391.847484:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000400:00000200:53.0:1644346391.847486:0:170146:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.55@o2ib100: 0 00000800:00000200:6.0:1644346391.847486:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:53.0:1644346391.847490:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:6.0:1644346391.847490:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847491:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:53.0:1644346391.847492:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 
00000400:00000200:6.0:1644346391.847492:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.2@o2ib100 00000400:00000200:6.0:1644346391.847492:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.2@o2ib100(000000001da20d30) state 0x34260 rc 0 00000400:00000200:6.0:1644346391.847494:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x36060 00000800:00000200:53.0:1644346391.847495:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000400:00000200:53.0:1644346391.847496:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:6.0:1644346391.847497:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:53.0:1644346391.847498:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: GET: OK 00000400:00000200:6.0:1644346391.847498:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:53.0:1644346391.847499:0:170146:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:53.0:1644346391.847501:0:170146:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.54@o2ib100: 0 00000400:00000200:6.0:1644346391.847501:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:6.0:1644346391.847503:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847505:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:53.0:1644346391.847506:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 
00000800:00000200:53.0:1644346391.847507:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000800:00000200:53.0:1644346391.847508:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000800:00000200:6.0:1644346391.847508:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847509:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847510:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.4@o2ib100 00000400:00000200:6.0:1644346391.847510:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x34260 rc 0 00000400:00000200:6.0:1644346391.847511:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x36060 00000400:00000200:6.0:1644346391.847514:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:6.0:1644346391.847515:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:6.0:1644346391.847515:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:6.0:1644346391.847517:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847518:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:6.0:1644346391.847521:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847522:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni 
[000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847523:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.6@o2ib100 00000400:00000200:6.0:1644346391.847523:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x34260 rc 0 00000400:00000200:6.0:1644346391.847524:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 0x36060 00000400:00000200:6.0:1644346391.847527:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:6.0:1644346391.847528:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:6.0:1644346391.847528:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:6.0:1644346391.847531:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847532:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:6.0:1644346391.847533:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847533:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847534:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.7@o2ib100 00000400:00000200:6.0:1644346391.847534:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 0x34260 rc 0 00000400:00000200:6.0:1644346391.847535:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x36060 
00000400:00000200:6.0:1644346391.847538:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:6.0:1644346391.847539:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:6.0:1644346391.847540:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:6.0:1644346391.847541:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847542:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:6.0:1644346391.847543:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847544:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847545:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.9@o2ib100 00000400:00000200:6.0:1644346391.847545:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x34260 rc 0 00000400:00000200:6.0:1644346391.847546:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x36060 00000400:00000200:6.0:1644346391.847549:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000800:00000200:30.2:1644346391.847550:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (20)++ 00000400:00000200:6.0:1644346391.847550:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:6.0:1644346391.847551:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 
00000400:00000200:6.0:1644346391.847553:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346391.847555:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:50.0F:1644346391.847556:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (21)++ 00000800:00000200:6.0:1644346391.847556:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346391.847557:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346391.847558:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.12@o2ib100 00000400:00000200:6.0:1644346391.847558:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x34260 rc 0 00000800:00000200:50.0:1644346391.847578:0:170144:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 172.19.1.54@o2ib100 00000800:00000200:51.0:1644346391.847580:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000400:00000200:50.0:1644346391.847581:0:170144:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 172.19.1.54@o2ib100 : REPLY - for me 00000400:00000200:50.0:1644346391.847586:0:170144:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.137@o2ib100: Reply from 12345-172.19.1.54@o2ib100 of length 64/64 into md 0x585761 00000400:00000200:50.0:1644346391.847588:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:50.0:1644346391.847589:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: REPLY: OK 
00000400:00000200:50.0:1644346391.847590:0:170144:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a4228d20 00000400:00000200:50.0:1644346391.847592:0:170144:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:50.0:1644346391.847593:0:170144:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.54@o2ib100 has discovery disabled 00000400:00000200:50.0:1644346391.847594:0:170144:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100(00000000ff93ee65) not MR: DD disabled remotely 00000400:00000200:50.0:1644346391.847595:0:170144:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100 data present 0. state = 0x34256 00000400:00000200:50.0:1644346391.847596:0:170144:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.54@o2ib100:3 00000800:00000200:50.0:1644346391.847601:0:170144:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000002c12efa1] (21)++ 00000800:00000200:50.0:1644346391.847603:0:170144:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000002c12efa1] (22)-- 00000800:00000200:50.0:1644346391.847603:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000400:00000200:6.0:1644346391.847605:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346391.847607:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x340d6 00000400:00000200:6.0:1644346391.847611:0:170148:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.54@o2ib100 (00000000ff93ee65): 0 00000400:00000200:6.0:1644346391.847612:0:170148:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.54@o2ib100(00000000ff93ee65): 0. 
state = 0x34156 00000400:00000200:6.0:1644346391.847612:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34156 rc 1 00000800:00000200:56.2:1644346391.847613:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (20)++ 00000400:00000200:6.0:1644346391.847613:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34156 00000400:00000200:6.0:1644346391.847614:0:170148:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346391.847615:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x30116 rc 0 00000400:00000200:6.0:1644346391.847615:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.54@o2ib100 00000800:00000200:55.0:1644346391.847619:0:170145:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++ 00000800:00000200:55.0:1644346391.847625:0:170145:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.55@o2ib100 00000800:00000200:53.0:1644346391.847626:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000400:00000200:55.0:1644346391.847628:0:170145:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 172.19.1.55@o2ib100 : REPLY - for me 00000400:00000200:55.0:1644346391.847633:0:170145:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.137@o2ib100: Reply from 12345-172.19.1.55@o2ib100 of length 64/64 into md 0x585769 00000400:00000200:55.0:1644346391.847635:0:170145:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:55.0:1644346391.847636:0:170145:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: REPLY: OK 00000400:00000200:55.0:1644346391.847637:0:170145:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c5de0d4d 
00000400:00000200:55.0:1644346391.847639:0:170145:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:55.0:1644346391.847640:0:170145:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.55@o2ib100 has discovery disabled 00000400:00000200:55.0:1644346391.847641:0:170145:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100(000000006a0e6dce) not MR: DD disabled remotely 00000400:00000200:55.0:1644346391.847642:0:170145:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100 data present 0. state = 0x34256 00000400:00000200:55.0:1644346391.847643:0:170145:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.55@o2ib100:3 00000800:00000200:55.0:1644346391.847648:0:170145:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (21)++ 00000800:00000200:55.0:1644346391.847650:0:170145:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] (22)-- 00000800:00000200:55.0:1644346391.847650:0:170145:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000400:00000200:6.0:1644346391.847651:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346391.847653:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x340d6 00000400:00000200:6.0:1644346391.847655:0:170148:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.55@o2ib100 (000000006a0e6dce): 0 00000400:00000200:6.0:1644346391.847655:0:170148:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.55@o2ib100(000000006a0e6dce): 0. 
state = 0x34156 00000400:00000200:6.0:1644346391.847656:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34156 rc 1 00000400:00000200:6.0:1644346391.847657:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34156 00000400:00000200:6.0:1644346391.847657:0:170148:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.55@o2ib100 00000400:00000200:6.0:1644346391.847658:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x30116 rc 0 00000400:00000200:6.0:1644346391.847658:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.55@o2ib100 00000800:00000400:5.0:1644346392.167285:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 44 seconds 00000400:00000200:5.0:1644346392.167290:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167295:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167298:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167301:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167305:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346392.167310:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693664 seconds 00000800:00000400:5.0:1644346392.167312:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693664 seconds 00000400:00000200:5.0:1644346392.167314:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 
-110, hstatus = 10 00000400:00000200:5.0:1644346392.167317:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167319:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346392.167322:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.6@o2ib100: 1 00000400:00000200:5.0:1644346392.167328:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167331:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167332:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167337:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167340:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346392.167343:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693664 seconds 00000800:00000400:5.0:1644346392.167345:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693664 seconds 00000400:00000200:5.0:1644346392.167346:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167348:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167350:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346392.167351:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.7@o2ib100: 1 
00000400:00000200:5.0:1644346392.167354:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167356:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167358:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167360:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167361:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346392.167364:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 1 seconds 00000400:00000200:5.0:1644346392.167366:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167368:0:170138:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a86d8c8c 00000400:00000200:5.0:1644346392.167370:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167372:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167374:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346392.167379:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000800:00000400:5.0:1644346392.167379:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693664 seconds 00000800:00000400:5.0:1644346392.167381:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693664 seconds 
00000400:00000200:5.0:1644346392.167382:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:6.0:1644346392.167384:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x34860 00000400:00000200:5.0:1644346392.167385:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167386:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:6.0:1644346392.167388:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ded1e15a 00000400:00000200:5.0:1644346392.167388:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.9@o2ib100: 1 00000400:00000200:6.0:1644346392.167391:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:5.0:1644346392.167392:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:6.0:1644346392.167395:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.6@o2ib100:-110 00000400:00000200:5.0:1644346392.167395:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167397:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:6.0:1644346392.167398:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x36060 rc -110 00000400:00000200:5.0:1644346392.167399:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00000200:6.0:1644346392.167401:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.6@o2ib100: -110 00000400:00000200:6.0:1644346392.167403:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery 
complete. Dequeue peer 172.19.2.6@o2ib100 00000400:00000200:6.0:1644346392.167409:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 0x34860 00000400:00020000:5.0:1644346392.167409:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346392.167411:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ad500f49 00000400:00000200:6.0:1644346392.167412:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000800:00000400:5.0:1644346392.167412:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 107 seconds 00000400:00000200:6.0:1644346392.167414:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.7@o2ib100:-110 00000400:00000200:6.0:1644346392.167416:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 0x36060 rc -110 00000400:00000200:5.0:1644346392.167417:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:6.0:1644346392.167418:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.7@o2ib100: -110 00000400:00000200:6.0:1644346392.167420:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.7@o2ib100 00000400:00000200:5.0:1644346392.167421:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:6.0:1644346392.167423:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x34860 00000400:00000200:5.0:1644346392.167423:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:6.0:1644346392.167424:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000120a6f15 00000400:00000200:6.0:1644346392.167425:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:5.0:1644346392.167425:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167427:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346392.167428:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.9@o2ib100:-110 00000400:00000200:6.0:1644346392.167429:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x36060 rc -110 00000800:00000400:5.0:1644346392.167431:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 272 seconds 00000400:00000200:6.0:1644346392.167445:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.9@o2ib100: -110 00000400:00000200:6.0:1644346392.167445:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.9@o2ib100 00000400:00000200:5.0:1644346392.167445:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167447:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167448:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167448:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167449:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346392.167450:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693664 seconds 00000800:00000400:5.0:1644346392.167451:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693664 seconds 00000400:00000200:5.0:1644346392.167451:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167452:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346392.167453:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346392.167453:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.12@o2ib100: 1 00000400:00000200:5.0:1644346392.167455:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346392.167456:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:6.0:1644346392.167457:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:5.0:1644346392.167457:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346392.167458:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346392.167458:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346392.167459:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x34860 00000400:00000200:6.0:1644346392.167460:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000640f740a 00000400:00000200:6.0:1644346392.167460:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346392.167461:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.12@o2ib100:-110 00000400:00000200:6.0:1644346392.167461:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x36060 rc -110 00000400:00000200:6.0:1644346392.167462:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.12@o2ib100: -110 00000400:00000200:6.0:1644346392.167463:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.12@o2ib100 00000400:00000200:27.0:1644346392.871306:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000044e51676 00000400:00000200:27.0:1644346392.871310:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871314:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871323:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:27.0:1644346392.871327:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:27.0:1644346392.871334:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:27.0:1644346392.871340:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871345:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:27.0:1644346392.871350:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871353:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871355:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000057bab685 00000400:00000200:27.0:1644346392.871357:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871359:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871363:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 
00000400:00000200:27.0:1644346392.871365:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:27.0:1644346392.871368:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:27.0:1644346392.871374:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871377:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:27.0:1644346392.871381:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871383:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871385:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c6557ad2 00000400:00000200:27.0:1644346392.871386:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871388:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871392:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346392.871394:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:27.0:1644346392.871396:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:27.0:1644346392.871401:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 
00000800:00000200:27.0:1644346392.871404:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:27.0:1644346392.871406:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871408:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871411:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000bcab80a7 00000400:00000200:27.0:1644346392.871412:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871414:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871418:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:27.0:1644346392.871420:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:27.0:1644346392.871423:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:27.0:1644346392.871428:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871431:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:27.0:1644346392.871434:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871436:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 
00000400:00000200:27.0:1644346392.871438:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004045eb92 00000400:00000200:27.0:1644346392.871439:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871441:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871445:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:27.0:1644346392.871447:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:27.0:1644346392.871449:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:27.0:1644346392.871454:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871457:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:27.0:1644346392.871459:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871461:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871463:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000042525ce2 00000400:00000200:27.0:1644346392.871464:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871466:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871470:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 
00000400:00000200:27.0:1644346392.871472:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:27.0:1644346392.871474:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:27.0:1644346392.871479:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871482:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:27.0:1644346392.871484:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871486:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871488:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000015445d5e 00000400:00000200:27.0:1644346392.871489:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346392.871490:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346392.871494:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:27.0:1644346392.871496:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:27.0:1644346392.871499:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:27.0:1644346392.871504:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 
00000800:00000200:27.0:1644346392.871507:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:27.0:1644346392.871509:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871511:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346392.871516:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:27.0:1644346392.871518:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:27.0:1644346392.871520:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:27.0:1644346392.871524:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346392.871527:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:27.0:1644346392.871530:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346392.871531:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:30.2:1644346396.569887:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (20)++ 00000800:00000200:51.0:1644346396.569950:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (21)++ 00000800:00000200:51.0:1644346396.569962:0:170147:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.54@o2ib100 00000400:00000200:51.0:1644346396.569968:0:170147:0:(lib-move.c:4287:lnet_parse()) 
TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:51.0:1644346396.569976:0:170147:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a8467c0 00000400:00000200:51.0:1644346396.569982:0:170147:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 7840 00000400:00000200:51.0:1644346396.569987:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:51.0:1644346396.569991:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:51.0:1644346396.569994:0:170147:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:51.0:1644346396.570003:0:170147:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000002c12efa1] (22)++ 00000800:00000200:51.0:1644346396.570006:0:170147:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000002c12efa1] (23)-- 00000800:00000200:51.0:1644346396.570008:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:51.0:1644346396.570010:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000100:00000200:43.0:1644346396.570084:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207665088 00010000:00000200:43.0:1644346396.570100:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000002623f4ce x1718520207665088/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:247/0 lens 224/224 e 0 to 0 dl 1644346457 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346396.570113:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207665088, offset 224 
00000400:00000200:43.0:1644346396.570118:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346396.570126:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346396.570130:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346396.570133:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346396.570136:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346396.570138:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346396.570141:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346396.570143:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346396.570152:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346396.570157:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:43.0:1644346396.570163:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346396.570166:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:43.0:1644346396.570168:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 
00000800:00000200:43.0:1644346396.570175:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000800:00000200:30.2:1644346396.570240:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000800:00000200:53.0:1644346396.570302:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 00000800:00000200:53.0:1644346396.570312:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000400:00000200:53.0:1644346396.570315:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:53.0:1644346396.570321:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:53.0:1644346396.570324:0:170146:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000033d7d05b 00000800:00000200:53.0:1644346396.570332:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:53.0:1644346396.570335:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000400:00000200:27.0:1644346401.063303:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.5@o2ib100, cpt = 1 00000400:00000200:27.0:1644346401.063308:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.5@o2ib100: 0 00000400:00000200:27.0:1644346401.063309:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346401.063309:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346401.063310:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.5@o2ib100 NID 172.19.2.5@o2ib100: 0. 
pending discovery 00000400:00000200:6.0:1644346401.063369:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346401.063376:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(0000000021a8d375) state 0x36060 00000400:00000200:6.0:1644346401.063388:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:6.0:1644346401.063395:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:6.0:1644346401.063400:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:6.0:1644346401.063406:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346401.063410:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:6.0:1644346401.063416:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346401.063419:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346401.063421:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.5@o2ib100 00000400:00000200:6.0:1644346401.063426:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.5@o2ib100(0000000021a8d375) state 0x34260 rc 0 00000800:00000400:5.0:1644346401.127284:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.134@o2ib100: 693673 seconds 00000400:00000200:5.0:1644346401.127289:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346401.127295:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 
172.19.1.137@o2ib100->172.19.1.134@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346401.127298:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346401.127300:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.134@o2ib100: 1 00000800:00000400:5.0:1644346401.127309:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.135@o2ib100: 81 seconds 00000400:00000200:5.0:1644346401.127311:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346401.127316:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.135@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346401.127320:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346401.127322:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.135@o2ib100: 1 00000400:00000200:6.0:1644346401.127361:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346401.127367:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x4860 00000400:00000200:6.0:1644346401.127371:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dccbd790 00000400:00000200:6.0:1644346401.127374:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346401.127378:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.134@o2ib100:-110 00000400:00000200:6.0:1644346401.127380:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x6060 rc -110 00000400:00000200:6.0:1644346401.127383:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.134@o2ib100: -110 00000400:00000200:6.0:1644346401.127385:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery 
complete. Dequeue peer 172.19.1.134@o2ib100 00000400:00000200:6.0:1644346401.127387:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000003999f1a9 not committed for send or receive 00000400:00000200:6.0:1644346401.127389:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000081bf2296 00000100:00000200:6.0:1644346401.127394:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000f6d877ca x1723495536590272/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 0 dl 1644346443 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346401.127410:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000b1e07f77 not committed for send or receive 00000400:00000200:6.0:1644346401.127412:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000006eeb9061 00000100:00000200:6.0:1644346401.127415:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000171c429f x1723495536590464/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346443 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346401.127422:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000000710ee5f not committed for send or receive 00000400:00000200:6.0:1644346401.127424:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009b6d32f6 00000100:00000200:6.0:1644346401.127426:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000d9e26c79 x1723495536590528/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346443 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346401.127433:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x4860 00000400:00000200:6.0:1644346401.127435:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000035d3df28 
00000400:00000200:6.0:1644346401.127436:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346401.127438:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.135@o2ib100:-110 00000400:00000200:6.0:1644346401.127440:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x6060 rc -110 00000400:00000200:6.0:1644346401.127442:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.135@o2ib100: -110 00000400:00000200:6.0:1644346401.127444:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346401.127445:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000f958fa12 not committed for send or receive 00000400:00000200:6.0:1644346401.127446:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000b107933f 00000100:00000200:6.0:1644346401.127449:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000001e777b3 x1723495536590592/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346444 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346401.127464:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000043d74501 not committed for send or receive 00000400:00000200:4.0:1644346401.127465:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000abbb2c53 00000400:00000200:6.0:1644346401.127468:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000cbe79e01 00000100:00000200:6.0:1644346401.127471:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000068d4be4f x1723495536590656/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346444 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127472:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 
req@00000000f6d877ca x1723495536590272/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127483:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000f6d877ca x1723495536590272/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346401.127499:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000081fc26e 00000100:00000200:4.0:1644346401.127502:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000171c429f x1723495536590464/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127509:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000171c429f x1723495536590464/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346401.127519:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ddf50ca8 00000100:00000200:4.0:1644346401.127522:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000d9e26c79 x1723495536590528/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127527:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000d9e26c79 x1723495536590528/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346443 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346401.127548:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001ffba440 
00000100:00000200:4.0:1644346401.127550:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000068d4be4f x1723495536590656/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346444 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127556:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000068d4be4f x1723495536590656/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346444 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346401.127566:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000413f6aa6 00000100:00000200:4.0:1644346401.127568:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000001e777b3 x1723495536590592/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346444 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127574:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000001e777b3 x1723495536590592/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.135@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346444 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346401.127597:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590720, portal 25 00000100:00000200:4.0:1644346401.127600:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 26, xid 1723495536590720, offset 0 00000400:00000200:4.0:1644346401.127604:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:4.0:1644346401.127612:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: 0 00000400:00000200:4.0:1644346401.127614:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:4.0:1644346401.127615:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346401.127617:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346401.127620:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000c78a9a1c delayed. 172.19.1.133@o2ib100 pending discovery 00000400:00000200:6.0:1644346401.127623:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346401.127625:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x6060 00000100:00000200:4.0:1644346401.127625:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590784, portal 10 00000100:00000200:4.0:1644346401.127628:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590784, offset 0 00000400:00000200:4.0:1644346401.127631:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:6.0:1644346401.127634:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.133@o2ib100 00000400:00000200:4.0:1644346401.127634:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: -114 00000400:00000200:4.0:1644346401.127635:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346401.127636:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346401.127638:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346401.127642:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000002a0bea91 delayed. 
172.19.1.133@o2ib100 pending discovery 00000400:00000200:6.0:1644346401.127646:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.133@o2ib100 local destination 00000400:00000200:6.0:1644346401.127650:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.133@o2ib100 00000100:00000200:4.0:1644346401.127650:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590848, portal 10 00000100:00000200:4.0:1644346401.127652:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590848, offset 0 00000400:00000200:4.0:1644346401.127656:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:6.0:1644346401.127657:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.133@o2ib100(172.19.1.133@o2ib100:172.19.1.133@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346401.127661:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.133@o2ib100 00000400:00000200:4.0:1644346401.127661:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: -114 00000400:00000200:4.0:1644346401.127662:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346401.127663:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346401.127665:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346401.127667:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000009ed887c1 delayed. 
172.19.1.133@o2ib100 pending discovery
00000800:00000200:6.0:1644346401.127671:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000002172b4cf] -> 172.19.1.133@o2ib100 (2) version: 0
00000100:00000200:4.0:1644346401.127671:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590912, portal 10
00000100:00000200:4.0:1644346401.127673:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590912, offset 0
00000800:00000200:6.0:1644346401.127674:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000002172b4cf] -> 172.19.1.133@o2ib100 (2) version: 0
00000400:00000200:6.0:1644346401.127676:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.133@o2ib100
00000400:00000200:4.0:1644346401.127676:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100
00000400:00000200:6.0:1644346401.127678:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x4260 rc 0
00000400:00000200:4.0:1644346401.127679:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: 0
00000400:00000200:4.0:1644346401.127680:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:4.0:1644346401.127681:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery
00000400:00000200:4.0:1644346401.127683:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery
00000400:00000200:4.0:1644346401.127685:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000cee34335 delayed. 172.19.1.136@o2ib100 pending discovery
00000400:00000200:6.0:1644346401.127688:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x6060
00000100:00000200:4.0:1644346401.127690:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536590976, portal 10
00000400:00000200:6.0:1644346401.127692:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.136@o2ib100
00000100:00000200:4.0:1644346401.127694:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536590976, offset 0
00000400:00000200:6.0:1644346401.127696:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.136@o2ib100 local destination
00000400:00000200:4.0:1644346401.127697:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.136@o2ib100
00000400:00000200:6.0:1644346401.127698:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.136@o2ib100
00000400:00000200:6.0:1644346401.127703:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.136@o2ib100(172.19.1.136@o2ib100:172.19.1.136@o2ib100) : GET try# 0
00000800:00000200:6.0:1644346401.127706:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.136@o2ib100
00000400:00000200:4.0:1644346401.127707:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.136@o2ib100: -114
00000400:00000200:4.0:1644346401.127708:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:4.0:1644346401.127709:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery
00000400:00000200:4.0:1644346401.127711:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.136@o2ib100 NID 172.19.1.136@o2ib100: 0. pending discovery
00000400:00000200:4.0:1644346401.127712:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000008a80a6d delayed. 172.19.1.136@o2ib100 pending discovery
00000800:00000200:6.0:1644346401.127715:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000e6cd4648] -> 172.19.1.136@o2ib100 (2) version: 0
00000800:00000200:6.0:1644346401.127717:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000e6cd4648] -> 172.19.1.136@o2ib100 (2) version: 0
00000400:00000200:6.0:1644346401.127719:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.136@o2ib100
00000400:00000200:6.0:1644346401.127721:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x4260 rc 0
00000800:00000200:56.2:1644346401.753723:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (20)++
00000800:00000200:50.0:1644346401.753788:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++
00000800:00000200:50.0:1644346401.753801:0:170144:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.55@o2ib100
00000400:00000200:50.0:1644346401.753807:0:170144:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me
00000400:00000200:50.0:1644346401.753815:0:170144:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a846d80
00000400:00000200:50.0:1644346401.753821:0:170144:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 8064
00000400:00000200:50.0:1644346401.753826:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0
00000400:00000200:50.0:1644346401.753829:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK
00000100:00000200:50.0:1644346401.753832:0:170144:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost
00000800:00000200:50.0:1644346401.753842:0:170144:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (22)++
00000800:00000200:50.0:1644346401.753845:0:170144:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] (23)--
00000800:00000200:50.0:1644346401.753846:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)--
00000800:00000200:51.0:1644346401.753849:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)--
00000100:00000200:43.0:1644346401.753923:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207666560
00010000:00000200:43.0:1644346401.753938:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000008b43dd43 x1718520207666560/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:252/0 lens 224/224 e 0 to 0 dl 1644346462 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0'
00000100:00000200:43.0:1644346401.753951:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207666560, offset 224
00000400:00000200:43.0:1644346401.753956:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18
00000400:00000200:43.0:1644346401.753963:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination
00000400:00000200:43.0:1644346401.753970:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction
00000400:00000200:43.0:1644346401.753973:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1
00000400:00000200:43.0:1644346401.753976:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18
00000400:00000200:43.0:1644346401.753977:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100
00000400:00000200:43.0:1644346401.753981:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100
00000400:00000200:43.0:1644346401.753983:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100
00000400:00000200:43.0:1644346401.753992:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0
00000800:00000200:43.0:1644346401.753996:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100
00000800:00000200:43.0:1644346401.754001:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b495b0db] -> 172.19.1.55@o2ib100 (3) version: 12
00000800:00000200:43.0:1644346401.754004:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000003720f48b] (20)++
00000800:00000200:43.0:1644346401.754006:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000003720f48b] (21)++
00000800:00000200:43.0:1644346401.754012:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000003720f48b] (22)--
00000800:00000200:56.2:1644346401.754072:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (21)++
00000800:00000200:53.0:1644346401.754134:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (22)++
00000800:00000200:53.0:1644346401.754153:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000003720f48b] (23)--
00000400:00000200:53.0:1644346401.754155:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0
00000400:00000200:53.0:1644346401.754157:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK
00000400:00000200:53.0:1644346401.754159:0:170146:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000038a2c3cb
00000800:00000200:53.0:1644346401.754162:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)--
00000800:00000200:53.0:1644346401.754164:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)--
00000800:00000400:5.0:1644346404.135302:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693676 seconds
00000800:00000400:5.0:1644346404.135307:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693676 seconds
00000400:00000200:5.0:1644346404.135310:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135315:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135318:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346404.135321:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346404.135325:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110
00000400:00000200:5.0:1644346404.135328:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135331:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135334:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5
00000400:00000200:5.0:1644346404.135336:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.1@o2ib100: 1
00000800:00000400:5.0:1644346404.135342:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 18 seconds
00000400:00000200:5.0:1644346404.135345:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135348:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135350:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346404.135352:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346404.135354:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110
00000800:00000400:5.0:1644346404.135358:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 8 seconds
00000800:00000400:5.0:1644346404.135360:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 1 seconds
00000400:00000200:5.0:1644346404.135361:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135363:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346404.135365:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346404.135367:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110
00000400:00000200:5.0:1644346404.135369:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135370:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5
00000400:00000200:5.0:1644346404.135372:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.3@o2ib100: 1
00000800:00000400:5.0:1644346404.135375:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693676 seconds
00000800:00000400:5.0:1644346404.135377:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693676 seconds
00000400:00000200:5.0:1644346404.135379:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346404.135385:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135387:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5
00000400:00000200:5.0:1644346404.135389:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.4@o2ib100: 1
00000400:00000200:5.0:1644346404.135391:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346404.135393:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0
00000400:00000200:5.0:1644346404.135393:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135395:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346404.135396:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110
00000400:00000200:6.0:1644346404.135399:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x34860
00000400:00020000:5.0:1644346404.135399:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery failed with -110
00000800:00000400:5.0:1644346404.135402:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 13 seconds
00000400:00000200:6.0:1644346404.135403:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000140caae2
00000400:00000200:5.0:1644346404.135403:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346404.135405:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6
00000400:00000200:5.0:1644346404.135406:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135407:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346404.135409:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110
00000400:00000200:6.0:1644346404.135413:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.1@o2ib100:-110
00000400:00020000:5.0:1644346404.135413:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346404.135415:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x36060 rc -110
00000800:00000400:5.0:1644346404.135416:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693676 seconds
00000400:00000200:5.0:1644346404.135417:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346404.135418:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.1@o2ib100: -110
00000400:00000200:6.0:1644346404.135421:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.1@o2ib100
00000400:00000200:5.0:1644346404.135421:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135423:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:6.0:1644346404.135424:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x34860
00000400:00000200:6.0:1644346404.135425:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000008ad58149
00000400:00000200:5.0:1644346404.135425:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110
00000400:00000200:6.0:1644346404.135427:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6
00000400:00020000:5.0:1644346404.135427:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346404.135429:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.3@o2ib100:-110
00000400:00000200:6.0:1644346404.135431:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x36060 rc -110
00000800:00000400:5.0:1644346404.135431:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693676 seconds
00000400:00000200:5.0:1644346404.135432:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346404.135433:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.3@o2ib100: -110
00000400:00000200:6.0:1644346404.135436:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.3@o2ib100
00000400:00000200:5.0:1644346404.135436:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346404.135438:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:6.0:1644346404.135439:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x34860
00000400:00000200:6.0:1644346404.135440:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009263fd81
00000400:00000200:5.0:1644346404.135440:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110
00000400:00000200:6.0:1644346404.135441:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6
00000400:00020000:5.0:1644346404.135442:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346404.135443:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.4@o2ib100:-110
00000400:00000200:6.0:1644346404.135445:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x36060 rc -110
00000400:00000200:6.0:1644346404.135447:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.4@o2ib100: -110
00000400:00000200:6.0:1644346404.135448:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.4@o2ib100
00000400:00000200:27.0:1644346405.159289:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.3@o2ib100, cpt = 1
00000400:00000200:27.0:1644346405.159294:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.3@o2ib100: 0
00000400:00000200:27.0:1644346405.159296:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:27.0:1644346405.159298:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery
00000400:00000200:6.0:1644346405.159300:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0
00000800:00000400:5.0:1644346405.159300:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 57 seconds
00000400:00000200:27.0:1644346405.159301:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.3@o2ib100 NID 172.19.2.3@o2ib100: 0. pending discovery
00000400:00000200:5.0:1644346405.159303:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346405.159307:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x36060
00000400:00000200:5.0:1644346405.159307:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:27.0:1644346405.159310:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004045eb92
00000400:00000200:27.0:1644346405.159313:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:5.0:1644346405.159313:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:6.0:1644346405.159314:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100
00000400:00000200:27.0:1644346405.159316:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked
00000400:00000200:6.0:1644346405.159318:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination
00000400:00000200:5.0:1644346405.159318:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110
00000400:00000200:6.0:1644346405.159321:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100
00000400:00020000:5.0:1644346405.159321:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346405.159323:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0
00000800:00000200:6.0:1644346405.159325:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100
00000800:00000400:5.0:1644346405.159328:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693677 seconds
00000400:00000200:27.0:1644346405.159329:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100
00000400:00000200:5.0:1644346405.159329:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000800:00000200:6.0:1644346405.159331:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 172.19.2.3@o2ib100 (2) version: 0
00000400:00000200:5.0:1644346405.159331:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT
00000800:00000200:6.0:1644346405.159332:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 172.19.2.3@o2ib100 (2) version: 0
00000400:00000200:5.0:1644346405.159332:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:27.0:1644346405.159334:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination
00000400:00000200:6.0:1644346405.159334:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.3@o2ib100
00000400:00000200:5.0:1644346405.159334:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346405.159335:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346405.159336:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x34260 rc 0
00000800:00000400:5.0:1644346405.159338:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 120 seconds
00000400:00000200:5.0:1644346405.159339:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:27.0:1644346405.159341:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100
00000400:00000200:5.0:1644346405.159342:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:27.0:1644346405.159347:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0
00000400:00000200:5.0:1644346405.159350:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346405.159351:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110
00000800:00000200:27.0:1644346405.159352:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100
00000400:00020000:5.0:1644346405.159352:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110
00000800:00000400:5.0:1644346405.159354:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 4 seconds
00000800:00000400:5.0:1644346405.159355:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 285 seconds
00000400:00000200:5.0:1644346405.159356:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346405.159357:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346405.159358:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:5.0:1644346405.159359:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346405.159360:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110
00000400:00000200:5.0:1644346405.159361:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:5.0:1644346405.159362:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5
00000400:00000200:5.0:1644346405.159365:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.11@o2ib100: 1
00000800:00000200:27.0:1644346405.159366:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159369:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159372:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000042525ce2
00000400:00000200:6.0:1644346405.159372:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0
00000800:00000400:5.0:1644346405.159372:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693677 seconds
00000400:00000200:27.0:1644346405.159373:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:6.0:1644346405.159373:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x34860
00000400:00000200:5.0:1644346405.159373:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10
00000400:00000200:6.0:1644346405.159375:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000005560a4f
00000400:00000200:27.0:1644346405.159376:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked
00000400:00000200:5.0:1644346405.159376:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT
00000400:00000200:5.0:1644346405.159377:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110
00000400:00000200:6.0:1644346405.159378:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6
00000400:00000200:5.0:1644346405.159378:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110
00000400:00020000:5.0:1644346405.159379:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110
00000400:00000200:6.0:1644346405.159381:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.11@o2ib100:-110
00000400:00000200:6.0:1644346405.159382:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x36060 rc -110
00000400:00000200:27.0:1644346405.159383:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100
00000400:00000200:6.0:1644346405.159383:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.11@o2ib100: -110
00000400:00000200:6.0:1644346405.159385:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.11@o2ib100
00000400:00000200:27.0:1644346405.159389:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination
00000400:00000200:27.0:1644346405.159393:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100
00000400:00000200:27.0:1644346405.159398:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0
00000800:00000200:27.0:1644346405.159401:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100
00000800:00000200:27.0:1644346405.159405:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159407:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159409:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000015445d5e
00000400:00000200:27.0:1644346405.159410:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:27.0:1644346405.159412:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked
00000400:00000200:27.0:1644346405.159417:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100
00000400:00000200:27.0:1644346405.159419:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination
00000400:00000200:27.0:1644346405.159422:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100
00000400:00000200:27.0:1644346405.159427:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0
00000800:00000200:27.0:1644346405.159430:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100
00000800:00000200:27.0:1644346405.159435:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159437:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159439:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000067a42aa
00000400:00000200:27.0:1644346405.159440:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:27.0:1644346405.159442:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery ping unlinked
00000400:00000200:27.0:1644346405.159445:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100
00000400:00000200:27.0:1644346405.159448:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination
00000400:00000200:27.0:1644346405.159450:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100
00000400:00000200:27.0:1644346405.159455:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0
00000800:00000200:27.0:1644346405.159457:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100
00000800:00000200:27.0:1644346405.159462:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159464:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159466:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c0d74cb9
00000400:00000200:27.0:1644346405.159467:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:27.0:1644346405.159469:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked
00000400:00000200:27.0:1644346405.159473:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100
00000400:00000200:27.0:1644346405.159475:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination
00000400:00000200:27.0:1644346405.159477:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100
00000400:00000200:27.0:1644346405.159482:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0
00000800:00000200:27.0:1644346405.159485:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100
00000800:00000200:27.0:1644346405.159487:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159489:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159491:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000920aee3e
00000400:00000200:27.0:1644346405.159492:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:27.0:1644346405.159494:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked
00000400:00000200:27.0:1644346405.159498:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100
00000400:00000200:27.0:1644346405.159500:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination
00000400:00000200:27.0:1644346405.159502:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100
00000400:00000200:27.0:1644346405.159506:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0
00000800:00000200:27.0:1644346405.159509:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100
00000800:00000200:27.0:1644346405.159512:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0
00000800:00000200:27.0:1644346405.159513:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0
00000400:00000200:27.0:1644346405.159515:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009c215f09
00000400:00000200:27.0:1644346405.159517:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0
00000400:00000200:27.0:1644346405.159518:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked
00000400:00000200:27.0:1644346405.159523:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100
00000400:00000200:27.0:1644346405.159525:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination
00000400:00000200:27.0:1644346405.159527:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100
00000400:00000200:27.0:1644346405.159532:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346405.159534:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:27.0:1644346405.159539:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346405.159540:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346405.159542:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000000d69af4c 00000400:00000200:27.0:1644346405.159543:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346405.159545:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346405.159549:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 00000400:00000200:27.0:1644346405.159551:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:27.0:1644346405.159553:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:27.0:1644346405.159558:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346405.159561:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:27.0:1644346405.159563:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 172.19.2.3@o2ib100 (2) version: 0 
00000800:00000200:27.0:1644346405.159565:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 172.19.2.3@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346406.183291:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.11@o2ib100, cpt = 1 00000400:00000200:27.0:1644346406.183297:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.11@o2ib100: 0 00000400:00000200:27.0:1644346406.183298:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346406.183300:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:6.0:1644346406.183301:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:27.0:1644346406.183302:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.11@o2ib100 NID 172.19.2.11@o2ib100: 0. pending discovery 00000400:00000200:6.0:1644346406.183307:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x36060 00000400:00000200:27.0:1644346406.183310:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000044e51676 00000400:00000200:27.0:1644346406.183312:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:6.0:1644346406.183313:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:27.0:1644346406.183315:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:6.0:1644346406.183316:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:6.0:1644346406.183319:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:6.0:1644346406.183321:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 
172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346406.183323:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000400:00000200:27.0:1644346406.183325:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000800:00000200:6.0:1644346406.183326:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346406.183327:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346406.183329:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:6.0:1644346406.183329:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.11@o2ib100 00000400:00000200:6.0:1644346406.183330:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x34260 rc 0 00000400:00000200:27.0:1644346406.183347:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:27.0:1644346406.183349:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346406.183352:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:27.0:1644346406.183355:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346406.183356:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346406.183357:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 
0000000057bab685 00000400:00000200:27.0:1644346406.183357:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346406.183358:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346406.183360:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 00000400:00000200:27.0:1644346406.183364:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:27.0:1644346406.183366:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:27.0:1644346406.183368:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346406.183369:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:27.0:1644346406.183371:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346406.183372:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346406.183374:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c6557ad2 00000400:00000200:27.0:1644346406.183374:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346406.183375:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346406.183377:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346406.183377:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local 
destination 00000400:00000200:27.0:1644346406.183378:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:27.0:1644346406.183380:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346406.183381:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:27.0:1644346406.183382:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346406.183382:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346406.183384:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000bcab80a7 00000400:00000200:27.0:1644346406.183384:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346406.183385:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346406.183387:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:27.0:1644346406.183388:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:27.0:1644346406.183388:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:27.0:1644346406.183390:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346406.183391:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 
00000800:00000200:27.0:1644346406.183392:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346406.183393:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:56.2:1644346406.809755:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (20)++ 00000800:00000200:50.0:1644346406.809818:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++ 00000800:00000200:50.0:1644346406.809830:0:170144:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 172.19.1.55@o2ib100 00000400:00000200:50.0:1644346406.809837:0:170144:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:50.0:1644346406.809844:0:170144:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a8473c0 00000400:00000200:50.0:1644346406.809851:0:170144:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 8288 00000400:00000200:50.0:1644346406.809855:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:50.0:1644346406.809862:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:50.0:1644346406.809865:0:170144:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:50.0:1644346406.809875:0:170144:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (22)++ 00000800:00000200:51.0:1644346406.809878:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (23)-- 00000800:00000200:50.0:1644346406.809878:0:170144:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] 
(23)-- 00000800:00000200:50.0:1644346406.809883:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000100:00000200:43.0:1644346406.809959:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207668160 00010000:00000200:43.0:1644346406.809975:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@000000001d3833af x1718520207668160/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:257/0 lens 224/224 e 0 to 0 dl 1644346467 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346406.809989:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207668160, offset 224 00000400:00000200:43.0:1644346406.809995:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346406.810001:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346406.810005:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346406.810008:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346406.810010:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346406.810012:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346406.810016:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346406.810018:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346406.810027:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346406.810032:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:43.0:1644346406.810037:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346406.810039:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:43.0:1644346406.810042:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 00000800:00000200:43.0:1644346406.810053:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000800:00000200:30.2:1644346406.810110:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000800:00000200:53.0:1644346406.810172:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 00000800:00000200:53.0:1644346406.810193:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000400:00000200:53.0:1644346406.810195:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:53.0:1644346406.810208:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:53.0:1644346406.810210:0:170146:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ecc32d83 00000800:00000200:53.0:1644346406.810214:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:53.0:1644346406.810221:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000800:00000200:56.2:1644346408.921790:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (20)++ 
00000800:00000200:51.0:1644346408.921856:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++ 00000800:00000200:51.0:1644346408.921876:0:170147:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[0] from 172.19.1.55@o2ib100 00000400:00000200:51.0:1644346408.921878:0:170147:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:51.0:1644346408.921883:0:170147:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a847a40 00000400:00000200:51.0:1644346408.921902:0:170147:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 8512 00000400:00000200:51.0:1644346408.921907:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:51.0:1644346408.921909:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000100:00000200:51.0:1644346408.921910:0:170147:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:50.0:1644346408.921915:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346408.921917:0:170147:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (21)++ 00000800:00000200:51.0:1644346408.921918:0:170147:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346408.921919:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000100:00000200:43.0:1644346408.921997:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207669824 00010000:00000200:43.0:1644346408.922005:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@00000000b2ae7fc5 x1718520207669824/t0(0) 
o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:259/0 lens 224/224 e 0 to 0 dl 1644346469 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346408.922011:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207669824, offset 224 00000400:00000200:43.0:1644346408.922013:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346408.922033:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346408.922034:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346408.922035:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 00000400:00000200:43.0:1644346408.922036:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346408.922036:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346408.922038:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346408.922039:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346408.922042:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.55@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346408.922044:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:43.0:1644346408.922047:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b495b0db] -> 172.19.1.55@o2ib100 (3) version: 12 
00000800:00000200:43.0:1644346408.922049:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000003720f48b] (20)++ 00000800:00000200:43.0:1644346408.922049:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000003720f48b] (21)++ 00000800:00000200:43.0:1644346408.922063:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000003720f48b] (22)-- 00000800:00000200:56.2:1644346408.922076:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (21)++ 00000800:00000200:53.0:1644346408.922133:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (22)++ 00000800:00000200:53.0:1644346408.922143:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000003720f48b] (23)-- 00000400:00000200:53.0:1644346408.922146:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:53.0:1644346408.922152:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: PUT: OK 00000400:00000200:53.0:1644346408.922155:0:170146:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000e2a606e5 00000800:00000200:53.0:1644346408.922160:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:53.0:1644346408.922163:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000800:00000400:5.0:1644346413.159298:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.133@o2ib100: 93 seconds 00000400:00000200:5.0:1644346413.159303:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346413.159309:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.133@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346413.159313:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 
00000400:00000200:5.0:1644346413.159316:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.133@o2ib100: 1 00000400:00000200:6.0:1644346413.159371:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346413.159378:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x4860 00000400:00000200:6.0:1644346413.159383:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000035d3df28 00000400:00000200:6.0:1644346413.159385:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346413.159389:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.133@o2ib100:-110 00000400:00000200:6.0:1644346413.159392:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x6060 rc -110 00000400:00000200:6.0:1644346413.159395:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.133@o2ib100: -110 00000400:00000200:6.0:1644346413.159398:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.133@o2ib100 00000400:00000200:6.0:1644346413.159400:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000c78a9a1c not committed for send or receive 00000400:00000200:6.0:1644346413.159402:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000001ffba440 00000100:00000200:6.0:1644346413.159406:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000fbc56cc2 x1723495536590720/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 0 dl 1644346456 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346413.159422:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000002a0bea91 not committed for send or receive 00000400:00000200:6.0:1644346413.159424:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000081fc26e 00000100:00000200:6.0:1644346413.159427:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000eb0b1976 x1723495536590784/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346456 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346413.159445:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000009ed887c1 not committed for send or receive 00000400:00000200:6.0:1644346413.159446:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004ffa016d 00000100:00000200:6.0:1644346413.159448:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000045c2a92 x1723495536590848/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346456 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346413.159476:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000413f6aa6 00000100:00000200:4.0:1644346413.159483:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000fbc56cc2 x1723495536590720/t0(0) 
o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346413.159494:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000fbc56cc2 x1723495536590720/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.133@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346413.159512:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ddf50ca8 00000100:00000200:4.0:1644346413.159517:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000eb0b1976 x1723495536590784/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346413.159524:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000eb0b1976 x1723495536590784/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346413.159534:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000abbb2c53 00000100:00000200:4.0:1644346413.159536:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000045c2a92 x1723495536590848/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346413.159542:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000045c2a92 x1723495536590848/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.133@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346413.159571:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591040, portal 25 
00000100:00000200:4.0:1644346413.159575:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 26, xid 1723495536591040, offset 0 00000400:00000200:4.0:1644346413.159579:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346413.159587:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: 0 00000400:00000200:4.0:1644346413.159589:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:4.0:1644346413.159591:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346413.159593:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346413.159595:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000077fe0700 delayed. 172.19.1.134@o2ib100 pending discovery 00000400:00000200:6.0:1644346413.159598:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346413.159601:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x6060 00000100:00000200:4.0:1644346413.159601:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591104, portal 10 00000100:00000200:4.0:1644346413.159603:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591104, offset 0 00000400:00000200:4.0:1644346413.159607:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000400:00000200:6.0:1644346413.159622:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.134@o2ib100 00000400:00000200:4.0:1644346413.159622:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: -114 00000400:00000200:4.0:1644346413.159622:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:4.0:1644346413.159622:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:4.0:1644346413.159623:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346413.159624:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000003d54d472 delayed. 172.19.1.134@o2ib100 pending discovery 00000400:00000200:6.0:1644346413.159626:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.134@o2ib100 local destination 00000400:00000200:6.0:1644346413.159629:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.134@o2ib100 00000400:00000200:6.0:1644346413.159643:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.134@o2ib100(172.19.1.134@o2ib100:172.19.1.134@o2ib100) : GET try# 0 00000100:00000200:4.0:1644346413.159644:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591168, portal 10 00000800:00000200:6.0:1644346413.159645:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.134@o2ib100 00000100:00000200:4.0:1644346413.159646:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591168, offset 0 00000400:00000200:4.0:1644346413.159647:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.134@o2ib100 00000800:00000200:6.0:1644346413.159648:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000035a33187] -> 172.19.1.134@o2ib100 (2) version: 0 00000400:00000200:4.0:1644346413.159648:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.134@o2ib100: -114 00000400:00000200:4.0:1644346413.159648:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000800:00000200:6.0:1644346413.159649:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000035a33187] -> 172.19.1.134@o2ib100 (2) version: 0 00000400:00000200:4.0:1644346413.159649:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:6.0:1644346413.159650:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.134@o2ib100 00000400:00000200:6.0:1644346413.159651:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x4260 rc 0 00000400:00000200:4.0:1644346413.159651:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.134@o2ib100 NID 172.19.1.134@o2ib100: 0. pending discovery 00000400:00000200:4.0:1644346413.159652:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000005f2c0c4a delayed. 172.19.1.134@o2ib100 pending discovery 00000800:00000200:30.2:1644346413.977808:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (20)++ 00000800:00000200:50.0:1644346413.977873:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (21)++ 00000800:00000200:50.0:1644346413.977892:0:170144:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.54@o2ib100 00000400:00000200:50.0:1644346413.977896:0:170144:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 192.168.128.103@o2ib18 : PUT - for me 00000400:00000200:50.0:1644346413.977900:0:170144:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-192.168.128.103@o2ib18 of length 224 into portal 28 MB=0x61afc2a848000 00000400:00000200:50.0:1644346413.977902:0:170144:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1c from 12345-192.168.128.103@o2ib18 of length 224/224 into md 0x524f29 [8] + 8736 00000400:00000200:50.0:1644346413.977904:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 
00000400:00000200:50.0:1644346413.977906:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000100:00000200:50.0:1644346413.977908:0:170144:0:(events.c:313:request_in_callback()) event type 2, status 0, service ost 00000800:00000200:50.0:1644346413.977931:0:170144:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000002c12efa1] (22)++ 00000800:00000200:50.0:1644346413.977933:0:170144:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000002c12efa1] (23)-- 00000800:00000200:50.0:1644346413.977934:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:50.0:1644346413.977935:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000100:00000200:43.0:1644346413.978013:0:2689286:0:(service.c:2304:ptlrpc_server_handle_request()) got req 1718520207671296 00010000:00000200:43.0:1644346413.978029:0:2689286:0:(ldlm_lib.c:3215:target_send_reply_msg()) @@@ sending reply req@0000000002c28614 x1718520207671296/t0(0) o400->e941be7c-6bba-b5a3-5d49-5e2cdc2d2e99@192.168.128.103@o2ib18:264/0 lens 224/224 e 0 to 0 dl 1644346474 ref 1 fl Interpret:H/0/0 rc 0/0 job:'kworker/52:0.0' 00000100:00000200:43.0:1644346413.978041:0:2689286:0:(niobuf.c:87:ptl_send_buf()) Sending 224 bytes to portal 4, xid 1718520207671296, offset 224 00000400:00000200:43.0:1644346413.978047:0:2689286:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-192.168.128.103@o2ib18 00000400:00000200:43.0:1644346413.978054:0:2689286:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source Specified: 172.19.1.137@o2ib100 to NMR: 192.168.128.103@o2ib18 routed destination 00000400:00000200:43.0:1644346413.978057:0:2689286:0:(lib-move.c:2014:lnet_handle_find_routed_path()) using src nid 172.19.1.137@o2ib100 for route restriction 00000400:00000200:43.0:1644346413.978060:0:2689286:0:(lib-move.c:1336:lnet_select_peer_ni()) 172.19.1.137@o2ib100 ni_is_pref = 1 
00000400:00000200:43.0:1644346413.978062:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 192.168.128.103@o2ib18 00000400:00000200:43.0:1644346413.978064:0:2689286:0:(lib-move.c:1474:lnet_find_route_locked()) Looking up a route to o2ib18, from o2ib100 00000400:00000200:43.0:1644346413.978076:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.54@o2ib100 00000400:00000200:43.0:1644346413.978080:0:2689286:0:(lib-move.c:1397:lnet_select_peer_ni()) sd_best_lpni = 172.19.1.55@o2ib100 00000400:00000200:43.0:1644346413.978089:0:2689286:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:172.19.1.137@o2ib100) -> 192.168.128.103@o2ib18(192.168.128.103@o2ib18:172.19.1.54@o2ib100) : PUT try# 0 00000800:00000200:43.0:1644346413.978094:0:2689286:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 224 bytes in 1 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:43.0:1644346413.978099:0:2689286:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:43.0:1644346413.978102:0:2689286:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:43.0:1644346413.978104:0:2689286:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 00000800:00000200:43.0:1644346413.978119:0:2689286:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000800:00000200:30.2:1644346413.978174:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000800:00000200:53.0:1644346413.978250:0:170146:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 00000800:00000200:53.0:1644346413.978269:0:170146:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000400:00000200:53.0:1644346413.978271:0:170146:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 
00000400:00000200:53.0:1644346413.978274:0:170146:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: PUT: OK 00000400:00000200:53.0:1644346413.978275:0:170146:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000028dc0e37 00000800:00000200:53.0:1644346413.978279:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:53.0:1644346413.978280:0:170146:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 00000800:00000400:5.0:1644346414.119223:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.134@o2ib100: 693686 seconds 00000400:00000200:5.0:1644346414.119227:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346414.119232:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.134@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346414.119235:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346414.119238:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.134@o2ib100: 1 00000800:00000400:5.0:1644346414.119247:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.1.136@o2ib100: 63 seconds 00000400:00000200:5.0:1644346414.119249:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346414.119254:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.136@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346414.119255:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346414.119257:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.136@o2ib100: 1 00000800:00000400:5.0:1644346414.119262:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 
172.19.1.138@o2ib100: 3 seconds 00000400:00000200:5.0:1644346414.119263:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346414.119267:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.138@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346414.119270:0:170138:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000a1d695ea 00000400:00000200:5.0:1644346414.119272:0:170138:0:(api-ni.c:4180:lnet_ping_event_handler()) ping event (5 -110) unlinked 00000400:00000200:6.0:1644346414.119305:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346414.119311:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x4860 00000400:00000200:6.0:1644346414.119315:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000035d3df28 00000400:00000200:6.0:1644346414.119317:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346414.119321:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.134@o2ib100:-110 00000400:00000200:6.0:1644346414.119323:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.134@o2ib100(000000007e54abf4) state 0x6060 rc -110 00000400:00000200:6.0:1644346414.119326:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.134@o2ib100: -110 00000400:00000200:6.0:1644346414.119328:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.1.134@o2ib100 00000400:00000200:6.0:1644346414.119330:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000077fe0700 not committed for send or receive 00000400:00000200:6.0:1644346414.119332:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000ddf50ca8 00000100:00000200:6.0:1644346414.119336:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000f30e307a x1723495536591040/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 0 dl 1644346468 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346414.119350:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000003d54d472 not committed for send or receive 00000400:00000200:6.0:1644346414.119352:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009a37d965 00000100:00000200:6.0:1644346414.119354:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@000000009ee425bb x1723495536591104/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346468 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346414.119364:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 000000005f2c0c4a not committed for send or receive 00000400:00000200:6.0:1644346414.119365:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000fa2b6be8 00000100:00000200:6.0:1644346414.119367:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@00000000aa8d7a6d x1723495536591168/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346468 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346414.119374:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x4860 00000400:00000200:6.0:1644346414.119376:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000dccbd790 
00000400:00000200:6.0:1644346414.119377:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346414.119379:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.1.136@o2ib100:-110 00000400:00000200:6.0:1644346414.119381:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.136@o2ib100(00000000d39439d1) state 0x6060 rc -110 00000400:00000200:6.0:1644346414.119383:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.1.136@o2ib100: -110 00000400:00000200:6.0:1644346414.119385:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.136@o2ib100 00000400:00000200:6.0:1644346414.119386:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 00000000cee34335 not committed for send or receive 00000400:00000200:6.0:1644346414.119387:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000001006d23 00000100:00000200:6.0:1644346414.119390:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@000000002113e66a x1723495536590912/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346456 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:6.0:1644346414.119395:0:170148:0:(lib-msg.c:1012:lnet_is_health_check()) msg 0000000008a80a6d not committed for send or receive 00000400:00000200:6.0:1644346414.119396:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000030a1e219 00000100:00000200:6.0:1644346414.119399:0:170148:0:(events.c:59:request_out_callback()) @@@ type 5, status -110 req@0000000004cb8e4f x1723495536590976/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 0 dl 1644346456 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346414.119403:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000abbb2c53 00000100:00000200:4.0:1644346414.119410:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 
req@00000000f30e307a x1723495536591040/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346414.119427:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000f30e307a x1723495536591040/t0(0) o250->MGC172.19.1.133@o2ib100@172.19.1.134@o2ib100:26/25 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346414.119435:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000413f6aa6 00000100:00000200:4.0:1644346414.119436:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@000000009ee425bb x1723495536591104/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346414.119439:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@000000009ee425bb x1723495536591104/t0(0) o38->lflood-MDT0000-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346414.119442:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004aac1500 00000100:00000200:4.0:1644346414.119442:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@00000000aa8d7a6d x1723495536591168/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346414.119444:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@00000000aa8d7a6d x1723495536591168/t0(0) o38->lflood-MDT0001-lwp-OST0000@172.19.1.134@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346468 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346414.119448:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000733efadf 
00000100:00000200:4.0:1644346414.119449:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@0000000004cb8e4f x1723495536590976/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346414.119451:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@0000000004cb8e4f x1723495536590976/t0(0) o38->lflood-MDT0003-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000400:00000200:4.0:1644346414.119455:0:170152:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000f2677084 00000100:00000200:4.0:1644346414.119456:0:170152:0:(events.c:100:reply_in_callback()) @@@ type 6, status 0 req@000000002113e66a x1723495536590912/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQr/0/ffffffff rc 0/-1 job:'' 00000100:00000200:4.0:1644346414.119458:0:170152:0:(events.c:122:reply_in_callback()) @@@ unlink req@000000002113e66a x1723495536590912/t0(0) o38->lflood-MDT0002-lwp-OST0000@172.19.1.136@o2ib100:12/10 lens 520/544 e 0 to 1 dl 1644346456 ref 1 fl Rpc:eXNQU/0/ffffffff rc 0/-1 job:'' 00000100:00000200:5.0:1644346414.119522:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591232, portal 25 00000100:00000200:5.0:1644346414.119524:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 26, xid 1723495536591232, offset 0 00000400:00000200:5.0:1644346414.119526:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:5.0:1644346414.119530:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: 0 00000400:00000200:5.0:1644346414.119531:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 
00000400:00000200:5.0:1644346414.119531:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:5.0:1644346414.119532:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:5.0:1644346414.119533:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000218c0a2e delayed. 172.19.1.133@o2ib100 pending discovery 00000400:00000200:6.0:1644346414.119534:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346414.119536:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x6060 00000400:00000200:6.0:1644346414.119539:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.133@o2ib100 00000100:00000200:5.0:1644346414.119539:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591296, portal 10 00000100:00000200:5.0:1644346414.119540:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591296, offset 0 00000400:00000200:6.0:1644346414.119542:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.133@o2ib100 local destination 00000400:00000200:5.0:1644346414.119542:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:6.0:1644346414.119545:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.133@o2ib100 00000400:00000200:6.0:1644346414.119547:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.133@o2ib100(172.19.1.133@o2ib100:172.19.1.133@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346414.119548:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.133@o2ib100 00000400:00000200:5.0:1644346414.119549:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 
172.19.1.133@o2ib100: -114 00000400:00000200:5.0:1644346414.119551:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:5.0:1644346414.119551:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:5.0:1644346414.119552:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:5.0:1644346414.119553:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000dfbaab81 delayed. 172.19.1.133@o2ib100 pending discovery 00000800:00000200:6.0:1644346414.119556:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000002172b4cf] -> 172.19.1.133@o2ib100 (2) version: 0 00000100:00000200:5.0:1644346414.119556:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591360, portal 10 00000800:00000200:6.0:1644346414.119557:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000002172b4cf] -> 172.19.1.133@o2ib100 (2) version: 0 00000100:00000200:5.0:1644346414.119557:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591360, offset 0 00000400:00000200:6.0:1644346414.119558:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.133@o2ib100 00000400:00000200:6.0:1644346414.119559:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.133@o2ib100(00000000f1da7397) state 0x4260 rc 0 00000400:00000200:5.0:1644346414.119559:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.133@o2ib100 00000400:00000200:5.0:1644346414.119562:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.133@o2ib100: -114 00000400:00000200:5.0:1644346414.119562:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:5.0:1644346414.119563:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 
00000400:00000200:5.0:1644346414.119564:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.133@o2ib100 NID 172.19.1.133@o2ib100: 0. pending discovery 00000400:00000200:5.0:1644346414.119564:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 0000000096736145 delayed. 172.19.1.133@o2ib100 pending discovery 00000100:00000200:5.0:1644346414.119567:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591424, portal 10 00000100:00000200:5.0:1644346414.119568:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591424, offset 0 00000400:00000200:5.0:1644346414.119569:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:5.0:1644346414.119573:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: 0 00000400:00000200:5.0:1644346414.119573:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:5.0:1644346414.119573:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:5.0:1644346414.119574:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. pending discovery 00000400:00000200:5.0:1644346414.119575:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 000000007932cd5f delayed. 
172.19.1.135@o2ib100 pending discovery 00000400:00000200:6.0:1644346414.119576:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346414.119577:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x6060 00000100:00000200:5.0:1644346414.119577:0:170152:0:(niobuf.c:903:ptl_send_rpc()) Setup reply buffer: 1024 bytes, xid 1723495536591488, portal 10 00000100:00000200:5.0:1644346414.119579:0:170152:0:(niobuf.c:87:ptl_send_buf()) Sending 520 bytes to portal 12, xid 1723495536591488, offset 0 00000400:00000200:6.0:1644346414.119580:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.135@o2ib100 00000400:00000200:5.0:1644346414.119581:0:170152:0:(lib-move.c:4787:LNetPut()) LNetPut -> 12345-172.19.1.135@o2ib100 00000400:00000200:6.0:1644346414.119583:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.135@o2ib100 local destination 00000400:00000200:6.0:1644346414.119584:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346414.119585:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.135@o2ib100(172.19.1.135@o2ib100:172.19.1.135@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346414.119587:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.135@o2ib100 00000400:00000200:5.0:1644346414.119587:0:170152:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.135@o2ib100: -114 00000400:00000200:5.0:1644346414.119587:0:170152:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:5.0:1644346414.119588:0:170152:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:5.0:1644346414.119588:0:170152:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.135@o2ib100 NID 172.19.1.135@o2ib100: 0. 
pending discovery 00000400:00000200:5.0:1644346414.119589:0:170152:0:(lib-move.c:1986:lnet_initiate_peer_discovery()) msg 00000000dc9011d5 delayed. 172.19.1.135@o2ib100 pending discovery 00000800:00000200:6.0:1644346414.119590:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346414.119591:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c3c4eac4] -> 172.19.1.135@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346414.119592:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.135@o2ib100 00000400:00000200:6.0:1644346414.119592:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.135@o2ib100(0000000024b6c9c7) state 0x4260 rc 0 00000800:00000400:5.0:1644346416.167286:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.1@o2ib100: 693688 seconds 00000400:00000200:5.0:1644346416.167291:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346416.167297:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.1@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346416.167300:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346416.167307:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346416.167310:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.1@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346416.167316:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.2@o2ib100: 131 seconds 00000400:00000200:5.0:1644346416.167319:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:5.0:1644346416.167323:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.2@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346416.167324:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346416.167326:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346416.167329:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.2@o2ib100) recovery failed with -110 00000400:00000200:27.0:1644346416.423302:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000920aee3e 00000400:00000200:27.0:1644346416.423306:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346416.423310:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.2@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346416.423318:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.2@o2ib100 00000400:00000200:27.0:1644346416.423326:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.2@o2ib100 local destination 00000400:00000200:27.0:1644346416.423331:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.2@o2ib100 00000400:00000200:27.0:1644346416.423337:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.2@o2ib100(172.19.2.2@o2ib100:172.19.2.2@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346416.423342:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.2@o2ib100 00000800:00000200:27.0:1644346416.423348:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346416.423351:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni 
[00000000c9068579] -> 172.19.2.2@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346416.423353:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009c215f09 00000400:00000200:27.0:1644346416.423354:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346416.423356:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.1@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346416.423360:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:27.0:1644346416.423363:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:27.0:1644346416.423366:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:27.0:1644346416.423371:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346416.423375:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:27.0:1644346416.423379:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346416.423381:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000400:5.0:1644346417.127294:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 69 seconds 00000800:00000400:5.0:1644346417.127310:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.3@o2ib100: 26 seconds 00000400:00000200:5.0:1644346417.127312:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 
00000400:00000200:5.0:1644346417.127315:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127317:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127319:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127321:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.3@o2ib100) recovery failed with -110 00000400:00000200:5.0:1644346417.127324:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127325:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.3@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127326:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346417.127327:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.3@o2ib100: 1 00000800:00000400:5.0:1644346417.127331:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.4@o2ib100: 693689 seconds 00000400:00000200:5.0:1644346417.127333:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127334:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.4@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127335:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127336:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127337:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.4@o2ib100) recovery 
failed with -110 00000800:00000400:5.0:1644346417.127340:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.5@o2ib100: 16 seconds 00000400:00000200:5.0:1644346417.127341:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127342:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.5@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127343:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127344:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127347:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.5@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346417.127349:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.6@o2ib100: 693689 seconds 00000400:00000200:5.0:1644346417.127350:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127351:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.6@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127352:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127352:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127353:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.6@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346417.127354:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.7@o2ib100: 693689 seconds 00000400:00000200:5.0:1644346417.127356:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) 
health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127357:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.7@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127358:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127359:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127359:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.7@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346417.127361:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.8@o2ib100: 297 seconds 00000400:00000200:5.0:1644346417.127362:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127363:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.8@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346417.127364:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127364:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127365:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.8@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346417.127366:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.9@o2ib100: 693689 seconds 00000400:00000200:5.0:1644346417.127367:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127368:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.9@o2ib100: GET: NETWORK_TIMEOUT 
00000400:00000200:5.0:1644346417.127369:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127370:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127371:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.9@o2ib100) recovery failed with -110 00000800:00000400:5.0:1644346417.127372:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 6 seconds 00000800:00000400:5.0:1644346417.127373:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.10@o2ib100: 31 seconds 00000400:00000200:6.0:1644346417.127374:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:5.0:1644346417.127374:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127375:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.10@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:6.0:1644346417.127377:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x34860 00000400:00000200:5.0:1644346417.127377:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346417.127378:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346417.127379:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.10@o2ib100) recovery failed with -110 00000400:00000200:5.0:1644346417.127380:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346417.127381:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.10@o2ib100: GET: 
NETWORK_TIMEOUT 00000400:00000200:6.0:1644346417.127394:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009263fd81 00000400:00000200:6.0:1644346417.127395:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:5.0:1644346417.127396:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346417.127397:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.10@o2ib100: 1 00000400:00000200:6.0:1644346417.127398:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.3@o2ib100:-110 00000400:00000200:6.0:1644346417.127399:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.3@o2ib100(00000000e73cae47) state 0x36060 rc -110 00000400:00000200:6.0:1644346417.127400:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.3@o2ib100: -110 00000400:00000200:6.0:1644346417.127402:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.2.3@o2ib100 00000400:00000200:6.0:1644346417.127403:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(00000000454f9c95) state 0x34860 00000400:00000200:6.0:1644346417.127405:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000132d77ee 00000400:00000200:6.0:1644346417.127405:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346417.127406:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.10@o2ib100:-110 00000400:00000200:6.0:1644346417.127407:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(00000000454f9c95) state 0x36060 rc -110 00000400:00000200:6.0:1644346417.127407:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.10@o2ib100: -110 00000400:00000200:6.0:1644346417.127408:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.10@o2ib100 00000400:00000200:27.0:1644346417.447290:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.10@o2ib100, cpt = 1 00000400:00000200:27.0:1644346417.447298:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.10@o2ib100: 0 00000400:00000200:27.0:1644346417.447299:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346417.447300:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346417.447303:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.10@o2ib100 NID 172.19.2.10@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346417.447308:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c6557ad2 00000400:00000200:27.0:1644346417.447311:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447325:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.10@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447330:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346417.447331:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 00000400:00000200:27.0:1644346417.447333:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:27.0:1644346417.447336:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447338:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000800:00000200:27.0:1644346417.447341:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 
00000800:00000200:27.0:1644346417.447344:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447345:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000bcab80a7 00000400:00000200:27.0:1644346417.447346:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447347:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.9@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447349:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:27.0:1644346417.447350:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:6.0:1644346417.447351:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:27.0:1644346417.447352:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:27.0:1644346417.447354:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447355:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:27.0:1644346417.447359:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447359:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346417.447359:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(00000000454f9c95) state 0x36060 00000400:00000200:27.0:1644346417.447361:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000004045eb92 
00000400:00000200:27.0:1644346417.447361:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447363:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.7@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447365:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:27.0:1644346417.447366:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:27.0:1644346417.447367:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:27.0:1644346417.447368:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447369:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:27.0:1644346417.447372:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447372:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346417.447372:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346417.447374:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000042525ce2 00000400:00000200:27.0:1644346417.447374:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447375:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.6@o2ib100 recovery ping unlinked 00000400:00000200:6.0:1644346417.447378:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.10@o2ib100 local destination 
00000400:00000200:6.0:1644346417.447384:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.10@o2ib100 00000400:00000200:6.0:1644346417.447393:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.10@o2ib100(172.19.2.10@o2ib100:172.19.2.10@o2ib100) : GET try# 0 00000400:00000200:27.0:1644346417.447397:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:27.0:1644346417.447399:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000800:00000200:6.0:1644346417.447399:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.10@o2ib100 00000400:00000200:27.0:1644346417.447400:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:27.0:1644346417.447402:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447403:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:27.0:1644346417.447404:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346417.447407:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447411:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447412:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000015445d5e 00000400:00000200:27.0:1644346417.447413:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 
00000800:00000200:6.0:1644346417.447413:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000acd7a784] -> 172.19.2.10@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447414:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.5@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447416:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.5@o2ib100 00000400:00000200:27.0:1644346417.447417:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.5@o2ib100 local destination 00000400:00000200:6.0:1644346417.447417:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.10@o2ib100 00000400:00000200:6.0:1644346417.447419:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.10@o2ib100(00000000454f9c95) state 0x34260 rc 0 00000400:00000200:27.0:1644346417.447420:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.5@o2ib100 00000400:00000200:27.0:1644346417.447422:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.5@o2ib100(172.19.2.5@o2ib100:172.19.2.5@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447424:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.5@o2ib100 00000800:00000200:27.0:1644346417.447426:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447427:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [0000000055082794] -> 172.19.2.5@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447429:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000067a42aa 00000400:00000200:27.0:1644346417.447429:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447430:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.8@o2ib100 recovery 
ping unlinked 00000400:00000200:27.0:1644346417.447432:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.8@o2ib100 00000400:00000200:27.0:1644346417.447433:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.8@o2ib100 local destination 00000400:00000200:27.0:1644346417.447433:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.8@o2ib100 00000400:00000200:27.0:1644346417.447435:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.8@o2ib100(172.19.2.8@o2ib100:172.19.2.8@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447436:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.8@o2ib100 00000800:00000200:27.0:1644346417.447437:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447438:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000005241c541] -> 172.19.2.8@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447439:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 00000000c0d74cb9 00000400:00000200:27.0:1644346417.447439:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447440:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.4@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447442:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:27.0:1644346417.447443:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:27.0:1644346417.447443:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:27.0:1644346417.447445:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447446:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:27.0:1644346417.447447:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447451:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346417.447452:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000000d69af4c 00000400:00000200:27.0:1644346417.447453:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346417.447453:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.3@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346417.447455:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.3@o2ib100 00000400:00000200:27.0:1644346417.447456:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.3@o2ib100 local destination 00000400:00000200:27.0:1644346417.447457:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.3@o2ib100 00000400:00000200:27.0:1644346417.447459:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.3@o2ib100(172.19.2.3@o2ib100:172.19.2.3@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346417.447460:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.3@o2ib100 00000800:00000200:27.0:1644346417.447461:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 172.19.2.3@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346417.447461:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000d977a909] -> 
172.19.2.3@o2ib100 (2) version: 0 00000800:00000400:5.0:1644346418.151299:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 22 seconds 00000800:00000400:5.0:1644346418.151301:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.11@o2ib100: 15 seconds 00000400:00000200:5.0:1644346418.151303:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346418.151305:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346418.151307:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346418.151308:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346418.151310:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.11@o2ib100) recovery failed with -110 00000400:00000200:5.0:1644346418.151313:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346418.151314:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.2.11@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346418.151315:0:170138:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:5.0:1644346418.151316:0:170138:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.2.11@o2ib100: 1 00000800:00000400:5.0:1644346418.151321:0:170138:0:(o2iblnd_cb.c:3383:kiblnd_check_conns()) Timed out tx for 172.19.2.12@o2ib100: 693690 seconds 00000400:00000200:5.0:1644346418.151322:0:170138:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = -110, hstatus = 10 00000400:00000200:5.0:1644346418.151323:0:170138:0:(lib-msg.c:860:lnet_health_check()) health check: 
172.19.1.137@o2ib100->172.19.2.12@o2ib100: GET: NETWORK_TIMEOUT 00000400:00000200:5.0:1644346418.151324:0:170138:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 5 status: -110 00000400:00000200:5.0:1644346418.151325:0:170138:0:(lib-move.c:3787:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery message sent unsuccessfully:-110 00000400:00020000:5.0:1644346418.151326:0:170138:0:(lib-move.c:3756:lnet_handle_recovery_reply()) peer NI (172.19.2.12@o2ib100) recovery failed with -110 00000400:00000200:6.0:1644346418.151364:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:6.0:1644346418.151367:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x34860 00000400:00000200:6.0:1644346418.151382:0:170148:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000005560a4f 00000400:00000200:6.0:1644346418.151383:0:170148:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 6 00000400:00000200:6.0:1644346418.151385:0:170148:0:(peer.c:2955:lnet_peer_ping_failed()) peer 172.19.2.11@o2ib100:-110 00000400:00000200:6.0:1644346418.151385:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.11@o2ib100(000000004d74ddc6) state 0x36060 rc -110 00000400:00000200:6.0:1644346418.151386:0:170148:0:(peer.c:3193:lnet_peer_discovery_error()) Discovery error 172.19.2.11@o2ib100: -110 00000400:00000200:6.0:1644346418.151387:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. 
Dequeue peer 172.19.2.11@o2ib100 00000400:00000200:27.0:1644346418.471288:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000044e51676 00000400:00000200:27.0:1644346418.471292:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346418.471296:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.12@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346418.471304:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:27.0:1644346418.471307:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:27.0:1644346418.471312:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:27.0:1644346418.471318:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346418.471322:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:27.0:1644346418.471327:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346418.471329:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346418.471332:0:170149:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000057bab685 00000400:00000200:27.0:1644346418.471333:0:170149:0:(lib-move.c:3772:lnet_mt_event_handler()) Received event: 6 status: 0 00000400:00000200:27.0:1644346418.471335:0:170149:0:(lib-move.c:3777:lnet_mt_event_handler()) 172.19.2.11@o2ib100 recovery ping unlinked 00000400:00000200:27.0:1644346418.471339:0:170149:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.11@o2ib100 
00000400:00000200:27.0:1644346418.471341:0:170149:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.11@o2ib100 local destination 00000400:00000200:27.0:1644346418.471344:0:170149:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.11@o2ib100 00000400:00000200:27.0:1644346418.471350:0:170149:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.11@o2ib100(172.19.2.11@o2ib100:172.19.2.11@o2ib100) : GET try# 0 00000800:00000200:27.0:1644346418.471353:0:170149:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.11@o2ib100 00000800:00000200:27.0:1644346418.471357:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000800:00000200:27.0:1644346418.471362:0:170149:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f801af1f] -> 172.19.2.11@o2ib100 (2) version: 0 00000400:00000200:27.0:1644346421.543292:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.54@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543299:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.54@o2ib100: 0 00000400:00000200:27.0:1644346421.543301:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543302:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543304:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.54@o2ib100 NID 172.19.1.54@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346421.543307:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.1.55@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543310:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.1.55@o2ib100: 0 00000400:00000200:27.0:1644346421.543311:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543312:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543314:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.1.55@o2ib100 NID 172.19.1.55@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346421.543316:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.1@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543318:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.1@o2ib100: 0 00000400:00000200:27.0:1644346421.543319:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543320:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543322:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.1@o2ib100 NID 172.19.2.1@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346421.543325:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.4@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543327:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.4@o2ib100: 0 00000400:00000200:27.0:1644346421.543328:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543328:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543330:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.4@o2ib100 NID 172.19.2.4@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346421.543333:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.6@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543335:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.6@o2ib100: 0 00000400:00000200:27.0:1644346421.543335:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543336:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543338:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.6@o2ib100 NID 172.19.2.6@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346421.543340:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.7@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543342:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.7@o2ib100: 0 00000400:00000200:27.0:1644346421.543343:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543345:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543347:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.7@o2ib100 NID 172.19.2.7@o2ib100: 0. 
pending discovery 00000400:00000200:27.0:1644346421.543349:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.9@o2ib100, cpt = 1 00000400:00000200:27.0:1644346421.543358:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.9@o2ib100: 0 00000400:00000200:6.0:1644346421.543358:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:27.0:1644346421.543363:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543364:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543366:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.9@o2ib100 NID 172.19.2.9@o2ib100: 0. pending discovery 00000400:00000200:27.0:1644346421.543369:0:170149:0:(router.c:1231:lnet_check_routers()) discover 172.19.2.12@o2ib100, cpt = 1 00000400:00000200:6.0:1644346421.543372:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x36056 00000400:00000200:27.0:1644346421.543373:0:170149:0:(peer.c:1913:lnet_peer_queue_for_discovery()) Queue peer 172.19.2.12@o2ib100: 0 00000400:00000200:27.0:1644346421.543374:0:170149:0:(peer.c:2224:lnet_discover_peer_locked()) Discovery attempt # 1 00000400:00000200:27.0:1644346421.543375:0:170149:0:(peer.c:2265:lnet_discover_peer_locked()) non-blocking discovery 00000400:00000200:27.0:1644346421.543377:0:170149:0:(peer.c:2272:lnet_discover_peer_locked()) peer 172.19.2.12@o2ib100 NID 172.19.2.12@o2ib100: 0. 
pending discovery 00000400:00000200:6.0:1644346421.543382:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.54@o2ib100 00000400:00000200:6.0:1644346421.543386:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.54@o2ib100 local destination 00000400:00000200:6.0:1644346421.543389:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346421.543392:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.54@o2ib100(172.19.1.54@o2ib100:172.19.1.54@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543395:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.54@o2ib100 00000800:00000200:6.0:1644346421.543398:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000fd8faaad] -> 172.19.1.54@o2ib100 (3) version: 12 00000800:00000200:6.0:1644346421.543400:0:170148:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000002c12efa1] (20)++ 00000800:00000200:6.0:1644346421.543401:0:170148:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000002c12efa1] (21)++ 00000800:00000200:6.0:1644346421.543406:0:170148:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000002c12efa1] (22)-- 00000400:00000200:6.0:1644346421.543407:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346421.543408:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34256 rc 0 00000400:00000200:6.0:1644346421.543410:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x36056 00000400:00000200:6.0:1644346421.543412:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.1.55@o2ib100 00000400:00000200:6.0:1644346421.543413:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.1.55@o2ib100 local destination 
00000400:00000200:6.0:1644346421.543414:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.1.55@o2ib100 00000400:00000200:6.0:1644346421.543419:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.1.55@o2ib100(172.19.1.55@o2ib100:172.19.1.55@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543420:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.1.55@o2ib100 00000800:00000200:6.0:1644346421.543421:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b495b0db] -> 172.19.1.55@o2ib100 (3) version: 12 00000800:00000200:6.0:1644346421.543422:0:170148:0:(o2iblnd_cb.c:1519:kiblnd_launch_tx()) conn[000000003720f48b] (20)++ 00000800:00000200:6.0:1644346421.543423:0:170148:0:(o2iblnd_cb.c:1265:kiblnd_queue_tx_locked()) conn[000000003720f48b] (21)++ 00000800:00000200:6.0:1644346421.543425:0:170148:0:(o2iblnd_cb.c:1525:kiblnd_launch_tx()) conn[000000003720f48b] (22)-- 00000400:00000200:6.0:1644346421.543426:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.1.55@o2ib100 00000400:00000200:6.0:1644346421.543426:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34256 rc 0 00000400:00000200:6.0:1644346421.543427:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x36060 00000400:00000200:6.0:1644346421.543430:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.1@o2ib100 00000400:00000200:6.0:1644346421.543431:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.1@o2ib100 local destination 00000400:00000200:6.0:1644346421.543432:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.1@o2ib100 00000400:00000200:6.0:1644346421.543434:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 
172.19.2.1@o2ib100(172.19.2.1@o2ib100:172.19.2.1@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543435:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.1@o2ib100 00000800:00000200:6.0:1644346421.543437:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543438:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b7e97ead] -> 172.19.2.1@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346421.543439:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.1@o2ib100 00000400:00000200:6.0:1644346421.543439:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.1@o2ib100(0000000017e41955) state 0x34260 rc 0 00000400:00000200:6.0:1644346421.543440:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x36060 00000400:00000200:6.0:1644346421.543443:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.4@o2ib100 00000400:00000200:6.0:1644346421.543443:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.4@o2ib100 local destination 00000400:00000200:6.0:1644346421.543444:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.4@o2ib100 00000400:00000200:6.0:1644346421.543446:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.4@o2ib100(172.19.2.4@o2ib100:172.19.2.4@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543447:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.4@o2ib100 00000800:00000200:6.0:1644346421.543448:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543449:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000004a830e13] -> 172.19.2.4@o2ib100 (2) 
version: 0 00000400:00000200:6.0:1644346421.543450:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.4@o2ib100 00000400:00000200:6.0:1644346421.543450:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.4@o2ib100(000000005d5e71b7) state 0x34260 rc 0 00000400:00000200:6.0:1644346421.543451:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x36060 00000400:00000200:6.0:1644346421.543453:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.6@o2ib100 00000400:00000200:6.0:1644346421.543455:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.6@o2ib100 local destination 00000400:00000200:6.0:1644346421.543456:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.6@o2ib100 00000400:00000200:6.0:1644346421.543458:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.6@o2ib100(172.19.2.6@o2ib100:172.19.2.6@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543458:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.6@o2ib100 00000800:00000200:6.0:1644346421.543460:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543460:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [000000006f0e0d3a] -> 172.19.2.6@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346421.543461:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.6@o2ib100 00000800:00000200:30.2:1644346421.543462:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000002c12efa1] (21)++ 00000400:00000200:6.0:1644346421.543462:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.6@o2ib100(0000000096bb0739) state 0x34260 rc 0 00000400:00000200:6.0:1644346421.543463:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 
0x36060 00000400:00000200:6.0:1644346421.543465:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.7@o2ib100 00000400:00000200:6.0:1644346421.543466:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.7@o2ib100 local destination 00000400:00000200:6.0:1644346421.543467:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.7@o2ib100 00000400:00000200:6.0:1644346421.543468:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.7@o2ib100(172.19.2.7@o2ib100:172.19.2.7@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543469:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.7@o2ib100 00000800:00000200:6.0:1644346421.543471:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543471:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000b34b27af] -> 172.19.2.7@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346421.543472:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.7@o2ib100 00000400:00000200:6.0:1644346421.543472:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.7@o2ib100(000000002fde759c) state 0x34260 rc 0 00000400:00000200:6.0:1644346421.543473:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x36060 00000400:00000200:6.0:1644346421.543476:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.9@o2ib100 00000400:00000200:6.0:1644346421.543476:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.9@o2ib100 local destination 00000400:00000200:6.0:1644346421.543477:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.9@o2ib100 00000400:00000200:6.0:1644346421.543479:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 
172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.9@o2ib100(172.19.2.9@o2ib100:172.19.2.9@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543480:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.9@o2ib100 00000800:00000200:6.0:1644346421.543481:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543481:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000f182ce9e] -> 172.19.2.9@o2ib100 (2) version: 0 00000800:00000200:56.2:1644346421.543482:0:0:0:(o2iblnd_cb.c:3743:kiblnd_cq_completion()) conn[000000003720f48b] (21)++ 00000400:00000200:6.0:1644346421.543482:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.9@o2ib100 00000400:00000200:6.0:1644346421.543483:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.9@o2ib100(000000008a76668c) state 0x34260 rc 0 00000400:00000200:6.0:1644346421.543484:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x36060 00000400:00000200:6.0:1644346421.543488:0:170148:0:(lib-move.c:5016:LNetGet()) LNetGet -> 12345-172.19.2.12@o2ib100 00000400:00000200:6.0:1644346421.543489:0:170148:0:(lib-move.c:2637:lnet_handle_send_case_locked()) Source ANY to NMR: 172.19.2.12@o2ib100 local destination 00000400:00000200:6.0:1644346421.543490:0:170148:0:(lib-move.c:1818:lnet_handle_send()) rspt_next_hop_nid = 172.19.2.12@o2ib100 00000400:00000200:6.0:1644346421.543492:0:170148:0:(lib-move.c:1833:lnet_handle_send()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100:) -> 172.19.2.12@o2ib100(172.19.2.12@o2ib100:172.19.2.12@o2ib100) : GET try# 0 00000800:00000200:6.0:1644346421.543492:0:170148:0:(o2iblnd_cb.c:1638:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.19.2.12@o2ib100 00000800:00000200:6.0:1644346421.543494:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 
172.19.2.12@o2ib100 (2) version: 0 00000800:00000200:6.0:1644346421.543494:0:170148:0:(o2iblnd.c:407:kiblnd_find_peer_locked()) got peer_ni [00000000a77c653d] -> 172.19.2.12@o2ib100 (2) version: 0 00000400:00000200:6.0:1644346421.543495:0:170148:0:(peer.c:3030:lnet_peer_send_ping()) peer 172.19.2.12@o2ib100 00000400:00000200:6.0:1644346421.543495:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.2.12@o2ib100(0000000061ad14f1) state 0x34260 rc 0 00000800:00000200:50.0:1644346421.543547:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (22)++ 00000800:00000200:50.0:1644346421.543554:0:170144:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000002c12efa1] (23)-- 00000800:00000200:51.0:1644346421.543556:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (22)++ 00000400:00000200:50.0:1644346421.543556:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:50.0:1644346421.543559:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: GET: OK 00000400:00000200:50.0:1644346421.543560:0:170144:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:50.0:1644346421.543561:0:170144:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.54@o2ib100: 0 00000800:00000200:50.0:1644346421.543581:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000800:00000200:50.0:1644346421.543583:0:170144:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000002c12efa1] (21)++ 00000800:00000200:51.0:1644346421.543584:0:170147:0:(o2iblnd_cb.c:75:kiblnd_tx_done()) conn[000000003720f48b] (23)-- 00000800:00000200:50.0:1644346421.543584:0:170144:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.54@o2ib100 00000400:00000200:51.0:1644346421.543586:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 
00000400:00000200:50.0:1644346421.543586:0:170144:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 172.19.1.54@o2ib100 : REPLY - for me 00000400:00000200:51.0:1644346421.543588:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: GET: OK 00000400:00000200:50.0:1644346421.543590:0:170144:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.137@o2ib100: Reply from 12345-172.19.1.54@o2ib100 of length 64/64 into md 0x585899 00000400:00000200:51.0:1644346421.543591:0:170147:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 5 00000400:00000200:51.0:1644346421.543592:0:170147:0:(peer.c:2482:lnet_discovery_event_send()) Ping Send to 172.19.1.55@o2ib100: 0 00000400:00000200:50.0:1644346421.543592:0:170144:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:50.0:1644346421.543593:0:170144:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.54@o2ib100: REPLY: OK 00000400:00000200:50.0:1644346421.543594:0:170144:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 0000000005560a4f 00000800:00000200:51.0:1644346421.543595:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346421.543596:0:170147:0:(o2iblnd_cb.c:3861:kiblnd_scheduler()) conn[000000003720f48b] (21)++ 00000800:00000200:51.0:1644346421.543598:0:170147:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[2] from 172.19.1.55@o2ib100 00000400:00000200:50.0:1644346421.543599:0:170144:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:51.0:1644346421.543600:0:170147:0:(lib-move.c:4287:lnet_parse()) TRACE: 172.19.1.137@o2ib100(172.19.1.137@o2ib100) <- 172.19.1.55@o2ib100 : REPLY - for me 00000400:00000200:50.0:1644346421.543600:0:170144:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.54@o2ib100 has discovery disabled 
00000400:00000200:50.0:1644346421.543601:0:170144:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100(00000000ff93ee65) not MR: DD disabled remotely 00000400:00000200:50.0:1644346421.543602:0:170144:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.54@o2ib100 data present 0. state = 0x34256 00000400:00000200:51.0:1644346421.543603:0:170147:0:(lib-move.c:4115:lnet_parse_reply()) 172.19.1.137@o2ib100: Reply from 12345-172.19.1.55@o2ib100 of length 64/64 into md 0x5858a1 00000400:00000200:50.0:1644346421.543603:0:170144:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.54@o2ib100:3 00000400:00000200:51.0:1644346421.543605:0:170147:0:(lib-msg.c:1038:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0 00000400:00000200:51.0:1644346421.543606:0:170147:0:(lib-msg.c:860:lnet_health_check()) health check: 172.19.1.137@o2ib100->172.19.1.55@o2ib100: REPLY: OK 00000400:00000200:51.0:1644346421.543608:0:170147:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md 000000009263fd81 00000800:00000200:50.0:1644346421.543608:0:170144:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000002c12efa1] (22)++ 00000400:00000200:51.0:1644346421.543609:0:170147:0:(peer.c:2530:lnet_discovery_event_handler()) Received event: 3 00000400:00000200:51.0:1644346421.543610:0:170147:0:(peer.c:2351:lnet_discovery_event_reply()) Peer 172.19.1.55@o2ib100 has discovery disabled 00000800:00000200:50.0:1644346421.543610:0:170144:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000002c12efa1] (23)-- 00000800:00000200:50.0:1644346421.543610:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (22)-- 00000400:00000200:51.0:1644346421.543611:0:170147:0:(peer.c:2374:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100(000000006a0e6dce) not MR: DD disabled remotely 00000800:00000200:50.0:1644346421.543611:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000002c12efa1] (21)-- 
00000400:00000200:51.0:1644346421.543612:0:170147:0:(peer.c:2432:lnet_discovery_event_reply()) peer 172.19.1.55@o2ib100 data present 0. state = 0x34256 00000400:00000200:6.0:1644346421.543612:0:170148:0:(peer.c:3276:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:51.0:1644346421.543613:0:170147:0:(router.c:457:lnet_router_discovery_ping_reply()) Discovery is disabled. Processing reply for gw: 172.19.1.55@o2ib100:3 00000800:00000200:50.0:1644346421.543613:0:170144:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346421.543616:0:170147:0:(o2iblnd_cb.c:205:kiblnd_post_rx()) conn[000000003720f48b] (21)++ 00000800:00000200:51.0:1644346421.543617:0:170147:0:(o2iblnd_cb.c:239:kiblnd_post_rx()) conn[000000003720f48b] (22)-- 00000800:00000200:51.0:1644346421.543618:0:170147:0:(o2iblnd_cb.c:3877:kiblnd_scheduler()) conn[000000003720f48b] (21)-- 00000400:00000200:6.0:1644346421.543618:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x340d6 00000400:00000200:6.0:1644346421.543622:0:170148:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.54@o2ib100 (00000000ff93ee65): 0 00000400:00000200:6.0:1644346421.543623:0:170148:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.54@o2ib100(00000000ff93ee65): 0. 
state = 0x34156 00000400:00000200:6.0:1644346421.543624:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34156 rc 1 00000400:00000200:6.0:1644346421.543625:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x34156 00000400:00000200:6.0:1644346421.543626:0:170148:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346421.543626:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.54@o2ib100(00000000ff93ee65) state 0x30116 rc 0 00000400:00000200:6.0:1644346421.543627:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.54@o2ib100 00000400:00000200:6.0:1644346421.543629:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x340d6 00000400:00000200:6.0:1644346421.543630:0:170148:0:(peer.c:2727:lnet_peer_merge_data()) peer 172.19.1.55@o2ib100 (000000006a0e6dce): 0 00000400:00000200:6.0:1644346421.543631:0:170148:0:(peer.c:2922:lnet_peer_data_present()) peer 172.19.1.55@o2ib100(000000006a0e6dce): 0. state = 0x34156 00000400:00000200:6.0:1644346421.543632:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34156 rc 1 00000400:00000200:6.0:1644346421.543632:0:170148:0:(peer.c:3388:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x34156 00000400:00000200:6.0:1644346421.543633:0:170148:0:(peer.c:3086:lnet_peer_discovered()) peer 172.19.1.55@o2ib100 00000400:00000200:6.0:1644346421.543633:0:170148:0:(peer.c:3407:lnet_peer_discovery()) peer 172.19.1.55@o2ib100(000000006a0e6dce) state 0x30116 rc 0 00000400:00000200:6.0:1644346421.543634:0:170148:0:(peer.c:1929:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 172.19.1.55@o2ib100
Debug log: 2171 lines, 2171 kept, 0 dropped, 0 bad.