[LU-13777] Discovery push needs to account for fixed src nid when selecting destination nid Created: 11/Jul/20  Updated: 27/Jan/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Chris Horn Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The patch https://review.whamcloud.com/38320:

71ca66bcd9 LU-13471 lnet: use the same src nid for discovery

fixes the source NID that is used for a discovery push, but destination NID selection does not yet account for that fixed source. As a result, we may choose a destination NID on a different net than the source NID.
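The intended behavior can be illustrated with a minimal sketch. This is not LNet code: `nid_net` and `select_dest_nid` are hypothetical helper names, NIDs are treated as plain `addr@net` strings, and net names are compared literally. It only shows the idea of restricting destination candidates to the net of a fixed source NID.

```python
from typing import Optional, Sequence


def nid_net(nid: str) -> str:
    """Return the net part of an 'addr@net' NID string (hypothetical helper)."""
    addr, sep, net = nid.partition("@")
    if not addr or not sep or not net:
        raise ValueError(f"malformed NID: {nid!r}")
    return net


def select_dest_nid(src_nid: Optional[str],
                    peer_nids: Sequence[str]) -> Optional[str]:
    """Pick a destination NID for a send (illustrative only).

    With no fixed source, any peer NID is acceptable and the first is taken.
    With a fixed source, only peer NIDs on the same net are considered.
    """
    if src_nid is None:
        return peer_nids[0] if peer_nids else None
    src_net = nid_net(src_nid)
    for nid in peer_nids:
        if nid_net(nid) == src_net:
            return nid
    # No peer NID is reachable from the fixed source's net.
    return None


# Peer 624 has NIDs on both gni and gni99; the source is fixed to gni99.
chosen = select_dest_nid("593@gni99", ["624@gni", "624@gni99"])
```

In this sketch, a fixed source of `593@gni99` yields `624@gni99` rather than `624@gni`.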

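For example, in the following debug log the discovery ping (GET) goes from 593@gni99 to 624@gni99, but the subsequent push (PUT), with source 593@gni99 specified, is sent to 624@gni: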

00000400:00000080:10.0:1594492733.653168:0:18193:0:(module.c:90:libcfs_ioctl()) libcfs ioctl cmd 3221775693
00000400:00000200:10.0:1594492733.653174:0:18193:0:(peer.c:1927:lnet_peer_queue_for_discovery()) Queue peer 624@gni99: 0
00000400:00000200:13.0:1594492733.653222:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x6051
00000400:00000200:13.0:1594492733.653231:0:3629:0:(lib-move.c:5100:LNetGet()) LNetGet -> 12345-624@gni
00000400:00000200:13.0:1594492733.653237:0:3629:0:(lib-move.c:2658:lnet_handle_send_case_locked()) Source ANY to MR:  624@gni local destination
00000400:00000200:13.0:1594492733.653256:0:3629:0:(lib-move.c:1436:lnet_select_peer_ni()) sd_best_lpni = 624@gni99
00000400:00000200:13.0:1594492733.653270:0:3629:0:(lib-move.c:1858:lnet_handle_send()) rspt_next_hop_nid = 624@gni99
00000400:00000200:13.0:1594492733.653271:0:3629:0:(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:<?>) -> 624@gni99(624@gni:624@gni99) <?> : GET try# 0

00000400:00000200:13.0:1594492733.653286:0:3629:0:(peer.c:3053:lnet_peer_send_ping()) peer 624@gni99
00000400:00000200:13.0:1594492733.653287:0:3629:0:(peer.c:3433:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4251 rc 0
00000400:00000200:13.0:1594492733.653565:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x40d1
00000400:00000200:13.0:1594492733.653568:0:3629:0:(peer.c:2683:lnet_peer_merge_data()) cur lpni 624@gni status 363512030
00000400:00000200:13.0:1594492733.653569:0:3629:0:(peer.c:2683:lnet_peer_merge_data()) cur lpni 624@gni99 status 363512030
00000400:00000200:13.0:1594492733.653571:0:3629:0:(peer.c:2750:lnet_peer_merge_data()) peer 624@gni99 (ffff880835bdca00): 0
00000400:00000200:13.0:1594492733.653572:0:3629:0:(peer.c:2945:lnet_peer_data_present()) peer 624@gni99(ffff880835bdca00): 0. state = 0x4151
00000400:00000200:13.0:1594492733.653573:0:3629:0:(peer.c:3433:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4151 rc 1
00000400:00000200:13.0:1594492733.653574:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4151
00000400:00000200:13.0:1594492733.653577:0:3629:0:(lib-move.c:4879:LNetPut()) LNetPut -> 12345-624@gni
00000400:00000200:13.0:1594492733.653579:0:3629:0:(lib-move.c:2658:lnet_handle_send_case_locked()) Source Specified: 593@gni99 to MR:  624@gni local destination
00000400:00000200:13.0:1594492733.653581:0:3629:0:(lib-move.c:1858:lnet_handle_send()) rspt_next_hop_nid = 624@gni
00000400:00000200:13.0:1594492733.653583:0:3629:0:(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:593@gni99) -> 624@gni(624@gni:624@gni) <?> : PUT try# 0
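A mismatch like the one in the log above can be spotted mechanically. The following is a rough sketch, assuming the `TRACE:` lines keep the shape shown in this ticket; `find_net_mismatches` is a hypothetical helper, and it compares only the outer source and destination NIDs, without interpreting the values in parentheses.

```python
import re
from typing import List, Tuple

# Matches e.g. "TRACE: 593@gni99(...) -> 624@gni(...) <?> : PUT try# 0"
TRACE_RE = re.compile(r"TRACE: (\S+?)\(.*?\) -> (\S+?)\(.*?\).*: (\w+) try#")


def find_net_mismatches(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (msg_type, src_nid, dst_nid) for sends whose nets differ."""
    mismatches = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None:
            continue  # not a TRACE line
        src, dst, msg_type = m.group(1), m.group(2), m.group(3)
        # Compare the net part after '@' literally.
        if src.partition("@")[2] != dst.partition("@")[2]:
            mismatches.append((msg_type, src, dst))
    return mismatches


log = [
    "(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:<?>) "
    "-> 624@gni99(624@gni:624@gni99) <?> : GET try# 0",
    "(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:593@gni99) "
    "-> 624@gni(624@gni:624@gni) <?> : PUT try# 0",
]
result = find_net_mismatches(log)
```

Run against the two TRACE lines from this ticket, only the PUT is reported.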


 Comments   
Comment by Gerrit Updater [ 12/Jul/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39341
Subject: LU-13777 lnet: select reachable peer ni for discovery
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ae088a03c24333b7bf05da7bdb04790d86e23eac

Comment by Chris Horn [ 27/Jan/23 ]

This issue can be closed now that https://review.whamcloud.com/43507 has landed under LU-14660.

Generated at Sat Feb 10 03:04:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.