[LU-11144] Dynamic Discovery is not triggered for router peers Created: 11/Jul/18  Updated: 27/Jan/23  Resolved: 27/Jan/23

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Chris Horn Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: LNet

Issue Links:
Related
is related to LU-11297 Align LNet routing with Multi-Rail an... Resolved
is related to LU-11143 Multi-Rail/Dynamic Discovery break LN... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It appears to me that dynamic discovery will not be triggered for LNet routers under normal operating conditions (i.e. without an explicit lnetctl discover of the routers).

The discovery logic is early in lnet_select_pathway() and based on the peer associated with the dst_nid. The only LNetGet where dst_nid is a router is going to be the router checker traffic. But that traffic goes on the LNET_RESERVED_PORTAL, so it will never trigger discovery.
It seems we need to check whether the router peer requires discovery after it has been selected, later in lnet_select_pathway().
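
A minimal sketch of the behavior described above, not the actual LNet code. Every name prefixed with `sketch_` is hypothetical, the structures are simplified stand-ins for the real peer and message types, and the portal values are placeholders. It models the early discovery check keyed on the dst_nid peer, and a possible variant that also checks the gateway once it has been chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder portal numbers, assumed for illustration only. */
enum { SKETCH_RESERVED_PORTAL = 0, SKETCH_DATA_PORTAL = 8 };

/* Hypothetical, simplified peer: not the real LNet peer structure. */
struct sketch_peer {
	bool discovered;          /* true once discovery has completed */
	int  discover_requests;   /* times this peer was queued for discovery */
};

/* Hypothetical, simplified message. */
struct sketch_msg {
	struct sketch_peer *dst;  /* peer that owns dst_nid */
	struct sketch_peer *gw;   /* router picked for a remote dst, else NULL */
	int portal;
};

/* Traffic on the reserved portal never triggers discovery. */
static bool sketch_needs_discovery(const struct sketch_peer *p, int portal)
{
	return p != NULL && !p->discovered &&
	       portal != SKETCH_RESERVED_PORTAL;
}

static void sketch_queue_discovery(struct sketch_peer *p)
{
	p->discover_requests++;
}

/* Behavior reported in this ticket: only the dst_nid peer is checked. */
static void sketch_select_pathway_current(struct sketch_msg *msg)
{
	assert(msg != NULL);
	if (sketch_needs_discovery(msg->dst, msg->portal))
		sketch_queue_discovery(msg->dst);
	/* Gateway selection would happen here; msg->gw is never checked. */
}

/* Suggested variant: also check the gateway after it is selected. */
static void sketch_select_pathway_proposed(struct sketch_msg *msg)
{
	sketch_select_pathway_current(msg);
	if (sketch_needs_discovery(msg->gw, msg->portal))
		sketch_queue_discovery(msg->gw);
}
```

In this model, a router checker ping (dst is the router, reserved portal) queues nothing, and ordinary traffic forwarded through the router queues only the final destination under the current logic. Only the proposed variant queues the router itself.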



 Comments   
Comment by Chris Horn [ 12/Jul/18 ]

I think what I was missing with router peers and dynamic discovery is the DD push. The router should queue other peers for discovery based on where it is forwarding traffic, and the discovery thread should push the router's MR info to those peers as part of that. Right?

So while the description in this ticket is, I believe, accurate, it's probably by design and not necessarily a problem.

Comment by Amir Shehata (Inactive) [ 08/Aug/18 ]

Both LU-11143 and LU-11144 are related.

I address them here:

https://wiki.whamcloud.com/display/LNet/Routing+and+MR+integration

It might be a good idea to use that page for feedback on the proposals.

Comment by Cory Spitz [ 26/Jul/19 ]

ashehata, ready to resolve this and LU-11143?

Comment by Amir Shehata (Inactive) [ 26/Jul/19 ]

I believe this issue has been resolved in the new routing code.

Comment by Cory Spitz [ 26/Aug/19 ]

ashehata, will you be resolving this issue then? Can you point at a specific commit or LU that resolved it?

Comment by Chris Horn [ 27/Jan/23 ]

Resolved with MR routing feature
