[LU-13714] LNet Router: ni status flip flopping unnecessarily Created: 24/Jun/20  Updated: 29/Aug/22  Resolved: 03/Apr/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Each peer with a route discovers the gateway specified in the route. The gateway sets an NI's status to UP when it receives a message on that NI. If an NI receives no message for a full timeout period, the gateway brings that NI's status down. This helps other peers decide whether to use the route if their asym_router is set to 1.
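The timeout behaviour described above can be sketched as a toy model. This is only an illustration, not the LNet implementation: the class name GatewayNI, its fields, and the string status values are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass

UP, DOWN = "up", "down"  # hypothetical status labels, not LNet constants


@dataclass
class GatewayNI:
    """Toy model of one network interface (NI) on a gateway."""
    timeout: float            # seconds of silence before the NI is marked down
    last_alive: float = 0.0   # time at which the last message arrived
    status: str = DOWN

    def on_message(self, now: float) -> None:
        # Original behaviour from the ticket: any received message marks the NI up.
        self.last_alive = now
        self.status = UP

    def check_timeout(self, now: float) -> None:
        # No message for a full timeout period: bring the NI status down.
        if now - self.last_alive >= self.timeout:
            self.status = DOWN


ni = GatewayNI(timeout=10.0)
ni.on_message(now=1.0)      # a message arrives, NI goes up
ni.check_timeout(now=5.0)   # still within the timeout, NI stays up
status_before = ni.status
ni.check_timeout(now=12.0)  # 11 seconds of silence, NI goes down
status_after = ni.status
```

Peers that have asym_router set to 1 would then take this per-NI status into account when deciding whether to use the route.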

However, there is a flaw in the logic. The NI status is set to UP on any message received. This is problematic because the gateway pushes an update when its NI status goes down. This PUSH message gets an ACK from the peer without the route. That ACK ends up setting the NI status to UP, so we end up toggling the NI status.

Because of this flaw, the other peers see the route go down and up almost immediately, even though the other side has removed its route.
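The toggling can be reproduced in a small simulation. This is a sketch under assumptions: the Gateway and Peer classes, their method names, and the message-type strings are all hypothetical stand-ins, and the only behaviour modelled is "status goes down, a PUSH is sent, the ACK counts as a received message".

```python
UP, DOWN = "up", "down"  # hypothetical status labels


class Peer:
    """Peer that no longer has the route but still ACKs a PUSH from the gateway."""

    def on_push(self) -> str:
        return "ACK"


class Gateway:
    """Gateway with the flawed rule: any received message sets the NI status to UP."""

    def __init__(self) -> None:
        self.ni_status = UP
        self.history = [UP]  # every status the NI has had, in order

    def _set_status(self, status: str) -> None:
        if status != self.ni_status:
            self.ni_status = status
            self.history.append(status)

    def on_message(self, msg_type: str) -> None:
        # Flawed logic: the message type is ignored entirely.
        self._set_status(UP)

    def on_timeout(self, peer: Peer) -> None:
        self._set_status(DOWN)
        # The status change triggers a PUSH, and the peer's ACK comes straight back.
        self.on_message(peer.on_push())


gw = Gateway()
gw.on_timeout(Peer())
print(gw.history)
```

In this model a single timeout leaves the NI back in the UP state, which matches the down-then-up flip that the other peers observe.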

The gateway's NI status should be set to UP only when it receives a discovery GET message, that is, a GET on the RESERVED portal.

This solution is not entirely foolproof. A manually triggered discovery from the peer to the gateway will still bring the NI status UP, because the gateway cannot distinguish between a manual discovery and a gateway-alive discovery. However, this is the best we can do.
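The proposed rule, and its limitation, can be sketched as a single predicate. This is not the code from the patch: the function name, the message-type strings, and the RESERVED_PORTAL value are placeholders chosen for illustration.

```python
RESERVED_PORTAL = 0  # placeholder value for illustration only


def should_mark_ni_up(msg_type: str, portal: int) -> bool:
    """Return True only for a discovery GET, i.e. a GET on the reserved portal."""
    return msg_type == "GET" and portal == RESERVED_PORTAL


# The ACK for the gateway's own PUSH no longer brings the NI back up.
ack_result = should_mark_ni_up("ACK", RESERVED_PORTAL)

# A discovery GET from a peer that still has the route does.
discovery_result = should_mark_ni_up("GET", RESERVED_PORTAL)

# A manually triggered discovery arrives as the same kind of message,
# so the predicate cannot tell it apart and also returns True.
manual_result = should_mark_ni_up("GET", RESERVED_PORTAL)

# A GET on any other portal is ordinary traffic and is ignored.
other_portal_result = should_mark_ni_up("GET", RESERVED_PORTAL + 1)
```

Because the manual and gateway-alive discoveries present identical inputs to the check, no refinement of this predicate alone can separate them, which is the limitation noted above.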



 Comments   
Comment by Gerrit Updater [ 25/Jun/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39176
Subject: LU-13714 lnet: update gateway NI status on discovery
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be1b3fb860f4ee8cdd5e3ffefca18d91fe183233

Comment by Gerrit Updater [ 03/Apr/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/39176/
Subject: LU-13714 lnet: only update gateway NI status on discovery
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3e3f70eb1ec95f32d9a97795d7fdf02cca82b5a0

Comment by Peter Jones [ 03/Apr/22 ]

Landed for 2.15
