Align LNet routing with Multi-Rail and LNet health (LU-11297)

[LU-11478] LNet: discovery sequence numbers could be misleading Created: 05/Oct/18  Updated: 12/Oct/19  Resolved: 10/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Technical task Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: lnet-health, lnet-router

Issue Links:
Duplicate
is duplicated by LU-12197 incorrect peer nids with discovery en... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Discovery messages carry a sequence number that is intended to detect stale messages. However, this sequence number can be misleading if the peer reboots, because the peer's sequence number resets. The node then treats all information the peer sends as stale, while in reality the peer might have changed its configuration.

There is no reliable way to know whether a peer rebooted, so we'll always assume that the messages we receive are valid and operate on a first-come, first-served basis.
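
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the LNet implementation: `PeerState`, `apply_with_seqno_check`, `apply_first_come_first_served`, and the example NIDs are all hypothetical names invented for illustration, and the strict "drop anything not newer" check is one reading of how a stale-message filter might behave.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PeerState:
    """Local view of a remote peer (hypothetical, not an LNet structure)."""
    last_seqno: int = 0
    nids: Tuple[str, ...] = ()


def apply_with_seqno_check(state, seqno, nids):
    """Assumed strict policy: drop any message that is not newer than the last one seen."""
    if seqno <= state.last_seqno:
        return False  # treated as stale, even if the peer merely rebooted
    state.last_seqno = seqno
    state.nids = tuple(nids)
    return True


def apply_first_come_first_served(state, seqno, nids):
    """Policy from the description: assume every received message is valid."""
    state.last_seqno = seqno
    state.nids = tuple(nids)
    return True


def replay(apply, messages):
    """Feed messages, in arrival order, through one policy and return the final state."""
    state = PeerState()
    for seqno, nids in messages:
        apply(state, seqno, nids)
    return state


# The peer sends seqno 1..3, then reboots with a new configuration,
# so its counter restarts at 1 while it advertises an extra NID.
MESSAGES = [
    (1, ["10.0.0.1@tcp"]),
    (2, ["10.0.0.1@tcp"]),
    (3, ["10.0.0.1@tcp"]),
    (1, ["10.0.0.1@tcp", "10.0.0.2@tcp"]),
]

strict = replay(apply_with_seqno_check, MESSAGES)
fcfs = replay(apply_first_come_first_served, MESSAGES)
print("strict check:", strict.nids)
print("first come, first served:", fcfs.nids)
```

In this model the strict check never learns the post-reboot configuration, while the first-come, first-served policy does. The trade-off is that a message that really is stale and arrives late would also be applied.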



 Comments   
Comment by Gerrit Updater [ 05/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33304
Subject: LU-11478 lnet: misleading discovery seqno.
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 09146925f301c951513ea70fd6f80ad544a73f97

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33304/
Subject: LU-11478 lnet: misleading discovery seqno.
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 42d999ed8f6113724b1ac103b832d5b74b878d55

Comment by Joseph Gmitter (Inactive) [ 10/Jun/19 ]

Work has landed as part of the MR Routing merge commit: https://review.whamcloud.com/#/c/34983/

Comment by Gerrit Updater [ 03/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36041
Subject: LU-11478 lnet: misleading discovery seqno.
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 139abfc32612018eea7cc4b882fbfcd7f1f643e1

Comment by Gerrit Updater [ 08/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36041/
Subject: LU-11478 lnet: misleading discovery seqno.
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: da9998767a9093c088d28119179ee591f42910dc

Generated at Sat Feb 10 02:44:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.