[LU-16563] LNet: use discovery ni status to set peer ni availability Created: 16/Feb/23  Updated: 25/Apr/23  Resolved: 11/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Serguei Smirnov Assignee: Serguei Smirnov
Resolution: Fixed Votes: 0
Labels: Multi-Rail, lnet

Rank (Obsolete): 9223372036854775807

 Description   

Currently, when a Multi-Rail (MR) peer is being discovered, it replies with a list of its NIs and their statuses. Even if an NI is "down" due to a "fatal" condition, such as a locally detected "link down", it is still listed with "UP" status in the reply. As a result, the recipient can learn that the NI is unreachable only by trying to communicate with it and failing.

Instead, to avoid the unnecessary delay in this scenario, NI status can be tracked so that a locally recognized "down" state is reported to any discovering peer.
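
The proposal can be illustrated with a small model. This is only a sketch of the idea and not the LNet implementation: every name in it (LocalNi, PeerNi, build_discovery_reply, apply_discovery_reply, pick_peer_ni) and the MAX_HEALTH value are hypothetical and chosen for illustration. The responder reports the locally recognized link state for each NI, and the discovering side uses that status to seed the initial health of each peer NI, so a "down" NI is avoided without a failed send.

```python
from dataclasses import dataclass
from enum import Enum


class NiStatus(Enum):
    UP = "up"
    DOWN = "down"


# Hypothetical ceiling for a health score; illustration only.
MAX_HEALTH = 1000


@dataclass
class LocalNi:
    """An NI on the node being discovered (hypothetical model)."""
    nid: str
    link_up: bool = True  # locally detected link state

    def reported_status(self) -> NiStatus:
        # Report the locally recognized state instead of always "UP".
        return NiStatus.UP if self.link_up else NiStatus.DOWN


@dataclass
class PeerNi:
    """The discovering node's view of a remote NI (hypothetical model)."""
    nid: str
    health: int = MAX_HEALTH


def build_discovery_reply(local_nis):
    """Responder side: list every NI together with its current status."""
    return [(ni.nid, ni.reported_status()) for ni in local_nis]


def apply_discovery_reply(reply):
    """Discovering side: seed each peer NI's health from the reported status."""
    peer_nis = {}
    for nid, status in reply:
        health = MAX_HEALTH if status is NiStatus.UP else 0
        peer_nis[nid] = PeerNi(nid=nid, health=health)
    return peer_nis


def pick_peer_ni(peer_nis):
    """Return the healthiest peer NI, or None if none has positive health."""
    best = max(peer_nis.values(), key=lambda ni: ni.health, default=None)
    return best if best is not None and best.health > 0 else None
```

In this model, a peer with one NI whose link is down yields a reply that marks that NI "down", and pick_peer_ni() selects the other NI on the first attempt rather than after a failed send.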



 Comments   
Comment by Gerrit Updater [ 16/Feb/23 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50027
Subject: LU-16563 lnet: use discovered ni status to set initial health
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2244b767ff0920c0ce488b5147e92744ed462c24

Comment by Gerrit Updater [ 02/Mar/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50188
Subject: LU-16563 tests: Check peer NI health after link down
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e4b96c5512c3d2d2c5bda49d23cb445b05eb9678

Comment by Gerrit Updater [ 28/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50027/
Subject: LU-16563 lnet: use discovered ni status to set initial health
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: da230373bd14306cb97fb48748ebce205f09d468

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50188/
Subject: LU-16563 tests: Check peer NI health after link down
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e82e57414d324c1065ddbdaef5baab2ec5b42026

Comment by Peter Jones [ 11/Apr/23 ]

Landed for 2.16
