Align LNet routing with Multi-Rail and LNet health (LU-11297)

[LU-11300] LNet: Router Aliveness and Health Created: 30/Aug/18  Updated: 22/Apr/23  Resolved: 10/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Technical task Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: lnet-health, lnet-router

Issue Links:
Related
is related to LU-14069 OBD_FAIL_LDLM_CANCEL_BL_CB_RACE is bu... Resolved

Description

Routers and gateway peer_nis are special-cased peers that maintain their own aliveness and health status. Consolidate that handling with LNet Health.



Comments
Comment by Gerrit Updater [ 18/Sep/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33185
Subject: LU-11300 lnet: router aliveness
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 9039581cea98e9c86ae886abd8d81631e0a9d620

Comment by Gerrit Updater [ 18/Sep/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33186
Subject: LU-11300 lnet: peer aliveness
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 022b1647a384b1779fb7f6934e9d1b5ee08a4cce

Comment by James A Simmons [ 18/Sep/18 ]

Does this replace LU-5570?

Comment by Amir Shehata (Inactive) [ 18/Sep/18 ]

It should. This is still in development/testing, but the idea is to use the health infrastructure instead of timestamps to detect when routes are down, and then switch to a different interface on the same router or to a different router altogether.
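
For readers following along, the idea in the comment above can be sketched roughly as below. This is only an illustration in Python, not the LNet code from the patches: the `Interface` and `Router` types, `MAX_HEALTH`, the `sensitivity_pct` threshold rule, and the NIDs used in the example are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_HEALTH = 1000  # assumed ceiling for a health value (hypothetical)


@dataclass
class Interface:
    nid: str
    health: int = MAX_HEALTH  # lowered on failures, restored on recovery


@dataclass
class Router:
    name: str
    interfaces: List[Interface] = field(default_factory=list)


def is_alive(iface: Interface, sensitivity_pct: int = 100) -> bool:
    """Treat an interface as alive when its health is at or above the
    given percentage of MAX_HEALTH (hypothetical threshold rule)."""
    return iface.health * 100 >= MAX_HEALTH * sensitivity_pct


def select_path(routers: List[Router],
                sensitivity_pct: int = 100) -> Optional[Tuple[str, str]]:
    """Walk routers in preference order. On each router, prefer the
    healthiest alive interface; move to the next router only when no
    interface on the current one is alive."""
    for router in routers:
        alive = [i for i in router.interfaces if is_alive(i, sensitivity_pct)]
        if alive:
            best = max(alive, key=lambda i: i.health)
            return router.name, best.nid
    return None  # no usable route


if __name__ == "__main__":
    rtr_a = Router("rtrA", [Interface("10.0.0.1@tcp"), Interface("10.0.0.2@tcp")])
    rtr_b = Router("rtrB", [Interface("10.0.1.1@tcp")])
    rtr_a.interfaces[0].health = 500  # first interface degraded
    print(select_path([rtr_a, rtr_b]))
```

With the first interface on `rtrA` degraded, the sketch falls back to the second interface on the same router; only when every interface on `rtrA` is below the threshold does it move on to `rtrB`. No timestamps are consulted at any point, which is the contrast with the older aliveness scheme that the comment draws.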

Comment by Gerrit Updater [ 05/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33298
Subject: LU-11300 lnet: consider router_check_interval
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 0cfc60da16fa7368b220edf6db1e2ecca1fe34b8

Comment by Gerrit Updater [ 05/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33302
Subject: LU-11300 lnet: use lnet_is_peer_ni_alive()
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: c424cce4c3048b51ac6f8f43f7603fa7dc39b9c0

Comment by Gerrit Updater [ 05/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33303
Subject: LU-11300 lnet: start with peer down
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 51dc3745dfde4350b3d433a3a51d73c54c246d77

Comment by Gerrit Updater [ 23/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33449
Subject: LU-11300 lnet: router sensitivity
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 2c72f148afe2112d744c8abe85238d0f2a964410

Comment by Gerrit Updater [ 23/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33450
Subject: LU-11300 lnet: cache ni status
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 88353d9757409a7063f30f4511b255960e1e237c

Comment by Gerrit Updater [ 23/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33451
Subject: LU-11300 lnet: Cache the routing feature
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: ead873c69e5872f5fa8ffb3e4cef0510e7407c98

Comment by Gerrit Updater [ 23/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33452
Subject: LU-11300 lnet: simplify lnet_handle_local_failure()
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 7f0caad0cdc0c88df01da3f9830d9cca4fc35a5f

Comment by Gerrit Updater [ 23/Oct/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33455
Subject: LU-11300 lnet: configure lnet router senstivity
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: f6e59815fe6807bd97f74f6ec692d86da49112fb

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33449/
Subject: LU-11300 lnet: router sensitivity
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 2b59dae54efc23066f33c4c19f945568de2ee3b2

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33455/
Subject: LU-11300 lnet: configure lnet router senstivity
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: aef3d58d585ee818b405b5ff197b7a98b6c5157d

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33450/
Subject: LU-11300 lnet: cache ni status
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 398f4071dc17c83e6ac1600174b46e2675579ce7

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33451/
Subject: LU-11300 lnet: Cache the routing feature
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: d65a7b8727ee0c80ecfcc6f8ba952b38ae9e5962

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33186/
Subject: LU-11300 lnet: peer aliveness
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 8e498d3f23ea9bcbef524153c6613f93a6229431

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33185/
Subject: LU-11300 lnet: router aliveness
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 21d2252648bea9edb107292c4a720ff9ab557748

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33452/
Subject: LU-11300 lnet: simplify lnet_handle_local_failure()
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: f8c7dd6f53748cf589b2a1f18d93b92761f9d983

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/33298/
Subject: LU-11300 lnet: consider alive_router_check_interval
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 434456256f30c33d36a3968ea5f24495e5413c62

Comment by Joseph Gmitter (Inactive) [ 10/Jun/19 ]

Work has landed as part of the MR Routing merge commit: https://review.whamcloud.com/#/c/34983/

Comment by Gerrit Updater [ 28/Jan/20 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/37337
Subject: LU-11300 lnet: remove lnd_query interface.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c5b97d7ff88611a0771a2e66f0f365f0d120aec5

Comment by Gerrit Updater [ 08/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37337/
Subject: LU-11300 lnet: remove lnd_query interface.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0d816af574b7063c0ce339b67d2066b229d20f59

Comment by Gerrit Updater [ 24/Apr/20 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38353
Subject: LU-11300 lnet: simplify lnet_handle_local_failure()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ba5daaf6b8e828e92693c8cb64896a26f92d035c

Comment by Gerrit Updater [ 24/Apr/20 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38356
Subject: LU-11300 lnet: simplify lnet_handle_local_failure()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 64ff8b82a1a08503b464e6e51697a0ed72255ecc
