[LU-13782] LNet Routers should monitor the ni_fatal flag to inform peers of changes to route status
Created: 13/Jul/20  Updated: 07/Aug/20  Resolved: 07/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Improvement
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None


 Description   
commit 1e16d48a23784c7b98ab0653c54852f062dc2418
Author: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
Date:   Mon Jun 3 10:11:24 2019 +0900

    LU-12287 lnet: handling device failure by IB event handler

The commit above lets o2iblnd handle device failure events. When it receives such an event, it sets or clears the ni_fatal_error_on flag of the associated lnet_ni object. While this flag is set, the NI is inoperable. LNet routers should monitor the flag for changes so that they can push the new state to their peers, which allows those peers to update their route status accordingly.

When the ni_fatal_error_on flag is set, the associated interface is inoperable, so pushes to peers on that network will fail unless the router has another path to them. It may therefore also be worth investigating whether there is a smarter way to determine which peers should be pushed to.
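
The idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the patch: the sketch_ni and sketch_peer structures, the last_seen_fatal field, and all sketch_* functions are hypothetical stand-ins, and only the notion of a per-NI fatal flag comes from this ticket. One monitor pass detects NIs whose flag changed since the previous pass and, if any did, queues a push only to peers on networks that still have a healthy NI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for lnet_ni. */
struct sketch_ni {
	int  net_id;          /* hypothetical network identifier */
	bool fatal_error_on;  /* mirrors the ni_fatal_error_on flag */
	bool last_seen_fatal; /* flag value seen on the previous pass (assumed field) */
};

/* Hypothetical peer record: the network it is reached on, and a push counter. */
struct sketch_peer {
	int net_id;
	int pushes_queued;
};

/* Count NIs whose flag changed since the last pass and record the new state. */
static size_t sketch_scan_nis(struct sketch_ni *nis, size_t n_nis)
{
	size_t changed = 0;

	for (size_t i = 0; i < n_nis; i++) {
		if (nis[i].fatal_error_on != nis[i].last_seen_fatal) {
			nis[i].last_seen_fatal = nis[i].fatal_error_on;
			changed++;
		}
	}
	return changed;
}

/* A network is usable if at least one NI on it is not in the fatal state. */
static bool sketch_net_usable(const struct sketch_ni *nis, size_t n_nis,
			      int net_id)
{
	for (size_t i = 0; i < n_nis; i++) {
		if (nis[i].net_id == net_id && !nis[i].fatal_error_on)
			return true;
	}
	return false;
}

/*
 * One monitor pass. If any flag changed, queue a push to every peer that is
 * still reachable, skipping peers whose network has no healthy NI because a
 * push to them would fail. Returns the number of pushes queued.
 */
static size_t sketch_monitor_pass(struct sketch_ni *nis, size_t n_nis,
				  struct sketch_peer *peers, size_t n_peers)
{
	size_t queued = 0;

	assert(nis != NULL && peers != NULL);

	if (sketch_scan_nis(nis, n_nis) == 0)
		return 0; /* nothing changed, so there is nothing to announce */

	for (size_t i = 0; i < n_peers; i++) {
		if (sketch_net_usable(nis, n_nis, peers[i].net_id)) {
			peers[i].pushes_queued++;
			queued++;
		}
	}
	return queued;
}
```

In this sketch a peer on the failed network receives no push while the flag is set and is only notified once the flag clears, which is one possible answer to the question of which peers to push to.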



 Comments   
Comment by Gerrit Updater [ 13/Jul/20 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39353
Subject: LU-13782 lnet: Have LNet routers monitor the ni_fatal flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e2e44060d7a6ce260a8107152c4aefa12e30688

Comment by Gerrit Updater [ 07/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39353/
Subject: LU-13782 lnet: Have LNet routers monitor the ni_fatal flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7e0ec0f809ea1e0eda3c0fd804273bdaf0dc2b03

Comment by Peter Jones [ 07/Aug/20 ]

Landed for 2.14
