Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
commit 1e16d48a23784c7b98ab0653c54852f062dc2418
Author: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
Date: Mon Jun 3 10:11:24 2019 +0900

    LU-12287 lnet: handling device failure by IB event handler

The above commit allows o2iblnd to handle device failure events. When it receives such an event, it sets or clears the ni_fatal_error_on flag of the associated lnet_ni object. While this flag is set, the NI is inoperable. LNet routers ought to monitor this flag for transitions (set or cleared) so that they can push the new state to peers. This will allow peers to update their route status appropriately.
When the ni_fatal_error_on flag is set, the associated interface is inoperable, so pushes to any peers on that network will fail (unless the router has another path to them). It might also be worth looking at whether there is a smarter way to determine which peers should be pushed to.
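
The monitoring idea described above could be sketched roughly as follows. This is a simplified, standalone model and not LNet code: the sketch_ni and sketch_peer structures, the last_seen_fatal field, and both functions are hypothetical names invented for illustration, and only the fatal_error_on field is meant to mirror ni_fatal_error_on. It assumes each peer is reached through exactly one network, and it skips peers whose network has no healthy local NI, since a push to them would fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of lnet_ni relevant here. */
struct sketch_ni {
	unsigned int net_id;     /* network this NI is attached to */
	bool fatal_error_on;     /* models ni_fatal_error_on */
	bool last_seen_fatal;    /* value observed on the previous pass */
};

/* Hypothetical peer record: one network per peer (an assumption). */
struct sketch_peer {
	unsigned int net_id;     /* network used to reach this peer */
	bool needs_push;         /* set when a push should be sent */
};

/* True if at least one local NI on net_id is not in the fatal state. */
bool sketch_net_reachable(const struct sketch_ni *nis, size_t n_nis,
			  unsigned int net_id)
{
	size_t i;

	for (i = 0; i < n_nis; i++) {
		if (nis[i].net_id == net_id && !nis[i].fatal_error_on)
			return true;
	}
	return false;
}

/*
 * One monitor pass. If any NI's fatal flag changed since the previous
 * pass, mark every peer that is still reachable for a push and return
 * how many were marked. Returns 0 (and leaves peers untouched) when
 * nothing changed.
 */
size_t sketch_monitor_pass(struct sketch_ni *nis, size_t n_nis,
			   struct sketch_peer *peers, size_t n_peers)
{
	bool changed = false;
	size_t marked = 0;
	size_t i;

	assert(nis != NULL && peers != NULL);

	for (i = 0; i < n_nis; i++) {
		if (nis[i].fatal_error_on != nis[i].last_seen_fatal) {
			nis[i].last_seen_fatal = nis[i].fatal_error_on;
			changed = true;
		}
	}
	if (!changed)
		return 0;

	for (i = 0; i < n_peers; i++) {
		/* Pushing over a network with no healthy NI would fail. */
		peers[i].needs_push = sketch_net_reachable(nis, n_nis,
							   peers[i].net_id);
		if (peers[i].needs_push)
			marked++;
	}
	return marked;
}
```

In this model, a router with one NI on each of two networks that sees the first NI go fatal would mark only the peers on the second network. A multi-rail router with a second healthy NI on the affected network would still mark those peers, which corresponds to the "another path" case above.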
Landed for 2.14