LU-13782: LNet Routers should monitor the ni_fatal flag to inform peers of changes to route status

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0

    Description

      commit 1e16d48a23784c7b98ab0653c54852f062dc2418
      Author: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
      Date:   Mon Jun 3 10:11:24 2019 +0900
      
          LU-12287 lnet: handling device failure by IB event handler
      

      The above commit allows o2iblnd to handle device failure events. When it receives such an event, it sets or clears the ni_fatal_error_on flag of the associated lnet_ni object. While this flag is set, the NI is inoperable. LNet routers ought to monitor this flag for changes (set or cleared) so that they can push that information to peers. This will allow peers to update their route status appropriately.
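
      The monitoring idea described above can be sketched in isolation. This is a minimal, standalone model and not the LNet implementation: only the flag name ni_fatal_error_on comes from the description. The struct sketch_ni, its last_seen_fatal field, the sketch_push_fn callback, and sketch_check_nis() are hypothetical names invented for illustration. The sketch assumes a periodic scan that compares each NI's current flag against the last value observed and triggers a push only on a transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for lnet_ni. Only the name
 * ni_fatal_error_on is taken from the issue text. */
struct sketch_ni {
	unsigned int net_id;      /* hypothetical network identifier */
	bool ni_fatal_error_on;   /* set or cleared on device events */
	bool last_seen_fatal;     /* value observed by the previous scan */
};

/* Hypothetical callback standing in for "push updated status to peers". */
typedef void (*sketch_push_fn)(const struct sketch_ni *changed, void *arg);

/* Scan all NIs and invoke push() once for each NI whose flag changed
 * since the previous scan. Returns the number of NIs that changed. */
static size_t sketch_check_nis(struct sketch_ni *nis, size_t count,
			       sketch_push_fn push, void *arg)
{
	size_t changed = 0;

	assert(nis != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		struct sketch_ni *ni = &nis[i];

		if (ni->ni_fatal_error_on == ni->last_seen_fatal)
			continue; /* no transition, nothing to report */

		ni->last_seen_fatal = ni->ni_fatal_error_on;
		changed++;
		if (push != NULL)
			push(ni, arg);
	}
	return changed;
}

/* Example callback: counts how many pushes were requested. */
static void sketch_count_push(const struct sketch_ni *changed, void *arg)
{
	(void)changed;
	(*(size_t *)arg)++;
}
```

      Tracking the previously observed value means a push is requested both when the flag is set and when it is cleared, and repeated scans with no change request nothing.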

      When the ni_fatal_error_on flag is set, the associated interface is inoperable, so pushes to any peers on that network will fail unless the router has another path to them. It may therefore be worth investigating whether there is a smarter way to determine which peers should receive a push.
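
      One possible selection rule, following from the observation above, is to push only to peers on networks where the router still has at least one NI without the fatal flag set. The sketch below is an assumption for illustration, not a proposed patch: struct sketch_local_ni, struct sketch_peer, sketch_net_usable(), and sketch_select_push_targets() are hypothetical names, and real LNet peer and network bookkeeping is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified local interface: one entry per NI. */
struct sketch_local_ni {
	unsigned int net_id;     /* hypothetical network identifier */
	bool ni_fatal_error_on;  /* flag name taken from the issue text */
};

/* Hypothetical peer record: only the network it is reached on. */
struct sketch_peer {
	unsigned int net_id;
};

/* A network is usable if at least one local NI on it is not fatal. */
static bool sketch_net_usable(const struct sketch_local_ni *nis, size_t nni,
			      unsigned int net_id)
{
	for (size_t i = 0; i < nni; i++) {
		if (nis[i].net_id == net_id && !nis[i].ni_fatal_error_on)
			return true;
	}
	return false;
}

/* Fill out[] with pointers to the peers that are still reachable and
 * return how many were selected. out[] must have room for npeers entries. */
static size_t sketch_select_push_targets(const struct sketch_local_ni *nis,
					 size_t nni,
					 const struct sketch_peer *peers,
					 size_t npeers,
					 const struct sketch_peer **out)
{
	size_t selected = 0;

	assert(out != NULL || npeers == 0);

	for (size_t i = 0; i < npeers; i++) {
		if (sketch_net_usable(nis, nni, peers[i].net_id))
			out[selected++] = &peers[i];
	}
	return selected;
}
```

      Under this rule, a peer on a network whose only local NI is fatal is skipped, while a peer on a network with a second, healthy NI is still selected, matching the "another path" exception.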

          People

            hornc Chris Horn