[LU-13782] LNet Routers should monitor the ni_fatal flag to inform peers of changes to route status

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0

    Description

      commit 1e16d48a23784c7b98ab0653c54852f062dc2418
      Author: Tatsushi Takamura <takamr.tatsushi@jp.fujitsu.com>
      Date:   Mon Jun 3 10:11:24 2019 +0900
      
          LU-12287 lnet: handling device failure by IB event handler
      

      The above commit allows o2iblnd to handle device failure events. When it receives those events it sets or clears the ni_fatal_error_on flag of the associated lnet_ni object. If this flag is set, then the NI is inoperable. LNet routers ought to monitor for when this flag is set or cleared so that they can push that information to peers. This will allow peers to update their route status appropriately.

      Because a set ni_fatal flag means the associated interface is inoperable, pushes to peers on that network will fail unless the router has another path to them. It may therefore be worth investigating whether there is a smarter way to determine which peers should be pushed to.
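
      The monitoring idea described above can be sketched in a few lines of C. This is only an illustrative model under stated assumptions, not the code from the landed patch: the `sketch_*` types and functions are hypothetical stand-ins, and only the flag name `ni_fatal_error_on` comes from this ticket. The sketch caches the last observed flag value per NI, detects a set or clear transition, and then marks for a push only those peers whose network still has a healthy NI, which is one possible answer to the question of which peers to push to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lnet_ni. Only ni_fatal_error_on is named in
 * the ticket; every other field here is an assumption for illustration. */
struct sketch_ni {
    unsigned int net_id;      /* hypothetical network identifier */
    bool ni_fatal_error_on;   /* set/cleared by the LND on device events */
    bool last_seen_fatal;     /* monitor's cached copy of the flag */
};

/* Hypothetical peer record: which network it is reached through and
 * whether a push of updated router state is queued for it. */
struct sketch_peer {
    unsigned int net_id;
    bool push_pending;
};

/* Return true if the flag changed since the previous check, and refresh
 * the cached value so the next check sees only new transitions. */
static bool sketch_ni_state_changed(struct sketch_ni *ni)
{
    assert(ni != NULL);
    bool changed = ni->ni_fatal_error_on != ni->last_seen_fatal;
    ni->last_seen_fatal = ni->ni_fatal_error_on;
    return changed;
}

/* Find the NI serving a given network, or NULL if there is none. */
static const struct sketch_ni *sketch_find_ni(const struct sketch_ni *nis,
                                              size_t nni, unsigned int net_id)
{
    for (size_t i = 0; i < nni; i++)
        if (nis[i].net_id == net_id)
            return &nis[i];
    return NULL;
}

/* One monitor pass: if any NI changed state, mark every peer that is
 * reachable through a healthy NI. Peers behind a failed NI are skipped
 * because a push to them would fail. Returns the number of peers marked
 * in this pass (0 when no NI changed state). */
static size_t sketch_monitor_pass(struct sketch_ni *nis, size_t nni,
                                  struct sketch_peer *peers, size_t npeers)
{
    bool any_changed = false;
    size_t marked = 0;

    /* Check every NI so that all cached values are refreshed. */
    for (size_t i = 0; i < nni; i++)
        if (sketch_ni_state_changed(&nis[i]))
            any_changed = true;

    if (!any_changed)
        return 0;

    for (size_t i = 0; i < npeers; i++) {
        const struct sketch_ni *ni =
            sketch_find_ni(nis, nni, peers[i].net_id);

        if (ni != NULL && !ni->ni_fatal_error_on) {
            peers[i].push_pending = true;
            marked++;
        }
    }
    return marked;
}
```

      In this model a router with NIs on two networks would call `sketch_monitor_pass()` periodically. When the flag is set on one NI, only peers on the other network are marked; when the flag is later cleared, peers on both networks are marked so that they can restore the route.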

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.14


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39353/
          Subject: LU-13782 lnet: Have LNet routers monitor the ni_fatal flag
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 7e0ec0f809ea1e0eda3c0fd804273bdaf0dc2b03

          gerrit Gerrit Updater added a comment -

          Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39353
          Subject: LU-13782 lnet: Have LNet routers monitor the ni_fatal flag
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 6e2e44060d7a6ce260a8107152c4aefa12e30688

          People

            hornc Chris Horn