Lustre / LU-18380

LNet: set NI health to 0 on fatal error

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Upon detection of a "link down" event, the "fatal" flag is set on the corresponding NI, marking it as ineligible for selection on tx. However, the NI's health score remains unchanged. As a result, as soon as the link comes back up, the "fatal" flag is cleared and the NI becomes available for selection with the same health score it had before the "link down" event.

      It is proposed to slow down the reintroduction of the NI into tx selection in this case by degrading its health score to 0 on a "fatal" event. This would force the NI to go through local recovery before its score is restored to the maximum level, which can help the system stay more resilient when dealing with "link flapping".
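
      The proposed behaviour can be illustrated with a small standalone model. This is only a sketch and not LNet code: the `sketch_*` names, the maximum health value of 1000, and the recovery step of 100 are all assumptions made for illustration. It shows a "link down" event setting the "fatal" flag and zeroing the health score, a "link up" event clearing only the flag, and the score climbing back through successful recovery attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for this sketch; not taken from the LNet source. */
#define SKETCH_MAX_HEALTH    1000
#define SKETCH_RECOVERY_STEP 100

/* Toy stand-in for a network interface (NI). */
struct sketch_ni {
	bool fatal;   /* set while the link is down */
	int  health;  /* 0 .. SKETCH_MAX_HEALTH */
};

/* "link down": mark the NI ineligible and, as proposed, drop health to 0. */
static void sketch_link_down(struct sketch_ni *ni)
{
	ni->fatal = true;
	ni->health = 0;
}

/* "link up": clear the flag only; health is deliberately left untouched. */
static void sketch_link_up(struct sketch_ni *ni)
{
	ni->fatal = false;
}

/* One successful local recovery attempt raises health, capped at the max. */
static void sketch_recovery_ok(struct sketch_ni *ni)
{
	ni->health += SKETCH_RECOVERY_STEP;
	if (ni->health > SKETCH_MAX_HEALTH)
		ni->health = SKETCH_MAX_HEALTH;
	assert(ni->health >= 0 && ni->health <= SKETCH_MAX_HEALTH);
}

/*
 * Simplified tx selection between two NIs: skip NIs with the fatal flag set,
 * then prefer the higher health score (ties go to 'a').
 */
static struct sketch_ni *sketch_pick(struct sketch_ni *a, struct sketch_ni *b)
{
	if (a->fatal && b->fatal)
		return NULL;
	if (a->fatal)
		return b;
	if (b->fatal)
		return a;
	return (b->health > a->health) ? b : a;
}
```

      In this model, an NI whose link flaps stays behind a healthy peer in selection until enough recovery attempts succeed, instead of returning at full health the moment the flag is cleared.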

       

          People

            ssmirnov Serguei Smirnov