  1. Lustre
  2. LU-16643

LNet health logging improvements

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      Some improvements to LNet health logging.

      LNet health activity can generate noise in console logs. NI and peer
      NI recovery pings are often expected to fail, so the related messages
      from lnet_handle_recovery_reply() are generally redundant.

      Improve this logging by having lnet_monitor_thread() provide a
      summary of the NIs in recovery.

      Another useful metric for spotting network trouble is the number of
      messages that exceed their deadline. We do not currently log this
      information. Keep a count of messages that have exceeded their
      deadline and track the total excess time. The lnet_monitor_thread()
      will then print a summary of the number of such messages and their
      average excess time at a regular interval. These stats are reset
      each time the monitor thread prints this information to the console.
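
      The bookkeeping described above can be sketched in user-space C as
      follows. This is only an illustration, not the LNet implementation:
      the struct, the function names, and the use of milliseconds are all
      assumptions made for the example, and a real kernel version would
      also need locking or atomic counters.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical accumulator; not the actual LNet data structure. */
struct deadline_stats {
	uint64_t late_msgs;       /* messages finalized after their deadline */
	uint64_t total_excess_ms; /* summed overrun; ms is an assumed unit */
};

/* Record one message completion. Only late messages are counted. */
static void deadline_stats_record(struct deadline_stats *s,
				  int64_t deadline_ms, int64_t done_ms)
{
	assert(s != NULL);
	if (done_ms > deadline_ms) {
		s->late_msgs++;
		s->total_excess_ms += (uint64_t)(done_ms - deadline_ms);
	}
}

/* Average overrun per late message; 0 when nothing was late. */
static uint64_t deadline_stats_average_ms(const struct deadline_stats *s)
{
	assert(s != NULL);
	return s->late_msgs ? s->total_excess_ms / s->late_msgs : 0;
}

/*
 * Print a one-line summary if anything was late, then reset the
 * counters so the next summary covers only the next interval.
 * Returns the number of late messages that were reported.
 */
static uint64_t deadline_stats_report_and_reset(struct deadline_stats *s)
{
	uint64_t reported;

	assert(s != NULL);
	reported = s->late_msgs;
	if (reported > 0)
		printf("%" PRIu64 " messages exceeded their deadline, "
		       "average excess %" PRIu64 " ms\n",
		       reported, deadline_stats_average_ms(s));
	s->late_msgs = 0;
	s->total_excess_ms = 0;
	return reported;
}
```

      Keeping only a count and a running total is enough to report an
      average without storing per-message data, which keeps the cost on
      the message completion path small.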

      Because NIs can remain in recovery for extended periods of time, the
      interval between these console updates will increase from 1 to 5
      minutes. The interval is reset once there are no longer any NIs in
      recovery and no messages have passed their deadline since the last
      console update.
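
      One possible shape for this interval logic is sketched below. The
      description gives only the 1 and 5 minute bounds and the reset
      condition. The one-minute step per update, the function name, and
      the parameter names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define UPDATE_INTERVAL_MIN_MINUTES 1u
#define UPDATE_INTERVAL_MAX_MINUTES 5u

/*
 * Hypothetical helper: given the interval used for the update that was
 * just printed, return the interval to wait before the next one.
 * Assumes the interval grows by one minute per update (not stated in
 * the description) and never exceeds the 5 minute cap.
 */
static unsigned int next_update_interval(unsigned int current_minutes,
					 unsigned int nis_in_recovery,
					 unsigned long late_msgs)
{
	bool quiet = nis_in_recovery == 0 && late_msgs == 0;

	assert(current_minutes >= UPDATE_INTERVAL_MIN_MINUTES &&
	       current_minutes <= UPDATE_INTERVAL_MAX_MINUTES);

	/* Nothing in recovery and nothing late: start over at 1 minute. */
	if (quiet)
		return UPDATE_INTERVAL_MIN_MINUTES;

	/* Still seeing trouble: back off, but stay within the cap. */
	if (current_minutes >= UPDATE_INTERVAL_MAX_MINUTES)
		return UPDATE_INTERVAL_MAX_MINUTES;

	return current_minutes + 1;
}
```

      With these assumptions, a node with a persistently unhealthy NI
      would log at 1, 2, 3, 4 and then every 5 minutes, and would return
      to the 1 minute interval after the first quiet update.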

          People

            hornc Chris Horn