Lustre / LU-17519

Remove dead peers automatically


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      When a client is decommissioned, it remains in the servers' peer list indefinitely, and the servers keep logging messages such as the following (-113 is -EHOSTUNREACH, "No route to host"):

      lnet_handle_recovery_reply()) peer NI (10.11.12.13@o2ib) recovery failed with -113
      lnet_handle_recovery_reply()) Skipped 1234 similar messages
      

      Removing the dead peer from the peer list is cleaner. It avoids having to change lnet_recovery_limit just to suppress LNetError messages about a client that no longer exists. Moreover, changing this parameter can mask a real problem on a node that is still active but faulty.

      However, removing such peers manually is cumbersome, which is why automatic deletion would be useful.

      The feature could be controlled by a parameter that enables it and sets the delay after which an unreachable client is considered removed.
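      A minimal sketch of the proposed policy, assuming a hypothetical enable flag and delay tunable. The names dead_peer_cfg, peer_state, peer_record_recovery and peer_should_delete are illustrative only; they are not existing LNet code or module parameters, and the actual peer deletion step is not shown. A peer becomes a deletion candidate only after its recovery attempts have failed continuously for at least the configured delay, and any successful recovery resets the clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tunables: neither field maps to an existing LNet parameter. */
struct dead_peer_cfg {
    bool   enabled;    /* automatic deletion switched on */
    time_t delay_sec;  /* how long a peer must stay unreachable */
};

/* Hypothetical per-peer state kept next to the recovery logic. */
struct peer_state {
    const char *nid;           /* e.g. "10.11.12.13@o2ib" */
    bool        failing;       /* last recovery attempt failed */
    time_t      failing_since; /* start of the failure streak; valid only when failing */
};

/* Record the outcome of one recovery attempt made at time `now`. */
static void peer_record_recovery(struct peer_state *p, bool ok, time_t now)
{
    assert(p != NULL);
    if (ok) {
        p->failing = false;        /* peer answered: reset the clock */
    } else if (!p->failing) {
        p->failing = true;         /* first failure: start the clock */
        p->failing_since = now;
    }
    /* repeated failures keep the original failing_since */
}

/* True once the peer has been unreachable for at least delay_sec. */
static bool peer_should_delete(const struct dead_peer_cfg *cfg,
                               const struct peer_state *p, time_t now)
{
    assert(cfg != NULL && p != NULL);
    if (!cfg->enabled || !p->failing)
        return false;
    return now - p->failing_since >= cfg->delay_sec;
}
```

      Tracking the start of the failure streak, rather than counting failed attempts, keeps the delay independent of how often recovery pings are sent, and a single successful reply from an active but flaky node is enough to keep it in the peer list.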


          People

            Assignee: Cyril Bordage (cbordage)
            Reporter: Cyril Bordage (cbordage)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated: