Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major

    Description

      When a client is decommissioned, it stays in the servers' peer list indefinitely, and the servers keep logging messages like:

      lnet_handle_recovery_reply()) peer NI (10.11.12.13@o2ib) recovery failed with -113
      lnet_handle_recovery_reply()) Skipped 1234 similar messages
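
To see which peers produce these messages, the log can be tallied per NID. The following is a minimal sketch that assumes the message format shown above; the function name `count_recovery_failures` and the regular expression are illustrative and are not part of LNet.

```python
import re
from collections import Counter

# Matches only the message format quoted in this ticket; other LNet
# messages may be formatted differently.
RECOVERY_FAILED = re.compile(
    r"peer NI \((?P<nid>[^)]+)\) recovery failed with (?P<rc>-?\d+)"
)


def count_recovery_failures(lines):
    """Return a Counter mapping (nid, return code) to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = RECOVERY_FAILED.search(line)
        if match:
            counts[(match["nid"], int(match["rc"]))] += 1
    return counts


sample = [
    "lnet_handle_recovery_reply()) peer NI (10.11.12.13@o2ib) recovery failed with -113",
    "lnet_handle_recovery_reply()) Skipped 1234 similar messages",
]
failures = count_recovery_failures(sample)
```

Lines of the form "Skipped N similar messages" do not name a peer and are not counted, so the totals are a lower bound on the real number of failures.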
      

      Removing the decommissioned client from the peer list is the cleaner fix. It avoids having to change the value of lnet_recovery_limit just to silence the LNetError messages about that client. Moreover, changing this parameter can mask a problem on a node that is active but faulty.

      However, removing such peers manually can be cumbersome, which is why automatic deletion could be useful.

      This feature could be controlled by a parameter that both enables it and sets the delay after which a client is considered for removal.
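
One possible reading of this proposal, sketched in Python for illustration only: a single delay parameter where a value of 0 disables the feature, and any peer that has not been seen alive for longer than the delay becomes a candidate for removal. The names `Peer`, `last_alive`, `select_expired_peers` and `expiry_delay` are hypothetical, as is the second NID in the example; none of them exist in LNet.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    nid: str
    last_alive: float  # time, in seconds, at which the peer last responded


def select_expired_peers(peers, now, expiry_delay):
    """Return the peers that have been silent for longer than expiry_delay seconds.

    An expiry_delay of 0 (or less) disables automatic deletion (assumed
    convention, not taken from the ticket).
    """
    if expiry_delay <= 0:
        return []
    return [p for p in peers if now - p.last_alive > expiry_delay]


peers = [
    Peer("10.11.12.13@o2ib", last_alive=0.0),    # decommissioned long ago
    Peer("10.11.12.14@o2ib", last_alive=950.0),  # hypothetical active client
]
expired = select_expired_peers(peers, now=1000.0, expiry_delay=600)
```

Using one parameter for both enabling the feature and setting the delay keeps the configuration small, at the cost of not being able to keep a delay value configured while the feature is turned off.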

            People

              cbordage Cyril Bordage
              cbordage Cyril Bordage
              Votes: 0
              Watchers: 11
