Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Major
Description
When a client is decommissioned, it remains in the servers' peer lists indefinitely, and the servers keep logging messages such as:
lnet_handle_recovery_reply()) peer NI (10.11.12.13@o2ib) recovery failed with -113
lnet_handle_recovery_reply()) Skipped 1234 similar messages
Removing the client from the peer list is the cleaner fix. The alternative is to change lnet_recovery_limit to silence the LNetError messages about the removed client, but changing that parameter can also mask a problem on an active but faulty node.
Removing peers by hand is cumbersome, however, so automatic deletion would be useful.
The feature could be controlled by a parameter that both enables it and sets the delay after which an unreachable client is considered decommissioned and removed.
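The proposed behaviour can be sketched as follows. This is only an illustrative model in Python, not LNet code: the names `PeerState`, `mark_recovery_result`, `prune_stale_peers`, and `prune_delay` are hypothetical, and it assumes a single parameter where 0 disables the feature and a positive value is the delay in seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PeerState:
    """Hypothetical per-peer record; not an actual LNet structure."""
    nid: str
    # Time of the first failed recovery in the current failure streak,
    # or None while the peer is healthy.
    unreachable_since: Optional[float] = None


def mark_recovery_result(peer: PeerState, ok: bool, now: float) -> None:
    """Record the outcome of one recovery attempt for a peer."""
    if ok:
        peer.unreachable_since = None      # peer answered: reset the streak
    elif peer.unreachable_since is None:
        peer.unreachable_since = now       # first failure: start the clock


def prune_stale_peers(peers: Dict[str, PeerState], now: float,
                      prune_delay: float) -> List[str]:
    """Delete peers that have been unreachable for at least prune_delay seconds.

    prune_delay is the assumed tunable: 0 (or negative) disables pruning.
    Returns the NIDs that were removed.
    """
    if prune_delay <= 0:
        return []
    stale = [nid for nid, peer in peers.items()
             if peer.unreachable_since is not None
             and now - peer.unreachable_since >= prune_delay]
    for nid in stale:
        del peers[nid]
    return stale
```

Resetting the timer on any successful recovery means an active but intermittently faulty node is never pruned and keeps producing its error messages, which preserves the diagnostic value noted above.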
Issue Links
- is related to: LU-14654 Need to check if lnet_recovery_limit is non-zero in lnet_peer_ni_add_to_recoveryq_locked() (Resolved)