Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.16.0
Severity: 3
Description
In some cases an occasional error (e.g. an RPC timeout) is acceptable because it is handled transparently by RPC retry, but repeated errors on the local node or with the same peer indicate a more significant problem.
Some CWARN/CERROR messages were quieted because they were too noisy, but as a result we lose insight into problems on nodes that have continuous errors. It would be useful to re-enable these messages through a new variant of CERROR/CWARN that takes a "skip first N messages" parameter: it suppresses the first N occurrences and then prints to the console as normal.
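A minimal sketch of such a variant, written as userspace C so it compiles standalone. The names `CWARN_SKIP()`, `CERROR_SKIP()`, `CONSOLE_SKIP()`, `struct skip_state` and `skip_first_n()` are hypothetical, not existing Lustre APIs, and `fprintf(stderr, ...)` only stands in for the real console path that CWARN/CERROR use. Each macro expansion owns a static atomic counter, so the "first N" threshold applies per call site.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-callsite (or per-peer) state for a "skip first N
 * messages" console macro. Zero-initialized static storage is a valid
 * initial state for the atomic counter.
 */
struct skip_state {
	atomic_ulong count;	/* occurrences seen so far */
};

/*
 * Count one occurrence and return true once more than @skip occurrences
 * have been seen: the first @skip calls return false, every later call
 * returns true. atomic_fetch_add() keeps the count correct when several
 * threads hit the same error path concurrently.
 */
static inline bool skip_first_n(struct skip_state *st, unsigned long skip)
{
	return atomic_fetch_add(&st->count, 1) >= skip;
}

/*
 * Hypothetical CWARN_SKIP()/CERROR_SKIP(): each expansion declares its
 * own static counter, so the threshold is tracked per call site.
 * fprintf() is a stand-in for the real console output path.
 * ##__VA_ARGS__ is the GNU extension that allows calls with no arguments
 * after the format string.
 */
#define CONSOLE_SKIP(prefix, skip, fmt, ...)				\
	do {								\
		static struct skip_state console_skip_state_;		\
		if (skip_first_n(&console_skip_state_, (skip)))	\
			fprintf(stderr, prefix fmt, ##__VA_ARGS__);	\
	} while (0)

#define CWARN_SKIP(skip, fmt, ...)					\
	CONSOLE_SKIP("Warning: ", skip, fmt, ##__VA_ARGS__)
#define CERROR_SKIP(skip, fmt, ...)					\
	CONSOLE_SKIP("Error: ", skip, fmt, ##__VA_ARGS__)
```

For example, `CWARN_SKIP(5, "%s: RPC timed out\n", peer_name)` (with `peer_name` purely illustrative) would stay silent for the first five timeouts at that call site and print every one after that. Keeping the counting in `skip_first_n()` separate from the macro means the same check could also be applied per peer, by calling it with a `struct skip_state` embedded in a per-peer structure; a variant that resets the counter after a successful RPC would count only consecutive errors.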