[LU-17432] add "slow start" to some CWARN/CERROR messages Created: 16/Jan/24  Updated: 20/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.16.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: easy

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

In some cases an occasional error (e.g. an RPC timeout) is acceptable because it is handled transparently by RPC retry, but repeated errors on the local node or with the same peer indicate a more significant problem.

It would be useful to re-enable some CWARN/CERROR messages that were quieted because they were too noisy. With those messages silenced, we are losing insight into problems on nodes that have continuous errors. There should be a new variant of CERROR/CWARN that takes a "skip first N messages" parameter: it suppresses the first N messages and then prints to the console as normal.
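
A minimal sketch of the "skip first N messages" idea, written as standalone userspace C rather than against the real libcfs debug macros. The names slow_start_state, slow_start_should_print() and WARN_SLOW_START() are hypothetical and do not exist in Lustre; the sketch assumes one counter per call site and is not thread-safe (a kernel version would presumably need an atomic counter).

```c
#include <stdio.h>

/* Hypothetical per-call-site state: number of messages suppressed so far. */
struct slow_start_state {
	unsigned long ss_count;
};

/*
 * Return 0 for the first 'skip' calls and 1 for every call after that.
 * The counter stops at 'skip', so it cannot overflow.
 */
static int slow_start_should_print(struct slow_start_state *ss,
				   unsigned long skip)
{
	if (ss->ss_count < skip) {
		ss->ss_count++;
		return 0;
	}
	return 1;
}

/*
 * Hypothetical CWARN-like macro (an assumption, not an existing Lustre
 * macro): silent for the first 'skip' hits at this call site, then prints
 * every hit. The static variable gives each call site its own counter.
 */
#define WARN_SLOW_START(skip, ...)					\
do {									\
	static struct slow_start_state ss_;				\
	if (slow_start_should_print(&ss_, (skip)))			\
		fprintf(stderr, __VA_ARGS__);				\
} while (0)
```

A call such as WARN_SLOW_START(10, "RPC timeout to peer %s\n", peer) would stay quiet for the first 10 timeouts at that call site and log every one after that. This sketch never resets the counter and counts per call site only; tracking errors per peer, or resetting after a period without errors, would need additional state that is not shown here.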

