Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Description
We do not have good data on the effectiveness of the LNet health resend feature. I propose that we instrument the code to track completed and expired resends, so that we can see how often a network transaction actually completes via the retry mechanism and how often it fails despite resends. This data can then be used to improve the feature.
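As a rough illustration of the kind of accounting proposed above, here is a minimal user-space sketch in C. It is not taken from the patch or from the LNet source: `struct resend_stats`, `enum msg_outcome`, and every function name below are hypothetical, and the sketch assumes that a per-message retry count is available when the message is finalized.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters; names are illustrative, not LNet identifiers. */
struct resend_stats {
	atomic_uint_fast64_t completed_after_resend; /* succeeded after >= 1 resend */
	atomic_uint_fast64_t expired_after_resend;   /* failed despite >= 1 resend */
};

/* Hypothetical final outcome of a message. */
enum msg_outcome { MSG_COMPLETED, MSG_EXPIRED };

static void resend_stats_init(struct resend_stats *s)
{
	atomic_init(&s->completed_after_resend, 0);
	atomic_init(&s->expired_after_resend, 0);
}

/*
 * Record the final outcome of one message. Messages that were never
 * resent (retry_count == 0) are ignored, because they say nothing
 * about how effective the resend mechanism is.
 */
static void resend_stats_record(struct resend_stats *s,
				unsigned int retry_count,
				enum msg_outcome outcome)
{
	if (retry_count == 0)
		return;

	if (outcome == MSG_COMPLETED)
		atomic_fetch_add_explicit(&s->completed_after_resend, 1,
					  memory_order_relaxed);
	else
		atomic_fetch_add_explicit(&s->expired_after_resend, 1,
					  memory_order_relaxed);
}

/* Fraction of resent messages that eventually completed; 0.0 if none. */
static double resend_stats_success_ratio(struct resend_stats *s)
{
	uint_fast64_t ok = atomic_load_explicit(&s->completed_after_resend,
						memory_order_relaxed);
	uint_fast64_t bad = atomic_load_explicit(&s->expired_after_resend,
						 memory_order_relaxed);

	if (ok + bad == 0)
		return 0.0;
	return (double)ok / (double)(ok + bad);
}
```

Relaxed atomics are used here because the counters are independent statistics and do not order any other memory accesses. An in-kernel implementation would presumably use whatever counter and statistics infrastructure LNet already has, rather than C11 atomics.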
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57491
Subject: LU-18555 lnet: Track successful and failed resends
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 008765975d4509626840a7e660eae493c6a05651