[LU-16643] LNet health logging improvements Created: 15/Mar/23  Updated: 04/Apr/23  Resolved: 04/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None


 Description   

Some improvements to LNet health logging.

LNet health activity can generate noise in console logs. NI and peer
NI recovery pings can be expected to fail, so the related messages
from lnet_handle_recovery_reply() are generally redundant.

Improve this logging by having lnet_monitor_thread() provide a
summary of NIs in recovery.

Another useful metric for spotting network trouble is the number of
messages that exceed their deadline. We do not currently log this
information. Keep a count of messages that have exceeded their
deadline and track the total excess time. At a regular interval,
lnet_monitor_thread() will then print a summary of the number of such
messages and their average excess time. These stats are reset each
time the monitor thread prints this information to the console.
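
As a rough illustration of the bookkeeping described above, the following is a minimal sketch and not the code from the patch. The struct, field, and function names (deadline_stats, deadline_stats_record, deadline_stats_report) are hypothetical, and the millisecond units and message format are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; not the identifiers used in the Lustre patch. */
struct deadline_stats {
	uint64_t ds_count;	/* messages that finished past their deadline */
	uint64_t ds_excess_ms;	/* total time past deadline (ms, assumed unit) */
};

/* Record one finished message; only messages past their deadline count. */
static void deadline_stats_record(struct deadline_stats *ds,
				  int64_t deadline_ms, int64_t now_ms)
{
	assert(ds != NULL);
	if (now_ms > deadline_ms) {
		ds->ds_count++;
		ds->ds_excess_ms += (uint64_t)(now_ms - deadline_ms);
	}
}

/*
 * Write a one-line summary into buf and reset the counters.
 * Returns the number of late messages reported; when it returns 0,
 * nothing is written to buf and there is nothing to reset.
 */
static uint64_t deadline_stats_report(struct deadline_stats *ds,
				      char *buf, size_t len)
{
	uint64_t count;
	uint64_t avg;

	assert(ds != NULL);
	count = ds->ds_count;
	if (count == 0)
		return 0;

	avg = ds->ds_excess_ms / count;
	snprintf(buf, len,
		 "%llu messages past deadline, average excess %llu ms",
		 (unsigned long long)count, (unsigned long long)avg);

	/* Stats are reset whenever the summary is emitted. */
	ds->ds_count = 0;
	ds->ds_excess_ms = 0;
	return count;
}
```

In this sketch the message completion path would call deadline_stats_record(), and the monitor thread would call deadline_stats_report() once per logging interval. Real kernel code would also need locking or per-CPU counters, which are omitted here.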

Because NIs can be in recovery for extended periods of time, the
interval between these console updates will increase from 1 minute to
5 minutes. The interval is reset once there are no longer any NIs in
recovery and no messages have exceeded their deadline since the last
console update.
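
The interval handling could be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the description gives only the 1 minute and 5 minute bounds and the reset condition, so the one-minute step, the macro names, and the function next_log_interval() are all hypothetical.

```c
#include <stdbool.h>

#define LOG_INTERVAL_MIN_SECS	60	/* 1 minute, from the description */
#define LOG_INTERVAL_MAX_SECS	300	/* 5 minutes, from the description */
#define LOG_INTERVAL_STEP_SECS	60	/* assumption: step size is not stated */

/*
 * Compute the delay before the next console summary.
 * cur_secs is the interval that was just used. The interval resets to
 * the minimum only when no NIs are in recovery AND no messages missed
 * their deadline since the last update; otherwise it grows up to the cap.
 */
static unsigned int next_log_interval(unsigned int cur_secs,
				      bool nis_in_recovery,
				      bool late_msgs_since_last)
{
	if (!nis_in_recovery && !late_msgs_since_last)
		return LOG_INTERVAL_MIN_SECS;

	if (cur_secs + LOG_INTERVAL_STEP_SECS > LOG_INTERVAL_MAX_SECS)
		return LOG_INTERVAL_MAX_SECS;

	return cur_secs + LOG_INTERVAL_STEP_SECS;
}
```

Under these assumptions, a node with an NI stuck in recovery would log after 1, 2, 3, 4, and then every 5 minutes, and would drop back to 1 minute after the first quiet interval.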



 Comments   
Comment by Gerrit Updater [ 15/Mar/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50305
Subject: LU-16643 lnet: Health logging improvements
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b7a9169a80c63763a23f9b92dc150f6a5ed8e078

Comment by Gerrit Updater [ 04/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50305/
Subject: LU-16643 lnet: Health logging improvements
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0cb3d86c4004d75810c54bb897ad7fbb6d5ec05f

Comment by Peter Jones [ 04/Apr/23 ]

Landed for 2.16
