[LU-12439] Convert "Response timed out..." in lnet_finalize_expired_responses to CDEBUG Created: 14/Jun/19 Updated: 30/Jul/19 Resolved: 30/Jul/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I've noticed that this error message causes a lot of noise when a router node goes down. For example, I rebooted two routers on a system with just 40 compute nodes. The first of these error messages appeared about one minute after I initiated the reboot of the routers:

Reboot started at - Fri Jun 14 14:34:35 CDT 2019
saturn-smw:/var/opt/cray/log/p2-current # grep -m 1 lnet_finalize_expired_responses console-20190614
2019-06-14T14:35:33.704182-05:00 c0-1c1s9n3 LNet: 10316:0:(lib-move.c:2888:lnet_finalize_expired_responses()) Response timed out: md = ffff8810119a32a8: nid = 485@gni4

In the roughly 8 minutes it took the routers to reboot, there were 797 entries from lnet_finalize_expired_responses in the console log:

saturn-smw:/var/opt/cray/log/p2-current # grep -c lnet_finalize_expired_responses console-20190614
797
saturn-smw:/var/opt/cray/log/p2-current #

I don't see much value in this message for system administrators, so I think it should be converted to a CDEBUG. |
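For anyone who wants to reproduce the measurement on their own console log, the two grep invocations and the "about one minute" estimate can be approximated with a short script. This is only a sketch: the helper names `summarize` and `delay_seconds` are hypothetical, the sample line is the one quoted in the description, and CDT is assumed to be UTC-5 (which matches the -05:00 offset in the log timestamp).

```python
from datetime import datetime, timedelta, timezone

# Function name that the ticket greps for in the console log.
NEEDLE = "lnet_finalize_expired_responses"

# Console line copied from the ticket description.
SAMPLE = (
    "2019-06-14T14:35:33.704182-05:00 c0-1c1s9n3 LNet: "
    "10316:0:(lib-move.c:2888:lnet_finalize_expired_responses()) "
    "Response timed out: md = ffff8810119a32a8: nid = 485@gni4"
)

# "Fri Jun 14 14:34:35 CDT 2019" from the ticket; CDT assumed to be UTC-5.
REBOOT_START = datetime(2019, 6, 14, 14, 34, 35,
                        tzinfo=timezone(timedelta(hours=-5)))


def summarize(lines):
    """Return (match_count, first_matching_line_or_None).

    Roughly equivalent to `grep -c NEEDLE` plus `grep -m 1 NEEDLE`.
    """
    count = 0
    first = None
    for line in lines:
        if NEEDLE in line:
            count += 1
            if first is None:
                first = line
    return count, first


def delay_seconds(reboot_start, console_line):
    """Seconds between reboot_start and the line's leading ISO 8601 timestamp."""
    stamp = console_line.split(" ", 1)[0]
    return (datetime.fromisoformat(stamp) - reboot_start).total_seconds()


if __name__ == "__main__":
    count, first = summarize([SAMPLE, "unrelated console line", SAMPLE])
    print(count, "matching lines")
    print("first match after", delay_seconds(REBOOT_START, first), "seconds")
```

To run it against a real log, pass an open file object to `summarize` instead of the in-memory list; the timestamp parsing assumes every matching line starts with an ISO 8601 timestamp, as in the sample.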
| Comments |
| Comment by Gerrit Updater [ 14/Jun/19 ] |
|
Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/35233 |
| Comment by Gerrit Updater [ 30/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35233/ |
| Comment by Peter Jones [ 30/Jul/19 ] |
|
Landed for 2.13 |