Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.12.0
-
Clients and routers: Lustre 2.12.0_1.chaos
Lustre servers: Lustre 2.10.6_2.chaos
Kernel: Linux version 3.10.0-957.1.3.1chaos.ch6.x86_64
Topology: clients (OmniPath) <-> routers <-> servers (mlx5)
-
3
Description
Over a span of about 20 minutes, while the Lustre servers were being rebooted, the routers reported the following in their console logs:
2019-02-19 10:05:02 [330235.278414] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 2)
2019-02-19 10:05:02 [330235.294305] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1646 previous similar messages
The pair (0, 2) at the end of the message corresponds to:
msg->msg_ev.status == 0 (success)
msg->msg_health_status == 2 (LNET_MSG_STATUS_LOCAL_DROPPED)
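To make the inconsistency concrete, here is a minimal sketch in C of the kind of check that would flag this pair. It is not the Lustre source: the names sketch_health_status and sketch_is_inconsistent are hypothetical, and it assumes that the check simply flags any message whose event status and health status disagree (one says success, the other does not). Only the two status values named in this ticket are listed, on the assumption that 0 is the "OK" health status.

```c
#include <stdbool.h>

/*
 * Illustrative subset only. The value 2 comes from this ticket; the
 * value 0 for "OK" is an assumption. The real LNet enum has more members.
 */
enum sketch_health_status {
	SKETCH_STATUS_OK = 0,            /* stands in for LNET_MSG_STATUS_OK */
	SKETCH_STATUS_LOCAL_DROPPED = 2, /* stands in for LNET_MSG_STATUS_LOCAL_DROPPED */
};

/*
 * Hypothetical helper: returns true when the event status and the
 * health status disagree about whether the message succeeded.
 * ev_status == 0 means success; any other value is an error code.
 */
static bool sketch_is_inconsistent(int ev_status, int health_status)
{
	bool ev_ok = (ev_status == 0);
	bool health_ok = (health_status == SKETCH_STATUS_OK);

	return ev_ok != health_ok;
}
```

Under these assumptions, sketch_is_inconsistent(0, 2) is true, which matches the (0, 2) reported by the routers: the event completed successfully while the health status recorded a local drop.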
See https://github.com/LLNL/lustre/releases for contents of 2.12.0_1.chaos.
Landed for 2.12.4. Not needed on master.