Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.12.0
-
clients and routers: Lustre 2.12.0_1.chaos
lustre servers: Lustre 2.10.6_2.chaos
Linux version 3.10.0-957.1.3.1chaos.ch6.x86_64
Clients OmniPath <-> routers <-> Servers mlx5
-
3
-
9223372036854775807
Description
While the Lustre servers were being rebooted, routers reported the following in their console logs over a span of about 20 minutes:
2019-02-19 10:05:02 [330235.278414] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 2)
2019-02-19 10:05:02 [330235.294305] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1646 previous similar messages
The pair (0, 2) at the end of the message corresponds to:
msg->msg_ev.status == 0 (success)
msg->msg_health_status == 2 (LNET_MSG_STATUS_LOCAL_DROPPED)
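For anyone triaging similar console output, the decoding above can be sketched as a small log-parsing helper. This is only an illustration, not code from Lustre: the function name `decode_pair`, the regular expression, and the lookup table are hypothetical, and the table maps only the one health status named in this report. The "mismatch" flag assumes that a health status of 0 means "OK", so an event status of 0 paired with a nonzero health status (or the reverse) is what the message calls an inconsistent state.

```python
import re

# Only the health status value named in this report is mapped.
# Any other code is reported as "unknown" rather than guessed.
HEALTH_STATUS_NAMES = {2: "LNET_MSG_STATUS_LOCAL_DROPPED"}

# Matches the trailing "(ev_status, health_status)" pair on
# lnet_is_health_check() console lines.
PAIR_RE = re.compile(r"lnet_is_health_check\(\)\).*\((-?\d+), (\d+)\)\s*$")


def decode_pair(line):
    """Return the decoded (status, health) pair, or None if the line has none."""
    m = PAIR_RE.search(line)
    if m is None:
        return None
    ev_status = int(m.group(1))
    health = int(m.group(2))
    return {
        "ev_status": ev_status,
        "health_status": health,
        "health_name": HEALTH_STATUS_NAMES.get(health, "unknown"),
        # Assumption: health status 0 means OK, so the two fields disagree
        # when exactly one of them is zero.
        "mismatch": (ev_status == 0) != (health == 0),
    }


if __name__ == "__main__":
    sample = (
        "2019-02-19 10:05:02 [330235.278414] LNetError: "
        "33048:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in "
        "inconsistent state, don't perform health checking (0, 2)"
    )
    print(decode_pair(sample))
```

Lines such as the "Skipped 1646 previous similar messages" summary carry no pair and return None, so they can be filtered out when counting distinct (status, health) combinations across a router's console log.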
See https://github.com/LLNL/lustre/releases for contents of 2.12.0_1.chaos.
Attachments
Activity
Labels | Original: llnl topllnl | New: llnl |
Link | Original: This issue is related to JFC-27 [ JFC-27 ] |
Link | New: This issue is related to JFC-20 [ JFC-20 ] |
Link | Original: This issue is related to JFC-21 [ JFC-21 ] |
Fix Version/s | New: Lustre 2.12.4 [ 14690 ] | |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Link | New: This issue is related to JFC-27 [ JFC-27 ] |
Link | New: This issue is related to JFC-21 [ JFC-21 ] |
Labels | Original: llnl | New: llnl topllnl |
Attachment | New: dk.opal190.1550688817.txt.gz [ 32040 ] |
Assignee | Original: WC Triage [ wc-triage ] | New: Amir Shehata [ ashehata ] |