
lnet_is_health_check() Msg is in inconsistent state, don't perform health checking (0, 2)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0
    • Environment:
      clients and routers: Lustre 2.12.0_1.chaos
      lustre servers: Lustre 2.10.6_2.chaos

      Linux version 3.10.0-957.1.3.1chaos.ch6.x86_64
      Clients OmniPath <-> routers <-> Servers mlx5
    • Severity: 3

    Description

      While the Lustre servers were being rebooted, the routers reported the following in their console logs over a span of about 20 minutes:
      2019-02-19 10:05:02 [330235.278414] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 2)
      2019-02-19 10:05:02 [330235.294305] LNetError: 33048:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1646 previous similar messages

      The pair (0, 2) at the end of the message corresponds to:
      msg->msg_ev.status == 0 (success)
      msg->msg_health_status == 2 (LNET_MSG_STATUS_LOCAL_DROPPED)
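
      For triage across many router logs, the decoding above can be sketched as a small log filter. This is only an illustration, not part of Lustre: the function name `decode_health_check_line` and the lookup table are hypothetical, and the table holds only the one health-status value confirmed in this report (2); any other value is reported as unknown rather than guessed.

```python
import re

# Only the value confirmed in this report is mapped. Other codes are
# deliberately left out (assumption: look them up in the LNet headers
# for the release actually running).
KNOWN_HEALTH_STATUS = {2: "LNET_MSG_STATUS_LOCAL_DROPPED"}

# Matches the console message quoted above and captures the two numbers
# printed in the trailing parentheses.
PATTERN = re.compile(
    r"lnet_is_health_check\(\)\) Msg is in inconsistent state, "
    r"don't perform health checking \((-?\d+), (\d+)\)"
)


def decode_health_check_line(line):
    """Return the decoded (event status, health status) pair, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ev_status = int(match.group(1))
    health_status = int(match.group(2))
    return {
        "ev_status": ev_status,            # msg->msg_ev.status
        "ev_status_ok": ev_status == 0,    # 0 means success
        "health_status": health_status,    # msg->msg_health_status
        "health_status_name": KNOWN_HEALTH_STATUS.get(health_status, "unknown"),
    }
```

      Lines that do not carry the pair, such as the "Skipped ... previous similar messages" line, yield None and can be skipped.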

      See https://github.com/LLNL/lustre/releases for contents of 2.12.0_1.chaos.

        Activity

          [LU-11981] lnet_is_health_check() Msg is in inconsistent state, don't perform health checking (0, 2)
          ofaaland Olaf Faaland made changes -
          Labels Original: llnl topllnl New: llnl
          pjones Peter Jones made changes -
          Link Original: This issue is related to JFC-27 [ JFC-27 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to JFC-20 [ JFC-20 ]
          pjones Peter Jones made changes -
          Link Original: This issue is related to JFC-21 [ JFC-21 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.12.4 [ 14690 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to JFC-27 [ JFC-27 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to JFC-21 [ JFC-21 ]
          ofaaland Olaf Faaland made changes -
          Labels Original: llnl New: llnl topllnl
          ofaaland Olaf Faaland made changes -
          Attachment New: dk.opal190.1550688817.txt.gz [ 32040 ]
          ashehata Amir Shehata (Inactive) made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Amir Shehata [ ashehata ]
          ofaaland Olaf Faaland created issue -

          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 5
