  1. Lustre
  2. LU-12199

MDs are not detached from uncommitted messages that have a health check performed on them

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels:
      None
    • Severity:
      3

      Description

      lnet_is_health_check() can return "true" for a message that has not
      yet hit the network. In this situation the message is freed without
      detaching its MD. As a result, the affected requests never receive
      their unlink events and remain stuck forever.
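
      The failure mode described above can be illustrated with a small,
      self-contained C model. This is only a sketch: every toy_* name and
      structure below is hypothetical and is not taken from the LNet source,
      and the health_check flag merely stands in for lnet_is_health_check()
      returning "true". The "fixed" variant shows one plausible shape of a
      correction (always detach before freeing); it is not claimed to be the
      patch that actually landed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for illustration; NOT the real LNet structures. */
struct toy_md {
	int attached_msgs;   /* messages still referencing this MD */
	int unlink_events;   /* unlink events delivered to the MD owner */
};

struct toy_msg {
	struct toy_md *md;   /* MD attached to this message, or NULL */
	bool committed;      /* true once the message has hit the network */
	bool health_check;   /* models lnet_is_health_check() returning true */
};

/* Allocate a message and attach it to the given MD. */
static struct toy_msg *toy_msg_alloc(struct toy_md *md, bool committed,
				     bool health_check)
{
	struct toy_msg *msg = calloc(1, sizeof(*msg));

	assert(msg != NULL);
	msg->md = md;
	msg->committed = committed;
	msg->health_check = health_check;
	md->attached_msgs++;
	return msg;
}

/* Drop the message's MD reference; the last detach delivers the unlink event. */
static void toy_msg_detach_md(struct toy_msg *msg)
{
	struct toy_md *md = msg->md;

	if (md == NULL)
		return;
	msg->md = NULL;
	if (--md->attached_msgs == 0)
		md->unlink_events++;
}

/* Faulty path: a health-checked, uncommitted message is freed
 * while its MD is still attached, so no unlink event is delivered. */
static void toy_finalize_buggy(struct toy_msg *msg)
{
	if (msg->health_check && !msg->committed) {
		free(msg);
		return;
	}
	toy_msg_detach_md(msg);
	free(msg);
}

/* Assumed correction: detach the MD before the message is freed. */
static void toy_finalize_fixed(struct toy_msg *msg)
{
	toy_msg_detach_md(msg);
	free(msg);
}
```

      In this model, finalizing an uncommitted, health-checked message
      through toy_finalize_buggy() leaves the MD attached with no unlink
      event delivered, which corresponds to a request that waits forever.
      The same message finalized through toy_finalize_fixed() delivers
      exactly one unlink event.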

      This issue was discovered while testing the MR routing feature under LNet router failure conditions.

      The bug was introduced by the LNet health feature, commit 70616605dd44be37068f4e1a4745a2f8b90eb1f5 (https://review.whamcloud.com/32764).

            People

            • Assignee:
              hornc Chris Horn
            • Reporter:
              hornc Chris Horn