
Upstream lnet-selftest causing node to crash on load

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Upstream
    • Upstream
    • None
    • 3

    Description

      The upstream version of lnet-selftest crashes the node when the module is loaded. A kernel developer changed the definition of kiov in both LNet and lnet-selftest, and the crash may be related to that change. In one run, I saw this log message just before the crash:

      LNet: 16216:0:(framework.c:1712:sfw_startup()) Failed to reserve enough buffers: service debug, 256 needed: -30720

      This may be a hint that we are running out of memory, which would make the kernel unstable. The new kiov handling may be exhausting memory, especially when buffers are allocated per-CPT.
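
One quick way to sanity-check the memory theory is to decode the number at the end of that log line. The sketch below assumes (the ticket does not say so) that the trailing value is a negated errno-style return code. The helper names `trailing_return_code` and `errno_name` are made up for illustration and are not part of LNet.

```python
import errno
import re

# Log line copied from the description above.
LOG_LINE = (
    "LNet: 16216:0:(framework.c:1712:sfw_startup()) Failed to reserve "
    "enough buffers: service debug, 256 needed: -30720"
)


def trailing_return_code(line):
    """Return the signed integer at the end of a log line, or None."""
    match = re.search(r"(-?\d+)\s*$", line)
    return int(match.group(1)) if match else None


def errno_name(rc):
    """Map a negated errno-style code to its symbolic name, or None if unknown."""
    if rc is None or rc >= 0:
        return None
    return errno.errorcode.get(-rc)


rc = trailing_return_code(LOG_LINE)
print("return code:", rc)
print("errno name:", errno_name(rc))
# For comparison, an out-of-memory failure would normally show up as -ENOMEM.
print("-ENOMEM decodes to:", errno_name(-errno.ENOMEM))
```

If the value does not map to a known errno such as ENOMEM, the log line on its own neither confirms nor rules out memory exhaustion, and the return code itself deserves a closer look.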

        Activity

          [LU-9072] Upstream lnet-selftest causing node to crash on load

          Doug Oucharek (Inactive) added a comment:

          Oleg figured this out and submitted a patch upstream. I cannot find the "send-email" message right now, but will post it here when I do.

          Once the patch lands, this ticket will be marked as resolved.

          The problem did not affect master, so no patch is needed there.

          People

            doug Doug Oucharek (Inactive)
            doug Doug Oucharek (Inactive)