  1. Lustre
  2. LU-3447

Client RDMA too fragmented: 128/255 src 128/256 dst frags

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.1.5
    • Lustre servers running 2.1.5, Lustre clients with 1.8.9.
    • 3
    • 8618

    Description

      During an IOR-like benchmark doing direct I/O from multiple clients (16, 64), clients get disconnected and evicted. The MPI process dies in misery and some of its processes aren't even killable.

      We've seen that a similar bug was reported a while ago and marked as solved; it occurred on LNet routers (https://bugzilla.lustre.org/show_bug.cgi?id=13607). This one is on clients.

      What can lead to the "RDMA too fragmented" issue? Any hint or suggestion? Client log messages are in the attached file.

      Regards,
      Erich

          People

            bfaccini Bruno Faccini (Inactive)
            efocht Erich Focht