  Lustre / LU-3447

Client RDMA too fragmented: 128/255 src 128/256 dst frags


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Lustre 2.1.5
    • Lustre servers running 2.1.5, Lustre clients with 1.8.9.
    • 3
    • 8618

      During an IOR-like benchmark doing direct I/O from multiple clients (16 and 64), the clients get disconnected and evicted. The MPI job dies, and some of its processes cannot even be killed.

      We've seen that a similar bug was reported a while ago and marked as resolved; it occurred on LNet routers (https://bugzilla.lustre.org/show_bug.cgi?id=13607). This one occurs on clients.

      What can lead to the "RDMA too fragmented" error? Any hints or suggestions? The client log messages are in the attached file.

      Regards,
      Erich
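      The question above can be framed with a simplified model. This is a hedged illustration only, not the LNet/o2iblnd code, and the report itself does not say what the "128/255 src 128/256 dst" numbers mean. Assume each RDMA operation must stay inside a single source fragment and a single destination fragment. When both buffers break on the same page boundaries, a 1 MiB transfer needs one operation per page. When one buffer starts partway into a page, almost every page pair needs two operations, so the count approaches the sum of both fragment counts. If the transport caps the operations per transfer (also an assumption here), that misalignment alone could exceed the cap. The function `count_rdma_ops`, the 4 KiB page size, and the 512-byte offset are hypothetical values chosen for illustration.

```python
def count_rdma_ops(src_frags, dst_frags):
    """Count transfers when each one must stay inside a single source
    fragment and a single destination fragment (hypothetical model).

    src_frags / dst_frags are lists of fragment lengths in bytes;
    lengths are assumed non-zero and the totals must match.
    """
    if sum(src_frags) != sum(dst_frags):
        raise ValueError("source and destination sizes differ")
    si = di = ops = 0
    src_left = src_frags[0] if src_frags else 0
    dst_left = dst_frags[0] if dst_frags else 0
    while si < len(src_frags) and di < len(dst_frags):
        # Largest piece that is contiguous on both sides.
        chunk = min(src_left, dst_left)
        ops += 1
        src_left -= chunk
        dst_left -= chunk
        # Move to the next fragment on whichever side was exhausted.
        if src_left == 0:
            si += 1
            if si < len(src_frags):
                src_left = src_frags[si]
        if dst_left == 0:
            di += 1
            if di < len(dst_frags):
                dst_left = dst_frags[di]
    return ops


PAGE = 4096   # assumed page size
NPAGES = 256  # 1 MiB of 4 KiB pages (illustrative)

aligned = [PAGE] * NPAGES
# Same 1 MiB, but the buffer starts 512 bytes into a page (assumed offset).
misaligned = [PAGE - 512] + [PAGE] * (NPAGES - 1) + [512]

print(count_rdma_ops(aligned, aligned))     # 256
print(count_rdma_ops(aligned, misaligned))  # 512
```

      In this model the aligned case needs 256 operations, while the misaligned case needs 512 because the two lists share only their final boundary. Whether this is what happens on the affected clients is not established by this report.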

            Assignee: Bruno Faccini (bfaccini) (Inactive)
            Reporter: Erich Focht (efocht)
            Votes: 0
            Watchers: 8
