Lustre / LU-9478

Potential Performance problem under MOFED 4.x

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.10.0, Upstream
    • 3

    Description

      While investigating LU-9472, I found the performance of mlx5 FDR cards under MOFED 4 to be horrible. With lnet-selftest, I could only reach about 750 MB/s when I should be seeing over 6 GB/s.
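For scale, here is a rough back-of-the-envelope sketch of the gap between the measured and expected throughput. It assumes a 4x FDR link (14.0625 Gbit/s per lane with 64b/66b encoding), which the ticket does not state explicitly. The function names are illustrative only. Under those assumptions the link's raw data rate works out to roughly 6.8 GB/s before protocol overhead, so the measured figure is about one eighth of the expected 6 GB/s.

```python
# Rough estimate only. The 4x link width is an assumption, not from the ticket.
FDR_LANE_GBIT_S = 14.0625   # FDR signalling rate per lane, in Gbit/s
ASSUMED_LANES = 4           # assumed 4x link width
ENCODING_EFFICIENCY = 64 / 66  # FDR uses 64b/66b line encoding


def fdr_data_rate_mb_s(lanes: int = ASSUMED_LANES) -> float:
    """Raw link data rate in MB/s (decimal), before any protocol overhead."""
    gbit_s = FDR_LANE_GBIT_S * lanes * ENCODING_EFFICIENCY
    return gbit_s * 1000 / 8  # Gbit/s -> MB/s


def fraction_of_expected(observed_mb_s: float, expected_mb_s: float) -> float:
    """Observed throughput as a fraction of the expected throughput."""
    return observed_mb_s / expected_mb_s


if __name__ == "__main__":
    observed = 750.0    # MB/s, measured with lnet-selftest (from the ticket)
    expected = 6000.0   # MB/s, the "over 6 GB/s" expectation (from the ticket)
    link = fdr_data_rate_mb_s()
    print(f"assumed 4x FDR data rate: {link:.0f} MB/s")
    print(f"observed / expected:      {fraction_of_expected(observed, expected):.1%}")
    print(f"observed / link rate:     {fraction_of_expected(observed, link):.1%}")
```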

      I changed the code to allow us to use the global MR (which upstream devs won't let us do). This lets us set map_on_demand to zero. Performance improved by only about 10%, as expected, so the performance issue is not related to FastReg.

      I also ported in the MultiQP fix to see if using multiple QPs on a connection would help.  It did not.  In fact, things got worse.

      To get MOFED 4, I installed the tarball MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tar.gz on a RHEL 7.3 install, following all the needed steps, including removing the in-kernel OFED. I then built master on this node so that it picked up the MOFED 4 APIs.

      The probability that I messed something up is very high, so I am not confident that this ticket describes a real problem. I need someone much better at Linux IT than me to build with LU-9472 against a properly installed MOFED 4 and see what the performance is.

          People

            wc-triage WC Triage
            doug Doug Oucharek (Inactive)