
Upstream ko2iblnd has poor performance

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical

    Description

      On an MLX QDR system, I get the following performance with current master on RHEL 7.3:

      Read: 3.1 GB/s, Write: 3.15 GB/s

      With the latest upstream build and the LU-9026 fix, I am getting:

      Read: 1.25 GB/s, Write: 1.13 GB/s

      To check whether the problem is due to LU-9026, I went back to a kernel from before the RDMA API changes that broke ko2iblnd (4.8-rc2) and got:

      Read: 0.63 GB/s, Write: 0.62 GB/s

      So, I feel we have a serious problem with upstream LNet IB performance. It is also possible that lnet-selftest itself is broken (that is certainly possible for 4.8-rc2).
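
      To put the three measurements side by side, here is a minimal sketch that expresses each build's throughput as a percentage of the master baseline. The numbers are the ones reported above; the dictionary keys and the helper name relative_to_baseline are only illustrative and not part of any Lustre tooling.

```python
# Throughput figures (GB/s) as reported in this ticket.
# The labels used as keys are illustrative names, not official build names.
results = {
    "master (RHEL 7.3)": {"read": 3.1, "write": 3.15},
    "upstream + LU-9026 fix": {"read": 1.25, "write": 1.13},
    "upstream 4.8-rc2": {"read": 0.63, "write": 0.62},
}


def relative_to_baseline(results, baseline):
    """Return each build's throughput as a percentage of the baseline build."""
    base = results[baseline]
    return {
        build: {op: round(100.0 * rate / base[op], 1) for op, rate in rates.items()}
        for build, rates in results.items()
    }


percentages = relative_to_baseline(results, "master (RHEL 7.3)")
for build, pct in percentages.items():
    print(f"{build}: read {pct['read']}% / write {pct['write']}% of master")
```

      By these figures, the upstream build with the LU-9026 fix reaches roughly 40% of master's read throughput and about 36% of its write throughput, and 4.8-rc2 reaches roughly 20% of both.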

      I'm still unable to validate LU-9026 on the upstream client. In theory, I get the same effect on master by setting map_on_demand to 256. When I do that, I see only about a 5% drop in performance. So, my suspicion is we have a problem with ko2i

          Activity

            [LU-9179] Upstream ko2iblnd has poor performance
            simmonsja James A Simmons made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            simmonsja James A Simmons added a comment - This was due to leftovers from LU-7650, which was incorrect. All of that code has been removed upstream and replaced with what landed during the 2.11 development cycle.
            simmonsja James A Simmons made changes -
            Link Original: This issue is related to LU-6215 [ LU-6215 ]
            simmonsja James A Simmons made changes -
            Link New: This issue is related to LU-9679 [ LU-9679 ]
            simmonsja James A Simmons made changes -
            Link Original: This issue is related to LU-4011 [ LU-4011 ]
            simmonsja James A Simmons made changes -
            Link New: This issue is related to LU-4011 [ LU-4011 ]
            simmonsja James A Simmons made changes -
            Link New: This issue is related to LU-6215 [ LU-6215 ]

            simmonsja James A Simmons added a comment - The LNet layer upstream is pretty much in sync with master as it was just before multi-rail landed. The major difference is that Al Viro's biovec patches are missing from master.
            doug Doug Oucharek (Inactive) made changes -
            Labels New: lnet
            doug Doug Oucharek (Inactive) created issue -

            People

              Assignee: wc-triage WC Triage
              Reporter: doug Doug Oucharek (Inactive)