[LU-9478] Potential Performance problem under MOFED 4.x Created: 09/May/17  Updated: 21/Jan/22  Resolved: 21/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Upstream
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Doug Oucharek (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: lnet

Issue Links:
Related
Severity: 3

 Description   

While investigating LU-9472, I found the performance of mlx5 FDR cards under MOFED 4 to be horrible. With lnet-selftest, I could only reach about 750 MB/s when I should be seeing over 6 GB/s.

I changed the code to allow us to use the global MR (which upstream devs won't let us do). This lets us set map_on_demand to zero. Performance improved by only about 10% (as expected), so the performance issue is not related to FastReg.
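To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of the original investigation. The 4x link width and the helper names are assumptions made for illustration; the 750 MB/s, 6 GB/s and 10% figures are the ones quoted above.

```python
# Rough arithmetic only; the link parameters are assumptions, not measurements.
FDR_LANE_GBIT_S = 14.0625   # FDR signalling rate per lane, in Gbit/s
LANES = 4                   # assumption: a standard 4x FDR link
ENCODING = 64 / 66          # FDR uses 64b/66b line encoding


def fdr_data_rate_gbytes() -> float:
    """Theoretical payload rate of the assumed 4x FDR link, in GB/s (decimal)."""
    return FDR_LANE_GBIT_S * LANES * ENCODING / 8


def fraction_of_expected(observed_mb_s: float, expected_gb_s: float) -> float:
    """Observed throughput as a fraction of the expected throughput."""
    return observed_mb_s / (expected_gb_s * 1000)


observed_mb_s = 750.0                    # lnet-selftest result from the ticket
expected_gb_s = 6.0                      # "over 6 GB/s" from the ticket
with_global_mr = observed_mb_s * 1.10    # the ~10% gain seen with the global MR

print(f"theoretical 4x FDR payload rate: {fdr_data_rate_gbytes():.2f} GB/s")
print(f"observed / expected: {fraction_of_expected(observed_mb_s, expected_gb_s):.1%}")
print(f"with global MR / expected: {fraction_of_expected(with_global_mr, expected_gb_s):.1%}")
```

Under these assumptions the observed rate is about one eighth of the expected one, and the roughly 10% gain from the global MR leaves it below 15%, which is why FastReg does not look like the cause.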

I also ported in the MultiQP fix to see if using multiple QPs on a connection would help.  It did not.  In fact, things got worse.

To get MOFED 4, I installed the tarball MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tar.gz on a RHEL 7.3 install, following all the required steps, including removing the in-kernel OFED. I then built master on this node so that it picked up the MOFED 4 APIs.

The probability that I messed something up: very high. As such, I am not confident in this ticket being a real problem.  I need someone much better at Linux IT than me to build with LU-9472 against a properly installed MOFED 4 and see what performance is.



 Comments   
Comment by James A Simmons [ 10/Sep/18 ]

Still a problem?

Generated at Sat Feb 10 02:26:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.