Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.10.0, Upstream
-
3
Description
While investigating LU-9472, I found that the performance of mlx5 FDR cards under MOFED 4 is horrible. With lnet-selftest, I could only reach about 750 MB/s when I should be seeing over 6 GB/s!
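For context on the "over 6 GB/s" expectation, here is a rough back-of-the-envelope sketch. It assumes a standard 4x FDR link (14.0625 Gb/s signaling per lane, 64b/66b encoding) and decimal megabytes. The function names are made up for illustration, and the result ignores protocol and PCIe overhead, so treat it as an upper bound rather than a number measured on this node.

```python
# Rough FDR InfiniBand link budget. These are nominal figures for a 4x FDR
# link, not values measured on the node described in this ticket.
LANES = 4                 # assumption: standard 4x link width
LANE_GBITS = 14.0625      # FDR signaling rate per lane, in Gb/s
ENCODING = 64 / 66        # FDR uses 64b/66b line encoding


def fdr_data_rate_gbytes():
    """Nominal usable data rate of a 4x FDR link, in GB/s."""
    return LANES * LANE_GBITS * ENCODING / 8


def fraction_of_line_rate(observed_mb_per_s):
    """Observed throughput (MB/s, decimal) as a fraction of the nominal rate."""
    return (observed_mb_per_s / 1000) / fdr_data_rate_gbytes()


if __name__ == "__main__":
    print(f"nominal FDR data rate: {fdr_data_rate_gbytes():.2f} GB/s")
    print(f"750 MB/s is {fraction_of_line_rate(750):.0%} of that")
```

Under these assumptions the link tops out a little under 7 GB/s, so 750 MB/s is only around a tenth of what the hardware should be able to carry.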
I changed the code to allow us to use the global MR (which upstream devs won't let us do). This allows us to set map_on_demand to zero. Performance improved by only about 10% (as expected), so the performance issue is not related to FastReg.
I also ported in the MultiQP fix to see if using multiple QPs on a connection would help. It did not. In fact, things got worse.
To get MOFED 4, I installed the tarball MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tar.gz on a RHEL 7.3 install, following all the needed steps, including removing the in-kernel OFED. I then built master on this node so that it picked up the MOFED 4 APIs.
The probability that I messed something up is very high, so I am not confident that this ticket describes a real problem. I need someone much better at Linux IT than I am to build with LU-9472 against a properly installed MOFED 4 and see what the performance is.