Description
During performance testing of a new Lustre file system, we discovered that read/write performance isn't where we would expect. As an example, the block-level read performance for the system is just over 65 GB/s. In scaling tests, we can only get to around 30 GB/s for reads. Writes are slightly better, but still in the 35 GB/s range. At single-node scale, we seem to cap out at a few GB/s.
After going through every tuning we could find, we're slightly better, but still far behind where performance should be. We've experimented with various ksocklnd parameters (nconnds, nscheds, tx/rx buffer size, etc.), but with little change. Current tunings that may be relevant: credits 2560, peer credits 63, max_rpcs_in_flight 32.
Network configuration on the servers is 2x 100G Ethernet bonded together (active/active) using kernel bonding (not ksocklnd bonding).
iperf between two nodes gets nearly line rate at ~98 Gb/s, and iperf from two nodes to a single node can push ~190 Gb/s, consistent with what would be expected from the kernel bonding.
lnet selftest shows about 2.5 GB/s (20 Gb/s) for node-to-node tests. I'm not sure if this is a bug in lnet selftest or a real reflection of the performance.
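To put the lnet selftest number next to the iperf number in the same units, here is a minimal sketch of the conversion. The function and variable names are made up for illustration; the only inputs are the rates quoted in this ticket, and it assumes decimal units (1 GB/s = 8 Gb/s).

```python
def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert a rate in GB/s to Gb/s (1 byte = 8 bits)."""
    return gbytes_per_s * 8.0


def fraction_of_link(gbytes_per_s: float, link_gbits_per_s: float) -> float:
    """Return the measured rate as a fraction of a reference rate in Gb/s."""
    return gbytes_to_gbits(gbytes_per_s) / link_gbits_per_s


# Rates quoted in this ticket (node-to-node tests).
lnet_selftest_gbytes = 2.5  # lnet selftest, GB/s
iperf_gbits = 98.0          # iperf, Gb/s

selftest_gbits = gbytes_to_gbits(lnet_selftest_gbytes)
share = fraction_of_link(lnet_selftest_gbytes, iperf_gbits)

print(f"lnet selftest: {selftest_gbits:.1f} Gb/s")
print(f"share of iperf node-to-node rate: {share:.1%}")
```

In other words, lnet selftest is reaching only about a fifth of what iperf achieves over the same path.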
We found the following related tickets/mailing list discussions which seem to be very similar to what we're seeing, but with no resolutions:
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-August/016630.html
https://jira.whamcloud.com/browse/LU-11415
https://jira.whamcloud.com/browse/LU-12815 (maybe performance limiting, but I doubt it for what we're seeing)
Any help or suggestions would be awesome.
Thanks!
- Jeff
As a side note, it may make sense for us to close this ticket and open a new, tailored one for the problems we're seeing now, since the issue described here (slow lnet performance) has been resolved with the `conns_per_peer` patch.
Along those lines, since that patch resolved our network performance issue and we'd like to keep running 2.12 (LTS), could we lobby to get James' backport of it included in the next 2.12 point release so that we don't have to keep carrying that patch?