Description
During performance testing of a new Lustre file system, we discovered that read/write performance isn't where we would expect it to be. As an example, the block-level read performance of the system is just over 65 GB/s. In scaling tests, we can only get to around 30 GB/s for reads. Writes are slightly better, but still in the 35 GB/s range. At single-node scale, we seem to cap out at a few GB/s.
After going through the tunings and everything else we can find, we're doing slightly better, but still miles behind where performance should be. We've played with various ksocklnd parameters (nconnds, nscheds, tx/rx buffer size, etc.), but with very little change. Current tunings that may be relevant: credits=2560, peer_credits=63, max_rpcs_in_flight=32.
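For context, a sketch of where these tunables are typically set (the file path and the nscheds/nconnds values below are placeholders illustrating the mechanism, not our exact config):

    # LND module options, e.g. in /etc/modprobe.d/lustre.conf:
    options ksocklnd nscheds=12 nconnds=8 credits=2560 peer_credits=63 tx_buffer_size=0 rx_buffer_size=0
    # (tx/rx buffer size of 0 leaves the socket buffers at the system default)

    # per-OSC RPC concurrency, set on the clients:
    lctl set_param osc.*.max_rpcs_in_flight=32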
Network configuration on the servers is 2x 100G Ethernet interfaces bonded together (active/active) using kernel bonding (not ksocklnd bonding).
iperf between two nodes gets nearly line rate at ~98 Gb/s, and iperf from two nodes to a single node can push ~190 Gb/s, consistent with what would be expected from the kernel bonding.
lnet selftest shows ~2.5 GB/s (20 Gb/s) for node-to-node tests. I'm not sure if this is a bug in lnet selftest or a real reflection of the performance.
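For reference, a node-to-node brw read test of the kind run here looks roughly like the following (placeholder NIDs; note --concurrency, which defaults to a low value and can itself limit a single-pair test):

    modprobe lnet_selftest          # on both nodes
    export LST_SESSION=$$
    lst new_session read_test
    lst add_group servers 10.0.0.1@tcp
    lst add_group clients 10.0.0.2@tcp
    lst add_batch bulk_read
    lst add_test --batch bulk_read --concurrency 8 --from clients --to servers brw read size=1M
    lst run bulk_read
    lst stat clients servers        # sample the rates for ~30s
    lst end_session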
We found the following related tickets/mailing list discussions which seem to be very similar to what we're seeing, but with no resolutions:
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-August/016630.html
https://jira.whamcloud.com/browse/LU-11415
https://jira.whamcloud.com/browse/LU-12815 (maybe performance limiting, but I doubt it for what we're seeing)
Any help or suggestions would be awesome.
Thanks!
- Jeff
Jeff, I definitely have some comments related to ZFS performance, but those should really go into a separate ticket. If I file that ticket, it will not be tracked correctly as a customer issue, so it is best if you file it.
As for including conns_per_peer in 2.12, that is a bit tricky in the short term, since that patch depends on another one that removes the socklnd-level TCP bonding feature. While LNet Multi-Rail provides better functionality, use_tcp_bonding may be in use at customer sites and shouldn't be removed from an LTS release without any warning. A patch will go into the upcoming 2.12.7 LTS and 2.14.0 releases to announce that this option is deprecated, which will give sites a chance to become aware of the change and move over to LNet Multi-Rail. I've asked in LU-12815 for an email to be sent out to lustre-discuss and lustre-devel asking whether anyone is using this feature, and maybe it can be removed in 2.12.8 if there is no feedback on its usage.
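For anyone checking whether their site depends on the feature in question: socklnd-level TCP bonding is enabled through the lnet module options, while the conns_per_peer patch adds a ksocklnd option instead. A sketch (interface names and values are placeholders, and availability of conns_per_peer depends on the Lustre version installed):

    # legacy socklnd-level TCP bonding, the option being deprecated:
    options lnet networks="tcp0(eth0,eth1)" use_tcp_bonding=1

    # multiple sockets per peer, once the conns_per_peer change is available:
    options ksocklnd conns_per_peer=4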