Description
During performance testing of a new Lustre file system, we discovered that read/write performance isn't where we would expect it to be. For example, block-level read performance for the system is just over 65 GB/s, but in scaling tests we only reach around 30 GB/s for reads. Writes are slightly better, but still in the 35 GB/s range. At single-node scale, we seem to cap out at a few GB/s.
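The read gap can be put as a single number. This sketch uses only the figures quoted here and assumes decimal units (1 GB = 10^9 bytes). Writes are left out because no block-level write figure is given to compare them against.

```python
# Sanity check on the read gap. Figures are the ones quoted in this report;
# decimal units (1 GB = 10**9 bytes) are assumed throughout.
BLOCK_READ_GBYTE_S = 65.0   # block-level read bandwidth ("just over 65 GB/s")
LUSTRE_READ_GBYTE_S = 30.0  # aggregate Lustre read rate seen in scaling tests


def fraction_of_ceiling(achieved: float, ceiling: float) -> float:
    """Return achieved throughput as a fraction of the hardware ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return achieved / ceiling


read_fraction = fraction_of_ceiling(LUSTRE_READ_GBYTE_S, BLOCK_READ_GBYTE_S)
print(f"Lustre reads reach {read_fraction:.0%} of block-level bandwidth")
```

Under these assumptions, Lustre reads reach a little under half of what the storage can deliver at the block level.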
After going through every tuning we could find, we're slightly better, but still far behind where performance should be. We've experimented with various ksocklnd parameters (nconnds, nscheds, tx/rx buffer sizes, etc.), but without much change. Current tunings that may be relevant: credits 2560, peer credits 63, max_rpcs_in_flight 32.
The network configuration on the servers is 2x 100G Ethernet, bonded together (active/active) using kernel bonding (not ksocklnd bonding).
iperf between two nodes gets nearly line rate at ~98 Gb/s, and iperf from two nodes to a single node can push ~190 Gb/s, consistent with what would be expected from the kernel bonding.
LNet selftest shows rates of ~2.5 GB/s (20 Gb/s) for node-to-node tests. I'm not sure whether this is a bug in LNet selftest or a real reflection of the performance.
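Because iperf reports in gigabits per second and LNet selftest in gigabytes per second, it helps to put both on the same scale. The sketch below does that conversion. It assumes decimal units and ignores TCP/IP and LNet protocol overhead, so the ratios are rough upper-bound comparisons, not precise efficiencies.

```python
# Compare the LNet selftest result with the iperf numbers by converting
# network rates in gigabits per second (Gb/s) to gigabytes per second (GB/s).
# Assumes decimal units and ignores TCP/IP and LNet protocol overhead.
BITS_PER_BYTE = 8

LINK_GBIT_S = 100.0             # one 100G Ethernet link
BONDED_LINKS = 2                # 2x 100G, kernel bonding, active/active
IPERF_ONE_PEER_GBIT_S = 98.0    # iperf node to node
IPERF_TWO_PEERS_GBIT_S = 190.0  # iperf from two nodes into one node
LST_NODE_TO_NODE_GBYTE_S = 2.5  # LNet selftest node to node


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a rate in Gb/s to GB/s."""
    return gbit_per_s / BITS_PER_BYTE


bond_ceiling = gbit_to_gbyte(LINK_GBIT_S * BONDED_LINKS)
iperf_one_peer = gbit_to_gbyte(IPERF_ONE_PEER_GBIT_S)
iperf_two_peers = gbit_to_gbyte(IPERF_TWO_PEERS_GBIT_S)
lst_vs_iperf = LST_NODE_TO_NODE_GBYTE_S / iperf_one_peer

print(f"bond ceiling:     {bond_ceiling:.2f} GB/s")
print(f"iperf, one peer:  {iperf_one_peer:.2f} GB/s")
print(f"iperf, two peers: {iperf_two_peers:.2f} GB/s")
print(f"LNet selftest:    {LST_NODE_TO_NODE_GBYTE_S:.2f} GB/s "
      f"({lst_vs_iperf:.0%} of single-peer iperf)")
```

On these figures, node-to-node LNet selftest reaches only about a fifth of what plain TCP (iperf) achieves between the same pair of nodes. That gap is why the selftest result looks suspicious.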
We found the following related tickets and mailing list discussions, which seem very similar to what we're seeing but have no resolutions:
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-August/016630.html
https://jira.whamcloud.com/browse/LU-11415
https://jira.whamcloud.com/browse/LU-12815 (possibly performance limiting, though I doubt it explains what we're seeing)
Any help or suggestions would be awesome.
Thanks!
- Jeff
It might make sense to keep this issue open to track the socklnd conns_per_peer feature for your use in 2.12.x, since LU-12815 will be closed once the patches are landed on master for 2.15 (though Peter may have other methods for tracking this). In the meantime, pending final review, testing, and landing of the LU-12815 patch series, there isn't a particular reason for you not to use the conns_per_peer patch on your system, since you are presumably not using the use_tcp_bonding feature yourself.