Lustre / LU-14293

Poor LNet/ksocklnd(?) performance on 2x100G bonded Ethernet


Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • None
    • Lustre 2.12.6
    • 3
    • 9223372036854775807

    Description

      During performance testing of a new Lustre file system, we discovered that read/write performance isn't where we would expect. For example, the block-level read performance of the system is just over 65 GB/s, but in scaling tests we can only reach around 30 GB/s for reads. Writes are slightly better, but still in the 35 GB/s range. At single-node scale, we seem to cap out at a few GB/s.

      After working through every tuning we could find, we're slightly better off, but still miles behind where performance should be. We've experimented with various ksocklnd parameters (nconnds, nscheds, tx/rx buffer size, etc.) with little change. Current tunings that may be relevant: credits=2560, peer_credits=63, max_rpcs_in_flight=32.
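      A rough way to check whether max_rpcs_in_flight=32 could cap a single client is Little's law. Under that law, the throughput of one OSC is at most the bytes in flight divided by the time each bulk RPC takes to complete. The sketch below assumes 1 MiB bulk RPCs (256 pages of 4 KiB). The per-RPC completion times it uses are hypothetical and were not measured on this system.

```python
def rpc_throughput_gbs(rpcs_in_flight: int, rpc_size_mib: float, rpc_time_ms: float) -> float:
    """Little's-law upper bound on one OSC's throughput, in decimal GB/s.

    rpc_size_mib and rpc_time_ms are caller-supplied assumptions,
    not values measured on this file system.
    """
    bytes_in_flight = rpcs_in_flight * rpc_size_mib * 1024 * 1024
    return bytes_in_flight / (rpc_time_ms / 1000.0) / 1e9


# Hypothetical per-RPC completion times (ms) showing how the bound moves.
for rpc_time_ms in (1.0, 5.0, 10.0, 20.0):
    bound = rpc_throughput_gbs(32, 1.0, rpc_time_ms)
    print(f"{rpc_time_ms:5.1f} ms/RPC -> <= {bound:6.2f} GB/s per OSC")
```

      max_rpcs_in_flight applies per OSC, and a client has one OSC per OST. A client striping across several OSTs can therefore exceed this per-OSC bound. The limit only bites when per-RPC latency is high and few OSTs are in use.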

      Network configuration on the servers is 2x100G Ethernet, bonded together (active/active) using kernel bonding (not ksocklnd bonding).

      iperf between two nodes reaches nearly line rate at ~98 Gb/s, and iperf from two nodes to a single node can push ~190 Gb/s, consistent with what we would expect from kernel bonding.
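      The ~98 Gb/s single-pair and ~190 Gb/s two-to-one iperf results fit per-flow transmit hashing. The bonding driver sends each flow over one slave, so a single TCP connection never exceeds one 100G link. ksocklnd may carry a peer's bulk data over a single TCP connection in each direction. That is our understanding of the default behaviour, not something measured here. If it holds, node-to-node LNet traffic would hit the same one-link limit.

      The sketch below follows the layer3+4 xmit_hash_policy formula from the Linux bonding documentation. It assumes the bond uses that policy, which this report does not state. The addresses and source ports are made up; 988 is the default LNet TCP port.

```python
import ipaddress
from collections import Counter


def layer34_slave(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_slaves: int) -> int:
    """Choose a bond slave index using the documented layer3+4 formula.

    Conceptual sketch only; the kernel's real hash code may differ in
    details such as byte order.
    """
    # Source and destination ports as one 32-bit word, as in the TCP header.
    h = (src_port << 16) | dst_port
    # Fold in both IPv4 addresses.
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves


# One flow: the same inputs always select the same link.
print(layer34_slave("10.0.0.1", "10.0.0.2", 40000, 988, n_slaves=2))

# Many flows (different source ports): traffic spreads across both links.
spread = Counter(layer34_slave("10.0.0.1", "10.0.0.2", p, 988, 2) for p in range(40000, 41000))
print(spread)
```

      This formula only governs transmit. Inbound traffic is spread across the bond members by the switch's own LAG hash.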

      LNet selftest shows ~2.5 GB/s (20 Gb/s) for node-to-node tests. I'm not sure whether this is a bug in LNet selftest or a real reflection of the performance.
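      To compare the selftest and iperf numbers on the same scale, use decimal units with 8 bits per byte. On that basis ~2.5 GB/s is 20 Gb/s, roughly a fifth of the ~98 Gb/s iperf reached node to node. This small sketch assumes the selftest figure is in decimal GB/s.

```python
def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert decimal GB/s to Gb/s (8 bits per byte)."""
    return gbytes_per_s * 8.0


def fraction_of(measured: float, reference: float) -> float:
    """Share of a reference rate achieved, in the same units."""
    return measured / reference


lnet_gbits = gbytes_to_gbits(2.5)  # LNet selftest, node to node
iperf_gbits = 98.0                 # iperf, node to node
print(f"{lnet_gbits:.1f} Gb/s, {fraction_of(lnet_gbits, iperf_gbits):.0%} of iperf")
# prints "20.0 Gb/s, 20% of iperf"
```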

      We found the following related tickets/mailing list discussions which seem to be very similar to what we're seeing, but with no resolutions:

      http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-August/016630.html

      https://jira.whamcloud.com/browse/LU-11415

      https://jira.whamcloud.com/browse/LU-12815 (maybe performance limiting, but I doubt it for what we're seeing)

       

      Any help or suggestions would be awesome.

      Thanks!

      Jeff


People

    Assignee: Amir Shehata (ashehata) (Inactive)
    Reporter: Jeff Niles (nilesj)