Lustre / LU-6228

How to balance network connections across socknal_sd tasks?


Details

    • Type: Question/Request
    • Resolution: Won't Fix
    • Priority: Major
    • Affects Version/s: None
    • Fix Version/s: None
    • Environment: Linux 3.10

    Description

      While using the ksocklnd LNET driver, I've noticed uneven load across the socknal_sd* tasks on an OSS. The number of tasks can be controlled by combining nscheds with either cpu_npartitions or cpu_pattern. I've also tried adjusting /proc/sys/lnet/portal_rotor, but that does not appear to be the right knob.
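      For anyone reproducing this, the commands below are roughly what I've been using to count the scheduler threads and read the tunables back. A minimal sketch; the /sys/module path assumes ksocklnd exports its parameters as readable, which may vary by build.

      $ # count running socknal_sd scheduler threads
      $ ps -e -o comm= | grep -c '^socknal_sd'
      $ # confirm the configured per-partition scheduler count
      $ cat /sys/module/ksocklnd/parameters/nscheds
      $ # current portal rotor setting
      $ cat /proc/sys/lnet/portal_rotor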

      On a dual-socket system with 6 cores per processor, configured with

      $ cat ksocklnd.conf 
      options ksocklnd nscheds=6 peer_credits=128 credits=1024
      $ cat libcfs.conf 
      options libcfs cpu_pattern="0[0,1,2,3,4,5] 1[6,7,8,9,10,11]"
      

      there are 12 socknal_sd tasks. However, with up to 60 clients doing the same streaming I/O, only 4 of the tasks are heavily loaded (CPU time over 80%). Oddly, when running an LNET bulk_rw self-test, up to 10 of the tasks are loaded, and they can consume 9.2 GB/s on the server's bonded 40GbE links.
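      (The per-task load above was observed with per-thread CPU accounting along these lines; a rough sketch, assuming the scheduler threads are named socknal_sd<cpt>_<index>:)

      $ # per-thread CPU usage and the CPU each socknal_sd thread last ran on
      $ ps -eLo pid,psr,pcpu,comm | grep socknal_sd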

      What am I missing? I thought it was the mapping of TCP connections to processes, but I can't seem to track them through /proc/*/fd/ and /proc/net/tcp.
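      One detail worth noting for anyone following along: the ksocklnd sockets are owned by the kernel rather than by any user-space process, which is presumably why they never show up under /proc/*/fd/. Assuming the default LNET acceptor port of 988, the connections themselves can still be listed:

      $ # all LNET TCP connections on the default acceptor port
      $ ss -tn '( sport = :988 or dport = :988 )'
      $ # or via /proc/net/tcp, where port 988 is hex 03DC
      $ grep -i ':03DC' /proc/net/tcp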

      I'm working from a recent pull of the master branch.

      Attachments

        1. lst-1-to-1-conc-1-to-64.txt
          17 kB
        2. lnet-test-alt-nics-irqmap.sh
          1 kB
        3. lnet-test-alt-nics.sh
          1 kB
        4. lnet-test-2cli.sh
          1 kB
        5. lnet-results-alternate-NICs-irqmap.txt
          12 kB
        6. lnet-results-alternate-NICs.txt
          3 kB
        7. lnet-results-2cli.txt
          8 kB
        8. lnet-bandwidth-cdev-single.sh
          1 kB


            People

              Assignee: ashehata (Amir Shehata, Inactive)
              Reporter: rpwagner (Rick Wagner, Inactive)
              Votes: 0
              Watchers: 10
