Details

    • Bug
    • Resolution: Incomplete
    • Major

    Description

      On systems with many dense cores and hyperthreading enabled, there are a lot of execution (hardware) threads. For example, a system may have 256 hardware threads in total.

      The libcfs CPU partition (CPT) code divides these into 16 partitions of 16 threads each. The split comes from an arithmetic calculation, not from a logical grouping of hyperthread siblings.

      lnet-selftest then creates a group of scheduler threads in each partition equal to the number of execution threads in that partition minus 1 (15 per partition in this example). That means it creates 16 × 15 = 240 scheduler threads in total, which actually reduces performance rather than improving it.
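      The thread-count arithmetic described above can be checked with a small model. This is only a sketch of the counting, not the libcfs or lnet-selftest code; the function name is hypothetical, and it assumes the hardware threads divide evenly across the partitions.

```python
def sched_threads_current(hw_threads, npartitions):
    """Model of the total lnet-selftest scheduler thread count described in
    this ticket: each partition gets (its execution threads - 1).

    Assumes the hardware threads divide evenly across the partitions.
    """
    per_partition = hw_threads // npartitions  # 256 // 16 = 16
    return npartitions * (per_partition - 1)   # 16 * (16 - 1) = 240


if __name__ == "__main__":
    print(sched_threads_current(256, 16))  # prints 240
```

      With an even split the total is simply hw_threads - npartitions, so it grows linearly with the hardware thread count. That is why the behaviour only becomes a problem on dense many-core systems.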

      We need to limit the number of lnet-selftest scheduler threads to some reasonable number per partition.
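      One possible way to bound the count is to cap the total and split that budget across partitions. The sketch below is a hypothetical model, not the patch under review. The function and constant names are invented for illustration, the 64-thread default only mirrors the 64-thread limit asked about in the comments, and the one-thread-per-partition floor is an assumption.

```python
# Hypothetical cap; the value mirrors the 64-thread limit discussed in the
# comments and is not taken from any Lustre source.
SCHED_THREADS_TOTAL_MAX = 64


def sched_threads_capped(hw_threads, npartitions,
                         total_max=SCHED_THREADS_TOTAL_MAX):
    """Return per-partition scheduler thread counts. The sum is at most
    total_max, unless npartitions > total_max (one-thread floor)."""
    # Current behaviour: execution threads per partition minus one.
    # Assumption: never go below one scheduler per partition.
    uncapped = max(hw_threads // npartitions - 1, 1)
    # Split the total budget evenly; the first `extra` partitions get
    # one additional thread so the whole budget can be used.
    base, extra = divmod(total_max, npartitions)
    counts = []
    for cpt in range(npartitions):
        share = base + (1 if cpt < extra else 0)
        # Never exceed the uncapped count, never drop below one thread.
        counts.append(max(1, min(uncapped, share)))
    return counts


if __name__ == "__main__":
    counts = sched_threads_capped(256, 16)
    print(counts[0], sum(counts))  # prints 4 64
```

      Small systems keep their current counts (8 threads in 2 partitions still gives 3 schedulers per partition), while the 256-thread example drops from 240 to 64 schedulers. Whether 64, or any fixed cap, is the right threshold is the open question that the testing requested in the comments would have to settle.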

      Note: another bug will be created to look at how we group the execution threads into partitions. The grouping needs to make more sense than it currently does.

      Attachments

        1. knl.png
          13 kB
          Dmitry Eremin

          Activity

            [LU-8395] Limit lnet-selftest Threads

            adilger Andreas Dilger added a comment - This configuration is not a development target today, and there is no real information in this ticket. Maybe revisit this in the future.
            dmiter Dmitry Eremin (Inactive) added a comment (edited):

            These are the results with different partition counts. The current settings look much better.

            adilger Andreas Dilger added a comment (edited):

            Doug, as for the CPT partition calculations, this is discussed in LU-5050 and LU-7553, so this ticket should be focused on the LST thread count.

            There was one patch http://review.whamcloud.com/17824 "LU-5050 libcfs: default CPT matches NUMA topology" that tried to change this, but it had to be reverted for a relatively minor reason, and should probably be revived.

            adilger Andreas Dilger added a comment:

            Doug, Dmitry, could you please describe the parameters of your performance testing so that you can agree on what is being measured.

            Doug, was the 64-thread limit based on anything concrete, or just a guess? It might be useful to run some testing with different thread counts to see where the threshold is hit. It might also be possible to limit the threads differently for Xeon Phi (KNL) vs. other systems with many cores, either at compile time (based on a CPP constant for KNL) or at runtime (not sure how).

            Dmitry, can you run your testing similarly, to see where the threshold is for reducing thread counts? Do we need to have one LNet thread per core, or is there some benefit to reserving a few cores for other tasks?

            dmiter Dmitry Eremin (Inactive) added a comment - Unfortunately this patch makes performance on KNL worse.

            gerrit Gerrit Updater added a comment:

            Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/21299
            Subject: LU-8395 lnet: Limit total number of lnet-selftest threads
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9a3bc67c22e9c52b64a1a2eb3fad220ace85d434

            People

              ashehata Amir Shehata (Inactive)
              doug Doug Oucharek (Inactive)
              Votes: 0
              Watchers: 5
