[LU-8395] Limit lnet-selftest Threads Created: 13/Jul/16  Updated: 11/Dec/19  Resolved: 11/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Doug Oucharek (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Incomplete Votes: 0
Labels: None

Attachments: PNG File knl.png    
Issue Links:
Related
is related to LU-7553 Lustre cpu_npartitions default value ... Resolved
is related to LU-5050 cpu partitioning oddities Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On densely cored systems with hyperthreading enabled, you end up with a large number of hardware execution threads. For example, a system may expose 256 hardware threads in total.

The libcfs CPU partition (CPT) code divides these into 16 partitions of 16 threads each. The split is purely arithmetic; it does not group threads according to the hyperthreading topology.

Lnet-selftest then creates a group of scheduler threads in each partition, equal to the number of execution threads in that partition minus one (15 per partition in this example). That gives 16 × 15 = 240 scheduler threads in total, which actually reduces performance rather than improving it.

We need to limit the number of lnet-selftest scheduler threads to some reasonable number per partition.

Note: another bug will be created to look at how the execution threads are grouped into partitions. The current grouping needs to make more sense than it does.



 Comments   
Comment by Gerrit Updater [ 13/Jul/16 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/21299
Subject: LU-8395 lnet: Limit total number of lnet-selftest threads
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9a3bc67c22e9c52b64a1a2eb3fad220ace85d434

Comment by Dmitry Eremin (Inactive) [ 15/Jul/16 ]

Unfortunately this patch makes performance on KNL worse.

Comment by Andreas Dilger [ 15/Jul/16 ]

Doug, Dmitry, could you please describe the parameters of your performance testing so that you can agree on what is being measured.

Doug, was the 64-thread limit based on anything concrete, or just a guess? It might be useful to run some testing with different thread counts to see where the threshold is hit. It might also be possible to decide at compile time (based on a CPP constant for KNL) or at runtime (not sure how) to limit the threads differently for Xeon Phi than for other systems with many cores.

Dmitry, can you run your testing similarly, to see where the threshold is for reducing thread counts? Do we need one LNet thread per core, or is there some benefit to reserving a few cores for other tasks?

Comment by Andreas Dilger [ 15/Jul/16 ]

Doug, as for the CPT partition calculations, this is discussed in LU-5050 and LU-7553, so this ticket should be focused on the LST thread count.

There was one patch http://review.whamcloud.com/17824 "LU-5050 libcfs: default CPT matches NUMA topology" that tried to change this, but it had to be reverted for a relatively minor reason, and should probably be revived.

Comment by Dmitry Eremin (Inactive) [ 18/Jul/16 ]

These are the results with different partition counts. The current settings look much better.

Comment by Andreas Dilger [ 11/Dec/19 ]

This configuration is not a development target today, and there is no real information in this ticket. Maybe revisit this in the future.

Generated at Sat Feb 10 02:17:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.