[LU-8395] Limit lnet-selftest Threads Created: 13/Jul/16 Updated: 11/Dec/19 Resolved: 11/Dec/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
On systems with high core counts and hyperthreading enabled, you end up with a large number of hardware execution threads. Take a system with 256 total threads as an example. The libcfs partition code divides those into 16 partitions of 16 threads each (based on an arithmetic split, not a logical grouping based on hyperthreading). lnet-selftest then creates a group of scheduler threads in each partition equal to the number of execution threads in that partition minus 1 (so 15 per partition in our example). That means it creates 240 scheduler threads in total. This actually reduces performance rather than improving it. We need to limit the number of lnet-selftest scheduler threads to some reasonable number per partition. Note: another bug will be created to look at how we group the execution threads into partitions. The grouping needs to make more sense than it does today. |
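The arithmetic in the description can be reproduced with a small sketch. This is only an illustration of the counting described above, not the libcfs or lnet-selftest code: the function name `selftest_scheduler_threads` and the `per_partition_cap` parameter are hypothetical, and the cap value of 8 is an arbitrary example rather than a value taken from the patch.

```python
def selftest_scheduler_threads(total_cpus, partitions, per_partition_cap=None):
    """Return (threads_per_partition, total_threads) for the scheme in the description.

    Hypothetical helper: models "execution threads per partition minus 1",
    optionally limited by a per-partition cap. Not the actual Lustre code.
    """
    if total_cpus <= 0 or partitions <= 0:
        raise ValueError("total_cpus and partitions must be positive")

    # Even split of execution threads across partitions (256 / 16 = 16).
    cpus_per_partition = total_cpus // partitions

    # One scheduler thread per execution thread, minus one, but at least one.
    threads = max(1, cpus_per_partition - 1)

    # Optional limit of the kind this ticket asks for (value is an assumption).
    if per_partition_cap is not None:
        threads = min(threads, per_partition_cap)

    return threads, threads * partitions


# Example from the description: 256 execution threads in 16 partitions.
uncapped = selftest_scheduler_threads(256, 16)
# Same system with an illustrative cap of 8 threads per partition.
capped = selftest_scheduler_threads(256, 16, per_partition_cap=8)
print(uncapped, capped)
```

With the numbers from the description, the uncapped case gives 15 threads per partition and 240 in total. What cap is reasonable is exactly the open question in this ticket, so the value used here should not be read as a recommendation.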
| Comments |
| Comment by Gerrit Updater [ 13/Jul/16 ] |
|
Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/21299 |
| Comment by Dmitry Eremin (Inactive) [ 15/Jul/16 ] |
|
Unfortunately this patch makes performance on KNL worse. |
| Comment by Andreas Dilger [ 15/Jul/16 ] |
|
Doug, Dmitry, could you please describe the parameters of your performance testing so that you can agree on what is being measured? Doug, was the 64-thread limit based on anything concrete, or just a guess? It might be useful to run some tests with different thread counts to see where the threshold is hit. It might also be possible to limit the threads differently for Xeon Phi vs. other systems with many cores, either at compile time (based on a CPP constant for KNL) or at runtime (not sure how). Dmitry, can you run your testing similarly, to see where the threshold is for reducing thread counts? Do we need to have one LNet thread per core, or is there some benefit in reserving a few cores for other tasks? |
| Comment by Andreas Dilger [ 15/Jul/16 ] |
|
Doug, as for the CPT partition calculations, this is discussed in There was one patch http://review.whamcloud.com/17824 " |
| Comment by Dmitry Eremin (Inactive) [ 18/Jul/16 ] |
|
These are the results with different partition counts. The current settings look much better. |
| Comment by Andreas Dilger [ 11/Dec/19 ] |
|
This configuration is not a development target today, and there is no real information in this ticket. Maybe revisit this in the future. |