[LU-14564] Allow number of threads to grow when all existing threads are stuck Created: 26/Mar/21 Updated: 22/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Currently, when we hit a chokepoint of some sort (a slow disk, a blocked ldlm resource, and the like), the relatively low number of threads we start by default gets consumed pretty quickly. The result is that no requests are processed at all, even those that would not block because they don't touch the contended resource. Normally it's a good idea to keep the thread pool small, as that improves CPU cache effectiveness and also puts less load on a spinning-disk setup (if present). But when the entire thread pool has been plugged for some time, I think it is beneficial to spawn a lot more threads as a "one shot" only, and we should not be any worse off wrt the end outcome. Memory consumption of server threads is relatively minor, so even if a massive number of clients each send dozens of RPCs and that results in tens of thousands of threads, modern systems should handle it relatively easily. If worst comes to worst, all the new threads get stuck as well, but some of them might actually make progress, which would be a net benefit. Half measures are also possible, e.g. spawning only threads that process "high priority" requests. |
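For illustration only, the growth rule proposed above could be modelled roughly as follows. This is a user-space sketch in Python, not Lustre/ptlrpc code: `PoolState`, `threads_to_spawn`, `stall_threshold`, `hard_cap` and `high_prio_queued` are all hypothetical names, and the default values are arbitrary assumptions rather than anything taken from this ticket.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    threads: int        # service threads currently started
    busy: int           # threads currently handling a request
    stalled_for: float  # seconds since any thread last finished a request
    queued: int         # requests waiting for a free thread


def threads_to_spawn(state, stall_threshold=5.0, hard_cap=10000,
                     high_prio_queued=None):
    """Return how many extra threads to start as a one-shot burst.

    All thresholds here are assumed values for the sketch.
    """
    # An idle thread exists, so the pool is not plugged.
    if state.busy < state.threads:
        return 0
    # Every thread is busy, but not for long enough to call it stuck.
    if state.stalled_for < stall_threshold:
        return 0
    # "Half measure": size the burst by high-priority requests only.
    demand = state.queued if high_prio_queued is None else high_prio_queued
    headroom = max(hard_cap - state.threads, 0)
    return min(demand, headroom)
```

With a plugged pool of 32 threads and 500 queued requests, this sketch would start 500 extra threads; passing `high_prio_queued=7` limits the burst to 7.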
| Comments |
| Comment by Neil Brown [ 28/Mar/21 ] |
|
This is an interesting problem. One that I have pondered for nfsd, but have no working solution to demonstrate. Thinking about it now, I would try using demand + memory availability to drive the size of the pool. So when all threads have been in use for longer than some threshold, try a non-blocking memory allocation to create a new thread. If it fails, maybe increase the threshold. Also register a shrinker which aggressively prunes idle threads whenever memory is tight. This would only address the "memory" cost of a large pool. If there were other costs (device contention?), they would need to be addressed separately (n-way semaphore on the device?).
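A rough model of the scheme in this comment, as a sketch only. It is plain Python rather than kernel code: `AdaptivePool`, `on_all_busy`, `shrink` and the `try_alloc` callback are hypothetical names, `try_alloc` merely stands in for a non-blocking allocation, and doubling the threshold on failure is an assumption (the comment only says "maybe increase").

```python
class AdaptivePool:
    """Toy model: pool size driven by demand and memory availability."""

    def __init__(self, threads, threshold, try_alloc):
        self.threads = threads      # threads currently in the pool
        self.idle = 0               # threads with nothing to do
        self.threshold = threshold  # seconds of "all busy" before growing
        self.try_alloc = try_alloc  # stand-in for a non-blocking allocation

    def on_all_busy(self, busy_for):
        """Called while every thread is in use; returns True if the pool grew."""
        if busy_for <= self.threshold:
            return False
        if self.try_alloc():
            self.threads += 1
            return True
        # Allocation failed: memory is tight, so become less eager to grow.
        self.threshold *= 2  # assumed back-off policy
        return False

    def shrink(self, keep_idle=0):
        """Shrinker-style callback: prune idle threads, return how many."""
        freed = max(self.idle - keep_idle, 0)
        self.idle -= freed
        self.threads -= freed
        return freed
```

As the comment notes, this only covers the memory cost; limiting device contention would need a separate mechanism.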
|