[LU-7224] Tuning ko2iblnd for Large Cluster Created: 28/Sep/15 Updated: 16/Apr/16 Resolved: 16/Apr/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Doug Oucharek (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre 2.5.3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have a large cluster. We need your best recommendations for ko2iblnd tuning, such as peer_credits, credits, ntx, etc. |
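For reference, these ko2iblnd settings are kernel module parameters; they are normally set in a modprobe configuration file and only take effect when the LNet/ko2iblnd modules are (re)loaded. A minimal sketch with placeholder values (not tuned recommendations):

    # /etc/modprobe.d/ko2iblnd.conf -- placeholder values, not recommendations
    options ko2iblnd peer_credits=16 credits=256 ntx=512

Note that peer_credits has to match on the peers that talk to each other, so changes like this are normally rolled out cluster-wide rather than on one node at a time.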
| Comments |
| Comment by Peter Jones [ 29/Sep/15 ] |
|
Hi Mahmoud, There is definitely an art to getting these values set appropriately. The basic effects of the settings are detailed in the ops manual, but I understand that there can be surprising results when these settings are made in relation to each other. I will ask around to see what experiential knowledge people can share that might help you in this process. Peter |
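To see what a node is currently running with, the loaded module's parameters can be read back from sysfs (assuming ko2iblnd is loaded; this is standard module-parameter behaviour rather than anything Lustre-specific):

    # print each ko2iblnd parameter and its current value
    for p in /sys/module/ko2iblnd/parameters/*; do
        echo "$(basename "$p") = $(cat "$p")"
    done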
| Comment by Doug Oucharek (Inactive) [ 03/Nov/15 ] |
|
As I understand it, this ticket is related to the problems being found in ticket
Please confirm if my understanding is incorrect.

So, two things need to be tuned for:
1- Resources on the OSSs, to make it easier to accommodate the load.
2- Traffic shaping on the clients, to limit the load they generate.

The traffic shaping is managed by "lowering" the peer_credits value on the clients. This reduces the number of outstanding messages to any given OSS from any given client. However, you can also change the max_rpcs_in_flight parameter (a Lustre parameter, not an LNet one) to manage how many operations can be outstanding at any given time. This is the better parameter to change because, unlike peer_credits, it does not have to be the same on the two peers that are communicating. Lowering max_rpcs_in_flight keeps the door open for increasing peer_credits. You may want to increase peer_credits on the OSSs so they are not held back when sending out responses; but, as that parameter currently needs to be the same on all nodes, you would need to increase it on the clients as well as the OSSs. max_rpcs_in_flight will make sure the clients don't make use of the higher peer_credits value, but the OSSs do.

FMR will probably not help much here. I looked at the code for FMR and do not see it using any less memory than regular buffers; in fact, it may use a little more. FMR helps when using TrueScale IB cards and when dealing with high-latency networks (like a WAN). If you are using Mellanox over a LAN, you should not see much benefit from FMR.

With regards to the resources on the OSSs, memory seems to be the key one. The TX pool allocation system returns pools back to the system after 300 seconds. To avoid this, it is good to allocate a very large initial TX pool by setting a very high ntx value. This initial pool is never returned to the system, so having a large pool means we don't need to spend time in memory allocation/deallocation routines. Of course, having a large TX pool also means having a lot of physical memory in the OSSs so they can accommodate that many pre-allocated buffers.

So, in summary, I am recommending:
1- Shape traffic by lowering max_rpcs_in_flight on the clients rather than by lowering peer_credits.
2- If you raise peer_credits, raise it on the clients as well as the OSSs, since it must currently match on all nodes.
3- Set a large ntx on the OSSs so the initial TX pool is big enough to avoid allocation/deallocation churn.
4- Don't expect much benefit from FMR with Mellanox cards on a LAN.
|
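A minimal sketch of how these suggestions might be applied, with placeholder values chosen only to illustrate the mechanism (not tuned recommendations, and "fsname" standing in for the actual filesystem name):

    # On clients: cap outstanding RPCs per OST (a Lustre parameter, not an LNet one).
    # Takes effect immediately but does not persist across remounts.
    lctl set_param osc.*.max_rpcs_in_flight=8

    # On the MGS: make the same setting persistent for all clients of the filesystem.
    lctl conf_param fsname.osc.max_rpcs_in_flight=8

    # On OSSs and clients (peer_credits must match on all nodes): set the ko2iblnd
    # module options. Placeholder values; this overwrites any existing file.
    cat >/etc/modprobe.d/ko2iblnd.conf <<'EOF'
    options ko2iblnd ntx=2048 peer_credits=16 credits=1024
    EOF

    # Module parameters only take effect after ko2iblnd/LNet are reloaded,
    # e.g. unmount Lustre, run lustre_rmmod, then remount.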
| Comment by Mahmoud Hanafi [ 11/Apr/16 ] |
|
We can close this issue. |
| Comment by John Fuchs-Chesney (Inactive) [ 16/Apr/16 ] |
|
Thanks Mahmoud. |