Details
- Type: Improvement
- Priority: Major
- Resolution: Unresolved
- Versions: Lustre 2.14.0, Lustre 2.16.0
Description
If there is a mismatch between conns_per_peer on a client and server (e.g. due to different Ethernet network speeds across Ethernet switches, or for other reasons described below), then each side will try to establish a different number of TCP sockets for the peer. LU-17258 handles this by "giving up" on establishing more peer connections, as long as at least one connection could be established for each type.
When this happens, the client should save the conn_count as the new in-memory conns_per_peer value for the remote peer NID (until the next unmount/remount), so that it doesn't keep trying to establish more connections whenever there is a problem.
Otherwise, the server will have to handle and reject these extra connections on a regular basis, which can look like a DDoS if 10000 clients are all trying to (re-)establish thousands of connections at mount, during recovery, or whenever there is a network hiccup. Saving the negotiated value also makes the configuration more "hands off", without the need to tune conns_per_peer explicitly (and in coordination) across all nodes.
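A rough user-space sketch of what saving the negotiated value might look like (the struct and function names here are made up for illustration, not the real ksocklnd code):

#include <stdio.h>

struct ksock_peer_state {
        unsigned long long ps_nid;      /* remote peer NID */
        int ps_conns_per_peer;          /* value currently in use for this NID */
        int ps_conns_established;       /* connections the peer actually accepted */
};

/*
 * Called when connection setup "gives up" (per LU-17258) after the peer
 * accepted fewer connections than requested.  Remember the accepted count
 * as the in-memory conns_per_peer for this NID so that later reconnects do
 * not retry the higher count; the saved value is lost at unmount/remount.
 */
void peer_save_negotiated_conns(struct ksock_peer_state *peer)
{
        if (peer->ps_conns_established >= 1 &&
            peer->ps_conns_established < peer->ps_conns_per_peer) {
                peer->ps_conns_per_peer = peer->ps_conns_established;
                printf("peer %#llx: limiting conns_per_peer to %d until remount\n",
                       peer->ps_nid, peer->ps_conns_per_peer);
        }
}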
It is likely that the servers also need to dynamically shrink conns_per_peer when they start having a lot of connected peers, to avoid the need to explicitly tune this for large clusters (and to avoid us getting involved to fix the system after it breaks). This will (eventually) cause the remote peers to also shrink their connection count over time due to their backoff of failed connections. I'm thinking of something simple like shrinking conns_per_peer by 1 as the number of established peer connections grows past 20000, and again at 40000 (if it hadn't already started shrinking the number of connections when passing 20000). It could never be shrunk below 1.
It could print a console message when it shrinks the value, suggesting something like "set 'options socklnd conns_per_peer=N' in /etc/modprobe.d/lustre.conf to avoid this in the future", but at least the system would continue to work.
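For illustration, a hedged sketch of that shrink-and-warn logic (the thresholds, names, and message format are assumptions, not existing socklnd code):

#include <stdio.h>

#define SHRINK_THRESHOLD_1      20000   /* total established peer connections */
#define SHRINK_THRESHOLD_2      40000

/*
 * Map the admin-configured conns_per_peer to the value actually used,
 * shrinking by 1 past each threshold, but never below 1 and never
 * growing again at runtime.
 */
int effective_conns_per_peer(int configured, int total_conns)
{
        int target = configured;

        if (total_conns > SHRINK_THRESHOLD_1)
                target--;
        if (total_conns > SHRINK_THRESHOLD_2)
                target--;

        return target < 1 ? 1 : target;
}

/* Called as connections are added; *cur holds the value currently in use. */
void maybe_shrink_conns_per_peer(int configured, int total_conns, int *cur)
{
        int target = effective_conns_per_peer(configured, total_conns);

        if (target < *cur) {
                printf("socklnd: reducing conns_per_peer from %d to %d; set 'options socklnd conns_per_peer=%d' in /etc/modprobe.d/lustre.conf to avoid this in the future\n",
                       *cur, target, target);
                *cur = target;
        }
}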
I don't know if the server would need to actively disconnect client connections > conns_per_peer, but that might be needed if the number of connections continues to grow (e.g. > 50000).
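If active disconnection did turn out to be needed, it might look roughly like the following sketch; the connection list structure and close callback are purely illustrative:

#include <stddef.h>

struct conn {
        struct conn *next;
        void (*close)(struct conn *);
};

/*
 * Walk a peer's connection list and close any connections beyond the
 * (possibly shrunken) conns_per_peer limit, e.g. once the server-wide
 * total grows past ~50000.
 */
void close_excess_conns(struct conn *conns, int conns_per_peer)
{
        struct conn *c = conns;
        int kept = 0;

        while (c != NULL) {
                struct conn *next = c->next;    /* close may free the conn */

                if (++kept > conns_per_peer)
                        c->close(c);
                c = next;
        }
}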
It would never increase conns_per_peer again until the system is restarted, or perhaps only if it is explicitly set from userspace again because the admin really thinks they know better.
I've also filed LU-17514 for tracking an "expected_clients" tunable that can be used to set a ballpark figure for the number of clients, so that various runtime parameters like conns_per_peer could be set appropriately early in the cluster mount process.
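Purely as an illustration of how such a hint might feed into conns_per_peer early in the mount process (the formula and names here are my own assumptions, not anything specified in LU-17514):

/*
 * Pick an initial conns_per_peer from the admin-supplied expected_clients
 * hint, keeping the projected total connection count near the same
 * threshold used for runtime shrinking.
 */
int initial_conns_per_peer(int configured, int expected_clients)
{
        const long budget = 20000;      /* matches the first shrink threshold */
        int target = configured;

        while (target > 1 && (long)expected_clients * target > budget)
                target--;

        return target;
}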
I was thinking about this further, and I'm wondering whether the number of connections per peer should be more dynamic at runtime, rather than "establish N connections immediately at mount time"?
Essentially, conns_per_peer would be treated as the "maximum number of peer connections", and ksocklnd would start with only 1 connection per peer (maybe not even per peer NID) until there was a substantial amount of traffic flowing to the peer. Then the node would dynamically add new connections as long as this increased the real message transfer rate, the server did not reject the connection with -EALREADY, and the total did not exceed conns_per_peer. Once the client has finished its IO burst, it would dynamically drop idle connections again in the background.
That would allow the "single busy client" case to get peak bandwidth, while the "many clients" case would immediately be handled by 1 initial connection, and the server would just not allow it to escalate if it was busy or had too many connections. Depending on how long it takes to establish a new connection, we might even consider dropping idle bulk read/write connections to 0 and keeping only the control connection for pings and small messages, but I'm not sure that is necessary.
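A sketch of those grow/shrink decisions, with all names, fields, and thresholds invented for illustration:

#include <stdbool.h>

struct peer_conn_state {
        int active_conns;               /* connections currently open */
        int conns_per_peer;             /* upper bound, never exceeded */
        long bytes_per_sec;             /* rate measured over the last interval */
        long prev_bytes_per_sec;        /* rate before the last conn was added */
        long idle_secs;                 /* time since the last bulk message */
};

/*
 * Open one more connection to this peer only if traffic increased
 * meaningfully since the previous connection was added, the peer has not
 * rejected a duplicate attempt (-EALREADY), and the conns_per_peer
 * ceiling has not been reached.
 */
bool should_add_conn(const struct peer_conn_state *p, bool peer_rejected)
{
        if (peer_rejected || p->active_conns >= p->conns_per_peer)
                return false;

        return p->bytes_per_sec >
               p->prev_bytes_per_sec + p->prev_bytes_per_sec / 10;
}

/*
 * Drop one idle bulk connection in the background once the IO burst is
 * over, but always keep at least the control connection.
 */
bool should_drop_conn(const struct peer_conn_state *p)
{
        return p->active_conns > 1 && p->idle_secs > 60;
}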
I think that behavior gives us the best of both worlds: peak bandwidth when a single client can drive it, without overloading the server when its network/storage is the limit.