[LU-16975] Automatically setup all interfaces for socklnd, o2iblnd Created: 23/Jul/23  Updated: 18/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Tim Day Assignee: Tim Day
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Currently, setting up LNet with socklnd or o2iblnd automatically sets up only the first network interface. Unless a user knows to manually set up the remaining interfaces, their node will experience subpar network performance. All interfaces should be set up automatically.



 Comments   
Comment by Gerrit Updater [ 23/Jul/23 ]

"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51748
Subject: LU-16975 lnet: setup all available interfaces
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 70e8174723687fa6adbf0c0298bbdcd907d01967

Comment by Andreas Dilger [ 23/Jul/23 ]

This changes behavior fairly significantly, and it isn't clear that it is always the right thing to do. For example, on NVIDIA DGX machines there are 8 IB interfaces, but only 2 of them are normally used for storage traffic, while the rest of them are for compute traffic.

Also, most large clusters have a dedicated administration Ethernet network that is intended for logging and remote IPMI traffic, and flooding this with Lustre traffic would make it difficult to manage the nodes.

If anything like this is done, it should also be possible to disable the functionality easily. IMHO, rather than building this into the LNDs, it would be better to install clients with a default /etc/modprobe.d/lnet.conf that matches all Ethernet interfaces or similar, but that could easily be replaced in large clusters by an appropriate ip2nets line matching the right interfaces.

Comment by Tim Day [ 24/Jul/23 ]

Currently, LNet automatically sets up the first Ethernet or IB interface, which I think is also often wrong. If an appropriate parameter is passed to the module (via /etc/modprobe.d/lnet.conf or otherwise), LNet uses that instead of the default behavior. This patch preserves that, so the default settings are easy to disable.
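The precedence described above (an explicit setting always wins, and the default only applies when nothing is specified) can be sketched roughly as follows. This is a minimal userspace illustration, not the LNet code: the function pick_ifaces(), the enum default_policy, and its two values are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default policies: current behavior vs. the proposed one. */
enum default_policy {
    DEFAULT_FIRST_IFACE,  /* only the first available interface */
    DEFAULT_ALL_IFACES    /* every available interface */
};

/*
 * Decide which interfaces to configure (hypothetical helper).
 * If the user named interfaces explicitly, use exactly those and ignore
 * the default policy. Otherwise fall back to the policy.
 * `out` must have room for max(n_explicit, n_available) entries.
 * Returns the number of entries written to `out`.
 */
static size_t pick_ifaces(const char *const *explicit_ifs, size_t n_explicit,
                          const char *const *available, size_t n_available,
                          enum default_policy policy, const char **out)
{
    size_t count = 0;

    assert(out != NULL);

    /* An explicit configuration always overrides the default. */
    if (n_explicit > 0) {
        for (size_t i = 0; i < n_explicit; i++)
            out[count++] = explicit_ifs[i];
        return count;
    }

    /* Nothing to choose from. */
    if (n_available == 0)
        return 0;

    if (policy == DEFAULT_FIRST_IFACE) {
        out[count++] = available[0];
        return count;
    }

    /* DEFAULT_ALL_IFACES: take everything that is available. */
    for (size_t i = 0; i < n_available; i++)
        out[count++] = available[i];
    return count;
}
```

With this shape, switching the default from "first" to "all" changes the result only for callers that pass no explicit interfaces.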

Right now, default setup behavior is decided in LNet rather than the LNDs (this patch doesn't change this). It would be better if each LND could define its own default behavior without requiring special logic in LNet (this would make things much more modular). But I think that would need a larger refactor, since lnd_startup only accepts one NI at a time.

I think the goal of any default setting is to pick the least wrong choice for the uninformed user. People already familiar with Lustre likely already know the best network config for their machines.

Comment by Andreas Dilger [ 24/Jul/23 ]

This is probably a far-reaching enough change that it should be sent out to Lustre-discuss for comments.

It is good to know that this doesn't enable all interfaces if one is explicitly specified, and that should be in the commit message.

I can understand that in the cloud world "enable all visible interfaces" is probably OK, because there are management interfaces not visible to the VM that can be used to control the system. Possibly in this case, LNet would be listening on the other interfaces but not using them because the servers do not have NIDs there. That wouldn't be terrible, but it would be a bit of an increased security risk if these interfaces are externally visible.

I think if we used all interfaces for Lustre traffic on real hardware it would be considered a bug and we would be asked to change it back. That said, maybe I'm wrong and most clusters already have explicit interface selection and this will be a no-op.

Comment by Steve Crusan [ 24/Jul/23 ]

I'm a lowly peasant that only reads this stuff normally, but this change should be opt-in via a driver tunable or something.

Andreas already covered most of it, but in our situation, we don't want Lustre using slower (read: not high speed data) interfaces, nor do we want Lustre to make these decisions for us. We specifically choose which interfaces to use with Lustre before we mark a node as "production". 

I think an example of "doing the wrong thing and assuming default behavior" could be found by looking for "skip_mr_route_setup" in the source code. When that came out, it broke things for us until we set skip_mr_route_setup=1 as a ksocklnd module option.

Comment by Tim Day [ 18/Aug/23 ]

I'll likely refactor this to make it opt-in. That way, if someone builds a custom client, they could enable it easily. If a lot of people use the flag, it'd be easy to change the default in the future. I might look at implementing something like "lnetctl add --net tcp --if *" which would enable all interfaces (for a particular LND). That would be a QoL improvement, imo.

Comment by Chris Horn [ 18/Aug/23 ]

Some kind of pattern matching would be nice, too. Our products often have a naming scheme for the HSN interfaces, e.g. hsn0, hsn1, ... hsnX, heth0, heth1, ... hethX, etc.

Comment by James A Simmons [ 18/Aug/23 ]

You can get pattern matching with glob_match(), which the kernel provides. I plan to use it for some of the tunables with Netlink in the near future.
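As a rough illustration of the kind of matching discussed here, the sketch below filters interface names against a shell-style pattern. It is a userspace stand-in only: it uses POSIX fnmatch() rather than the kernel's glob_match(), and the helpers iface_matches() and select_ifaces() are hypothetical names, not existing LNet or lnetctl functions.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's glob_match(): returns 1 if the
 * interface name matches the shell-style pattern, 0 otherwise.
 * fnmatch() returns 0 on a match, so the result is inverted here.
 */
static int iface_matches(const char *pattern, const char *ifname)
{
    assert(pattern != NULL && ifname != NULL);
    return fnmatch(pattern, ifname, 0) == 0;
}

/*
 * Copy pointers to the names that match `pattern` into `out`, which must
 * have room for `n` entries. Returns how many names matched.
 */
static size_t select_ifaces(const char *pattern,
                            const char *const *names, size_t n,
                            const char **out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (iface_matches(pattern, names[i]))
            out[count++] = names[i];
    }
    return count;
}
```

Given the names eth0, hsn0, hsn1, heth0 and ib0, the pattern "hsn*" would select hsn0 and hsn1, and "*" would select all five. In kernel code the fnmatch() call would presumably be replaced by glob_match(pattern, ifname) from <linux/glob.h>.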
