[LU-16975] Automatically setup all interfaces for socklnd, o2iblnd Created: 23/Jul/23 Updated: 18/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Tim Day | Assignee: | Tim Day |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Currently, setting up LNet with socklnd or o2iblnd automatically sets up only the first network interface. Unless a user knows to manually set up the remaining interfaces, their node will experience subpar network performance. All interfaces should be set up automatically. |
| Comments |
| Comment by Gerrit Updater [ 23/Jul/23 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51748 |
| Comment by Andreas Dilger [ 23/Jul/23 ] |
|
This changes behavior fairly significantly, and it isn't clear that it is always the right thing to do. For example, on NVIDIA DGX machines there are 8 IB interfaces, but only 2 of them are normally used for storage traffic, while the rest are for compute traffic. Also, most large clusters have a dedicated administration Ethernet network that is intended for logging and remote IPMI traffic, and flooding this with Lustre traffic would make it difficult to manage the nodes. If anything like this is done, it should also be possible to disable the functionality easily. IMHO, rather than building this into the LNDs, it would be better to install clients with a default /etc/modprobe.d/lnet.conf that matches all Ethernet interfaces (or similar), which could easily be replaced in large clusters by an appropriate ip2nets line that matches the right interfaces. |
| Comment by Tim Day [ 24/Jul/23 ] |
|
Currently, LNet automatically sets up the first Ethernet or IB interface, which I think is also often wrong. If an appropriate parameter is passed to the module (via /etc/modprobe.d/lnet.conf or otherwise), LNet uses that instead of the default behavior. This patch preserves that, so the default settings are easy to disable. Right now, default setup behavior is decided in LNet rather than the LNDs (this patch doesn't change this). It would be better if each LND could define its own default behavior without having to have special logic in LNet (this would make things much more modular). But I think that would need a larger refactor, since lnd_startup only accepts one NI at a time. I think the goal of any default setting is to pick the least wrong choice for the uninformed user. People already familiar with Lustre likely already know the best network config for their machines. |
| Comment by Andreas Dilger [ 24/Jul/23 ] |
|
This is probably a far-reaching enough change that it should be sent out to Lustre-discuss for comments. It is good to know that this doesn't enable all interfaces if one is explicitly specified, and that should be in the commit message. I can understand that in the cloud world "enable all visible interfaces" is probably OK, because there are management interfaces not visible to the VM that can be used to control the system. Possibly in this case, LNet would be listening on the other interfaces but not using them because the servers do not have NIDs there. That wouldn't be terrible, but it would be a bit of an increased security risk if these interfaces are externally visible. I think if we used all interfaces for Lustre traffic on real hardware it would be considered a bug and we would be asked to change it back. That said, maybe I'm wrong and most clusters already have explicit interface selection and this will be a no-op. |
| Comment by Steve Crusan [ 24/Jul/23 ] |
|
I'm a lowly peasant that only reads this stuff normally, but this change should be opt-in via a driver tunable or something. Andreas already covered most of it, but in our situation, we don't want Lustre using slower (read: not high speed data) interfaces, nor do we want Lustre to make these decisions for us. We specifically choose which interfaces to use with Lustre before we mark a node as "production". I think an example of "doing the wrong thing and assuming default behavior" could be found by looking for "skip_mr_route_setup" in the source code. That change broke things for us when it came out unless skip_mr_route_setup=1 was set as a ksocklnd module option. |
| Comment by Tim Day [ 18/Aug/23 ] |
|
I'll likely refactor this to make it opt-in. That way, if someone builds a custom client, they could enable it easily. If a lot of people use the flag, it'd be easy to change the default in the future. I might look at implementing something like "lnetctl add --net tcp --if *", which would enable all interfaces (for a particular LND). That would be a QoL improvement, imo. |
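The precedence discussed in this thread (an explicit interface list always wins, an opt-in flag enables every interface, and otherwise only the first interface is used) can be sketched roughly as below. This is an illustration only, not LNet code: `select_interfaces`, `explicit`, and `use_all` are hypothetical names made up for the example, and the real module parameters may look quite different.

```python
from typing import List, Optional


def select_interfaces(available: List[str],
                      explicit: Optional[List[str]] = None,
                      use_all: bool = False) -> List[str]:
    """Pick interfaces to configure (hypothetical helper, not an LNet API).

    Precedence, as described in the comments above:
      1. an explicit list from the administrator always wins;
      2. otherwise, if the opt-in flag is set, use every interface;
      3. otherwise, fall back to the first interface only.
    """
    if explicit:
        # Reject names that do not exist rather than silently ignoring them.
        missing = [name for name in explicit if name not in available]
        if missing:
            raise ValueError(f"unknown interfaces: {missing}")
        return list(explicit)
    if use_all:
        return list(available)
    # Default: first interface only (empty list if there are none).
    return available[:1]
```

For example, `select_interfaces(["eth0", "eth1"])` keeps only `eth0`, while passing `use_all=True` returns both. An explicit list such as `explicit=["ib0"]` is honoured even when `use_all` is also set, which mirrors the point that explicit configuration disables the default behavior.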
| Comment by Chris Horn [ 18/Aug/23 ] |
|
Some kind of pattern matching would be nice, too. Our products often have a naming scheme for the HSN interfaces, e.g. hsn0, hsn1, ... hsnX, heth0, heth1, ... hethX, etc. |
| Comment by James A Simmons [ 18/Aug/23 ] |
|
You can get pattern matching with glob_match(), which the kernel provides. I plan to use it for some of the tunables with Netlink in the near future. |
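As a rough user-space illustration of the pattern matching idea, the sketch below filters interface names against shell-style globs. It uses Python's `fnmatch.fnmatchcase` purely as a stand-in for the kernel's glob_match(); the two may differ in details, and `match_interfaces` is a hypothetical helper name, not part of LNet or lnetctl.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_interfaces(names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the names matching at least one glob pattern, in input order.

    Hypothetical helper: fnmatchcase stands in for the kernel's
    glob_match() and is case-sensitive on every platform.
    """
    pattern_list = list(patterns)  # allow the patterns to be any iterable
    return [name for name in names
            if any(fnmatchcase(name, pattern) for pattern in pattern_list)]


if __name__ == "__main__":
    interfaces = ["lo", "eth0", "hsn0", "hsn1", "heth0"]
    # Select only the high-speed network interfaces by naming scheme.
    print(match_interfaces(interfaces, ["hsn*", "heth*"]))
```

With the naming scheme mentioned above, the patterns `hsn*` and `heth*` select `hsn0`, `hsn1`, and `heth0` while leaving `lo` and `eth0` alone, and a bare `*` selects everything.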