[LU-11978] Increase maximum number of configured NIDs per device

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.5
    • Component/s: None
    • Environment: Multi-tenancy, several filesets per node

    Description

      We're hitting a limitation in our particular use case for Lustre, where we export Lustre to many different VLANs with many filesets. Because the VLANs are directly connected to the servers (the servers' interfaces are connected to trunk ports on the switches), we need to specify, for each VLAN, a different set of NIDs for each device. If we have, for instance, 10 VLANs, that means 10 NIDs for each server.

      In addition, if we export several filesets to the same client, we need several LNets on that client (e.g. tcp10 for fileset1, tcp11 for fileset2, tcp12 for fileset3). That means 3 NIDs per VLAN.

      In the hypothetical configuration above we would therefore need to define 30 NIDs for each server and at least 60 NIDs for each device (including failover NIDs). Currently Lustre supports only around 20 NIDs per device; above that number the configuration fails silently, and only after unsuccessfully trying to mount a Lustre client do we realize that the configuration stored on the device is truncated after roughly 20 NIDs.
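      For illustration, a minimal sketch (with hypothetical documentation-range addresses and LNet names, not our real ones) of how these per-VLAN, per-fileset NIDs get declared on a target with tunefs.lustre; every additional VLAN or fileset LNet adds one more NID per service node:

          # Hypothetical example: one OST, two service nodes (primary + failover),
          # two VLANs with two fileset LNets each -> 4 NIDs per server, 8 per device.
          # Each --servicenode option lists the comma-separated NIDs of one server.
          tunefs.lustre \
              --servicenode=192.0.2.11@tcp1101,192.0.2.11@tcp1102,198.51.100.11@tcp1111,198.51.100.11@tcp1112 \
              --servicenode=192.0.2.12@tcp1101,192.0.2.12@tcp1102,198.51.100.12@tcp1111,198.51.100.12@tcp1112 \
              /dev/ost0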

      Lustre routers can help us scale with the first problem (multiple VLANs), but they cannot help when we have multiple filesets per client (a maximum of 20 filesets per client in a routed environment).

      It would be a good improvement if this limit could be increased, now that new Lustre use cases are in play.

      Activity

            [LU-11978] Increase maximum number of configured NIDs per device

            adilger Andreas Dilger added a comment -

            simmonsja, I don't see much value in keeping this 6-year-old ticket open when there are a number of other tickets (e.g. LU-18815) actively working on the same issue.

            simmonsja James A Simmons added a comment -

            I disagree. Mike's patch only gracefully handles requesting more than 32 NIDs, in that it returns an overflow error; it does not actually support more than 32 NIDs. I have a prototype patch, but it still has bugs to work out.

            adilger Andreas Dilger added a comment -

            This is being fixed by linked tickets.
            moreno Diego Moreno added a comment -

            Sebastien,

            Thanks for the very detailed answer. I agree with you that both the feature and the patch help to solve this complicated multi-tenancy use case. Together with filesets and dynamic LNet configuration, it should be possible to do everything we need in that context.

            Thanks for the help!


            sebastien Sebastien Buisson added a comment -

            Hi Diego,

            After discussing with Amir, it seems there are two options to help with your configuration:

            • with the LNet health feature introduced in 2.12, there is no need to specify target service NIDs that are on the same node (or, even better, on the same interface). Thanks to LNet health, failover between the various NIDs of a node is handled directly at the LNet level (LNet knows which NIDs belong to the same node because they are reported along with 'pings'). So the upper layers only need to specify as service NIDs those NIDs that are on different nodes. Potentially, in the use case you are describing, LNet health lets you reduce the number of service NIDs to only 4: the 'primary' NID on the tcp-based network, the 'primary' NID on the IB-based network, and two more for the actual failover node.
            • in the case where you are using routers, the restriction of having only one route (i.e. via one LNet) to reach a remote LNet can easily be removed thanks to patch https://review.whamcloud.com/33447. It enables you to declare tenant-specific LNet networks on the client side only. On the server side, you keep just one LNet per network type/interface, and then one router for each tenant network.
              To illustrate this, let's consider the following setup, like a light version of yours:
              client:
              tcp10(eth0),tcp11(eth0)
              server:
              tcp0(eth0)
              router1:
              tcp0(eth0),tcp10(eth1)
              router2:
              tcp0(eth0),tcp11(eth1)
              

              And the following routes:

              routes="tcp10 router1@tcp0;
                            tcp11 router2@tcp0;
                            tcp0 router1@tcp10;
                            tcp0 router2@tcp11"
              

              Without patch #33447, it fails with an error message like:

              LNetError: 52886:0:(router.c:489:lnet_check_routes()) Routes to tcp0 via router1@tcp10 and router2@tcp11 not supported
              

              But with patch #33447, this routing configuration is permitted.

            HTH,
            Sebastien.
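            (For completeness, a sketch of how the routes from the example above could also be declared with dynamic LNet configuration instead of the routes module parameter; the gateway addresses below are placeholders standing in for router1 and router2, not real NIDs:)

              # Client side (on tcp10 and tcp11): reach the server's tcp0 via each
              # tenant's router. These two routes to the same remote net over
              # different local nets are exactly what patch #33447 permits.
              lnetctl route add --net tcp0 --gateway 192.0.2.1@tcp10
              lnetctl route add --net tcp0 --gateway 192.0.2.2@tcp11

              # Server side (on tcp0): reach each tenant LNet via its router.
              lnetctl route add --net tcp10 --gateway 203.0.113.1@tcp0
              lnetctl route add --net tcp11 --gateway 203.0.113.2@tcp0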

            moreno Diego Moreno added a comment -

            This is the current example on an OST, where we have 22 NIDs per server, so with just one failover server there are a total of 44 NIDs defined:

            [root@le-oss01 ]# tunefs.lustre /dev/sfa0003
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            
            Read previous values:
            Target: fs1-OST0002
            Index: 2
            Lustre FS: fs1
            Mount type: ldiskfs
            Flags: 0x1002
             (OST no_primnode )
            Persistent mount opts: ,errors=remount-ro
            Parameters: failover.node=10.207.208.121@tcp1001,10.207.64.121@o2ib,10.204.64.121@tcp1101,10.204.64.121@tcp1102,10.212.0.121@tcp1111,10.212.0.121@tcp1112,10.212.0.121@tcp1113,10.212.0.121@tcp1114,10.212.0.121@tcp1115,10.212.16.121@tcp1121,10.212.16.121@tcp1122,10.212.16.121@tcp1123,10.212.32.121@tcp1131,10.212.32.121@tcp1132,10.212.32.121@tcp1133,10.212.48.121@tcp1141,10.212.48.121@tcp1142,10.212.64.121@tcp1151,10.212.64.121@tcp1152,10.212.80.121@tcp1161,10.212.96.121@tcp1171,10.212.112.121@tcp1181:10.207.208.122@tcp1001,10.207.64.122@o2ib,10.204.64.122@tcp1101,10.204.64.122@tcp1102,10.212.0.122@tcp1111,10.212.0.122@tcp1112,10.212.0.122@tcp1113,10.212.0.122@tcp1114,10.212.0.122@tcp1115,10.212.16.122@tcp1121,10.212.16.122@tcp1122,10.212.16.122@tcp1123,10.212.32.122@tcp1131,10.212.32.122@tcp1132,10.212.32.122@tcp1133,10.212.48.122@tcp1141,10.212.48.122@tcp1142,10.212.64.122@tcp1151,10.212.64.122@tcp1152,10.212.80.122@tcp1161,10.212.96.122@tcp1171,10.212.112.122@tcp1181 mgsnode=10.207.208.102@tcp1001:10.207.208.101@tcp1001
            

            When I tried to find the limiting number of NIDs, it was around 45: the 46th NID was truncated on the MGS, and when the client tried to mount the filesystem it failed because the 46th NID was basically garbage. This configuration, for instance, is used as follows:

            • Tenant 1 has configured tcp1101, tcp1102 and mounts fileset1 (tcp1101) and fileset2 (tcp1102)
            • Tenant 2 has configured tcp1111, tcp1112, tcp1113, tcp1114 and tcp1115 and mounts fileset3, fileset4, fileset2 and fileset5. It could additionally mount another fileset on the remaining tcp network.
            • Tenant 3 has configured tcp1171 and just mounts fileset7.

            This is just an example, but you can find a more detailed explanation in this presentation from LAD'18: Multi-tenancy at ETHZ

            In practical terms, the more complex the configuration is, the more we need Lustre routers to help simplify it. That's also what we'll do in the future as soon as we have more tenants. But even with Lustre routers there is still the need for one LNet per fileset and per client <- router -> server mapping. That would mean that in a configuration with 4 failover OSSes there would be a hard limit of 11 filesets that a tenant can mount at the same time (this comes from the 45-NID limitation: each of the 4 service nodes needs one NID per fileset LNet, and 4 x 11 = 44 is the most that fits under 45). It may seem extreme, but I guess our current configuration was also considered "extreme" at the time the limit on the number of NIDs was introduced...

            Another alternative could be to manage the MGS configuration in a different way and not hard-code the NIDs on the OSTs/MDTs. I don't know whether there is any feature request or development in that regard.
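            (As a side note, one way to inspect what the MGS actually stored, a sketch assuming shell access to the MGS node and the fs1 filesystem name from the output above; the exact record layout differs between Lustre versions:)

              # List the configuration llogs held by the MGS, then dump the client
              # log for fs1; truncated or garbled NID strings should be visible in
              # the add_uuid/setup records.
              lctl --device MGS llog_catlist
              lctl --device MGS llog_print fs1-client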


            ashehata Amir Shehata (Inactive) added a comment -

            Can you please share the configuration that you're using?

            People

              Assignee: simmonsja James A Simmons
              Reporter: moreno Diego Moreno
              Votes: 0
              Watchers: 8
