[LU-11978] Increase maximum number of configured NIDs per device Created: 20/Feb/19  Updated: 14/Oct/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Diego Moreno Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Multi-tenancy, several filesets per node


Issue Links:
Related
Epic/Theme: mgs

 Description   

We're hitting a limitation in our particular use case for Lustre, where we export the filesystem to many different VLANs with many filesets. Because the VLANs are directly connected to the servers (the servers' interfaces are connected to trunk ports on the switches), we need to specify, for each VLAN, a different set of NIDs for each device. If we have, for instance, 10 VLANs, that means 10 NIDs for each server.

In addition, if we export several filesets to the same client, we need several LNets on that client (e.g. tcp10 for fileset1, tcp11 for fileset2, tcp12 for fileset3). This would mean 3 NIDs per VLAN.
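
For illustration, a client mounting three filesets over three separate LNets would carry an LNet configuration along these lines (the VLAN sub-interface names are hypothetical; the networks can equally be added at runtime with lnetctl):

# static module configuration, e.g. in /etc/modprobe.d/lustre.conf
options lnet networks="tcp10(eth0.110),tcp11(eth0.111),tcp12(eth0.112)"

# or, dynamically
lnetctl net add --net tcp10 --if eth0.110
lnetctl net add --net tcp11 --if eth0.111
lnetctl net add --net tcp12 --if eth0.112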

That means that in the hypothetical configuration above we would need to define 30 NIDs for each server and at least 60 NIDs for each device (with failover NIDs). Currently Lustre supports around 20 NIDs per device; above this number the configuration silently fails, and only after unsuccessfully trying to mount a Lustre client do we realize that the configuration stored on the device is truncated after roughly 20 NIDs.
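
(One way to inspect what the MGS actually stored, before a client mount fails, is to dump the client configuration log on the MGS and check the NID lists recorded for each target. A minimal sketch, assuming the usual <fsname>-client log name and the fs1 filesystem from the example further below:

lctl --device MGS llog_print fs1-client
)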

Lustre routers can help scale the first problem (multiple VLANs), but they cannot help when we have multiple filesets per client (a maximum of around 20 filesets per client in a routed environment).

It would be a good improvement if this limit could be increased now that new Lustre use cases are in play.



 Comments   
Comment by Amir Shehata (Inactive) [ 20/Feb/19 ]

Can you please share the configuration that you're using?

Comment by Diego Moreno [ 26/Feb/19 ]

Here is a current example from one of our OSTs, where we have 22 NIDs per server, so with just one failover server a total of 44 NIDs is defined:

 

[root@le-oss01 ]# tunefs.lustre /dev/sfa0003
checking for existing Lustre data: found
Reading CONFIGS/mountdata

Read previous values:
Target: fs1-OST0002
Index: 2
Lustre FS: fs1
Mount type: ldiskfs
Flags: 0x1002
 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: failover.node=10.207.208.121@tcp1001,10.207.64.121@o2ib,10.204.64.121@tcp1101,10.204.64.121@tcp1102,10.212.0.121@tcp1111,10.212.0.121@tcp1112,10.212.0.121@tcp1113,10.212.0.121@tcp1114,10.212.0.121@tcp1115,10.212.16.121@tcp1121,10.212.16.121@tcp1122,10.212.16.121@tcp1123,10.212.32.121@tcp1131,10.212.32.121@tcp1132,10.212.32.121@tcp1133,10.212.48.121@tcp1141,10.212.48.121@tcp1142,10.212.64.121@tcp1151,10.212.64.121@tcp1152,10.212.80.121@tcp1161,10.212.96.121@tcp1171,10.212.112.121@tcp1181:10.207.208.122@tcp1001,10.207.64.122@o2ib,10.204.64.122@tcp1101,10.204.64.122@tcp1102,10.212.0.122@tcp1111,10.212.0.122@tcp1112,10.212.0.122@tcp1113,10.212.0.122@tcp1114,10.212.0.122@tcp1115,10.212.16.122@tcp1121,10.212.16.122@tcp1122,10.212.16.122@tcp1123,10.212.32.122@tcp1131,10.212.32.122@tcp1132,10.212.32.122@tcp1133,10.212.48.122@tcp1141,10.212.48.122@tcp1142,10.212.64.122@tcp1151,10.212.64.122@tcp1152,10.212.80.122@tcp1161,10.212.96.122@tcp1171,10.212.112.122@tcp1181 mgsnode=10.207.208.102@tcp1001:10.207.208.101@tcp1001

When I tried to find the limit on the number of NIDs, it turned out to be around 45: the 46th NID was truncated in the MGS, and when the client tried to mount the filesystem it failed because the 46th NID was basically garbage. This configuration is, for instance, used as follows (a sketch of how such a tenant mapping can be expressed with nodemaps follows the list):

  • Tenant 1 has configured tcp1101, tcp1102 and mounts fileset1 (tcp1101) and fileset2 (tcp1102)
  • Tenant 2 has configured tcp1111, tcp1112, tcp1113, tcp1114 and tcp1115 and mounts fileset3, fileset4, fileset2 and fileset5. It could, if needed, mount another fileset on the remaining tcp network.
  • Tenant 3 has configured tcp1171 and it just mounts fileset7
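
As a rough sketch of how one such tenant-to-fileset mapping is typically expressed on the server side with the nodemap/fileset feature (the nodemap name, client NID range and fileset path below are made up for illustration; the exact syntax should be checked against the manual for your Lustre version):

# nodemap for tenant 1, covering its client NID range on tcp1101
lctl nodemap_add tenant1
lctl nodemap_add_range --name tenant1 --range 10.204.64.[1-100]@tcp1101
# restrict that tenant to its fileset (a subdirectory of fs1)
lctl set_param -P nodemap.tenant1.fileset=/fileset1
# enforce nodemaps
lctl nodemap_activate 1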

This is just an example, but you can find a more detailed explanation in this presentation from LAD'18: Multi-tenancy at ETHZ

In practical terms, the more complex the configuration gets, the more we need Lustre routers to help simplify it. That is also what we'll do in the future as soon as we have more tenants. But even with Lustre routers there is still the need for one LNet per fileset and per client <- router -> server mapping. In a configuration with 4 failover OSSes, each fileset then needs 4 service NIDs per target, so the 45-NID limitation translates into a hard cap of about 11 filesets (45 / 4 ≈ 11) that a tenant can mount at the same time. It may seem extreme, but I guess our current configuration was also considered "extreme" at the time the limit on the number of NIDs was introduced...

Another alternative could be to manage the MGS configuration in a different way and not hard-code the NIDs on the OSTs/MDTs. I don't know if there is any feature request or development in that regard.

Comment by Sebastien Buisson [ 11/Mar/19 ]

Hi Diego,

After discussing with Amir, it seems there are two options to help with your configuration:

  • with the LNet health feature introduced in 2.12, there is no need to specify target service NIDs that are on the same node (or, even better, on the same interface). Thanks to LNet health, failover between the various NIDs of a node is handled directly at the LNet level (LNet knows which NIDs belong to the same node because they are reported along with 'pings'). So the upper layers only need to specify service NIDs that are on different nodes. In the use case you are describing, LNet health could potentially reduce the number of service NIDs to only 4: the 'primary' NID on the tcp-based network, the 'primary' NID on the IB-based network, and 2 more for the actual failover node (see the sketches after this list).
  • in the case where you are using routers, the restriction of having only one route (i.e. via one LNet) to reach a remote LNet can easily be removed thanks to patch https://review.whamcloud.com/33447. It will enable you to declare tenant-specific LNet networks only on the client side. On the server side, you can have just one LNet per network type/interface, and then one router per tenant network.
    To illustrate this, let's consider the following setup, a lighter version of yours:
    client:
    tcp10(eth0),tcp11(eth0)
    server:
    tcp0(eth0)
    router1:
    tcp0(eth0),tcp10(eth1)
    router2:
    tcp0(eth0),tcp11(eth1)
    

    And the following routes:

    routes="tcp10 router1@tcp0;
                  tcp11 router2@tcp0;
                  tcp0 router1@tcp10;
                  tcp0 router2@tcp11"
    

    Without patch #33447, it fails with an error message like:

    LNetError: 52886:0:(router.c:489:lnet_check_routes()) Routes to tcp0 via router1@tcp10 and router2@tcp11 not supported
    

    But with patch #33447, this routing configuration is permitted.
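
To make both options a bit more concrete, here are two rough sketches reusing the NIDs and names from the examples above (the exact syntax, and which NIDs to keep, should of course be validated for your Lustre version and failover layout):

# option 1: with LNet health, the failover.node string on each target could
# in principle be trimmed to one NID per network per node instead of one per VLAN
failover.node=10.207.208.121@tcp1001,10.207.64.121@o2ib:10.207.208.122@tcp1001,10.207.64.122@o2ib

# option 2: the same routes as above, configured at runtime with lnetctl
# on the client
lnetctl route add --net tcp0 --gateway router1@tcp10
lnetctl route add --net tcp0 --gateway router2@tcp11
# on the server
lnetctl route add --net tcp10 --gateway router1@tcp0
lnetctl route add --net tcp11 --gateway router2@tcp0
# on router1 and router2, enable LNet routing
lnetctl set routing 1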

HTH,
Sebastien.

Comment by Diego Moreno [ 14/Mar/19 ]

Sebastien,

Thanks for the very detailed answer. I agree with you that both the feature and the patch help to solve this complicated multi-tenancy use case. Together with filesets and dynamic LNet configuration, it should be possible to do all we need in that context.

 

Thanks for the help!
