[LU-11978] Increase maximum number of configured NIDs per device

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.5
    • Component/s: None
    • Environment: Multi-tenancy, several filesets per node

    Description

      We're hitting a limitation in our particular use case for Lustre, where we export Lustre to many different VLANs with many filesets. Because the VLANs are directly connected to the servers (the servers' interfaces are connected to trunk ports on the switches), we need to specify, for each VLAN, a different set of NIDs for each device. If we have, for instance, 10 VLANs, that means 10 NIDs for each server.

      In addition, if we export several filesets to the same client, we need several LNets on that client (e.g. tcp10 for fileset1, tcp11 for fileset2, tcp12 for fileset3). That means 3 NIDs per VLAN.

      In the hypothetical configuration above we would therefore need to define 30 NIDs for each server and at least 60 NIDs for each device (including failover NIDs). Currently Lustre supports only around 20 NIDs per device; above that number the configuration fails silently, and only after unsuccessfully trying to mount a Lustre client do we realize that the configuration stored on the device is truncated after roughly 20 NIDs.
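      For illustration, a minimal sketch (with hypothetical documentation-range addresses and LNet names, not our real ones) of how these per-VLAN, per-fileset NIDs get declared on a target with tunefs.lustre; every additional VLAN or fileset LNet adds one more NID per service node:

          # Hypothetical example: one OST, two service nodes (primary + failover),
          # two VLANs with two fileset LNets each -> 4 NIDs per server, 8 per device.
          # Each --servicenode option lists the comma-separated NIDs of one server.
          tunefs.lustre \
              --servicenode=192.0.2.11@tcp1101,192.0.2.11@tcp1102,198.51.100.11@tcp1111,198.51.100.11@tcp1112 \
              --servicenode=192.0.2.12@tcp1101,192.0.2.12@tcp1102,198.51.100.12@tcp1111,198.51.100.12@tcp1112 \
              /dev/ost0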

      Lustre routers can help us scale with the first problem (multiple VLANs), but they cannot help when we have multiple filesets per client (a maximum of 20 filesets per client in a routed environment).

      It would be a good improvement if this limit could be increased, now that new Lustre use cases are in play.

      Activity

            [LU-11978] Increase maximum number of configured NIDs per device

            adilger Andreas Dilger added a comment -

            simmonsja, I don't see much value in keeping this 6-year-old ticket open when there are a number of other tickets (e.g. LU-18815) actively working on the same issue.

            simmonsja James A Simmons added a comment -

            I disagree. Mike's patch only gracefully handles requesting more than 32 NIDs, in that it returns an overflow error; it does not actually support more than 32 NIDs. I have a prototype patch, but it still has bugs to work out.

            adilger Andreas Dilger added a comment -

            This is being fixed by linked tickets.
            moreno Diego Moreno added a comment -

            Sebastien,

            Thanks for the very detailed answer. I agree with you that both the feature and the patch help to solve this complicated multi-tenancy use case. Together with filesets and dynamic LNet configuration, it should be possible to do everything we need in that context.

            Thanks for the help!


            sebastien Sebastien Buisson added a comment -

            Hi Diego,

            After discussing with Amir, it seems there are two options to help with your configuration:

            • with the LNet health feature introduced in 2.12, there is no need to specify target service NIDs that are on the same node (or, even better, on the same interface). Thanks to LNet health, failover between the various NIDs of a node is handled directly at the LNet level (LNet knows which NIDs belong to the same node because they are reported along with 'pings'). So the upper layers only need to specify as service NIDs those NIDs that are on different nodes. Potentially, in the use case you are describing, LNet health lets you reduce the number of service NIDs to only 4: the 'primary' NID on the tcp-based network, the 'primary' NID on the IB-based network, and two more for the actual failover node.
            • in the case where you are using routers, the restriction of having only one route (i.e. via one LNet) to reach a remote LNet can easily be removed thanks to patch https://review.whamcloud.com/33447. It enables you to declare tenant-specific LNet networks on the client side only. On the server side, you keep just one LNet per network type/interface, and then one router for each tenant network.
              To illustrate this, let's consider the following setup, like a light version of yours:
              client:
              tcp10(eth0),tcp11(eth0)
              server:
              tcp0(eth0)
              router1:
              tcp0(eth0),tcp10(eth1)
              router2:
              tcp0(eth0),tcp11(eth1)
              

              And the following routes:

              routes="tcp10 router1@tcp0;
                            tcp11 router2@tcp0;
                            tcp0 router1@tcp10;
                            tcp0 router2@tcp11"
              

              Without patch #33447, it fails with an error message like:

              LNetError: 52886:0:(router.c:489:lnet_check_routes()) Routes to tcp0 via router1@tcp10 and router2@tcp11 not supported
              

              But with patch #33447, this routing configuration is permitted.

            HTH,
            Sebastien.
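            (For completeness, a sketch of how the routes from the example above could also be declared with dynamic LNet configuration instead of the routes module parameter; the gateway addresses below are placeholders standing in for router1 and router2, not real NIDs:)

              # Client side (on tcp10 and tcp11): reach the server's tcp0 via each
              # tenant's router. These two routes to the same remote net over
              # different local nets are exactly what patch #33447 permits.
              lnetctl route add --net tcp0 --gateway 192.0.2.1@tcp10
              lnetctl route add --net tcp0 --gateway 192.0.2.2@tcp11

              # Server side (on tcp0): reach each tenant LNet via its router.
              lnetctl route add --net tcp10 --gateway 203.0.113.1@tcp0
              lnetctl route add --net tcp11 --gateway 203.0.113.2@tcp0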

            moreno Diego Moreno added a comment -

            This is the current example on an OST, where we have 22 NIDs per server, so with just one failover server there are a total of 44 NIDs defined:

            [root@le-oss01 ]# tunefs.lustre /dev/sfa0003
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            
            Read previous values:
            Target: fs1-OST0002
            Index: 2
            Lustre FS: fs1
            Mount type: ldiskfs
            Flags: 0x1002
             (OST no_primnode )
            Persistent mount opts: ,errors=remount-ro
            Parameters: failover.node=10.207.208.121@tcp1001,10.207.64.121@o2ib,10.204.64.121@tcp1101,10.204.64.121@tcp1102,10.212.0.121@tcp1111,10.212.0.121@tcp1112,10.212.0.121@tcp1113,10.212.0.121@tcp1114,10.212.0.121@tcp1115,10.212.16.121@tcp1121,10.212.16.121@tcp1122,10.212.16.121@tcp1123,10.212.32.121@tcp1131,10.212.32.121@tcp1132,10.212.32.121@tcp1133,10.212.48.121@tcp1141,10.212.48.121@tcp1142,10.212.64.121@tcp1151,10.212.64.121@tcp1152,10.212.80.121@tcp1161,10.212.96.121@tcp1171,10.212.112.121@tcp1181:10.207.208.122@tcp1001,10.207.64.122@o2ib,10.204.64.122@tcp1101,10.204.64.122@tcp1102,10.212.0.122@tcp1111,10.212.0.122@tcp1112,10.212.0.122@tcp1113,10.212.0.122@tcp1114,10.212.0.122@tcp1115,10.212.16.122@tcp1121,10.212.16.122@tcp1122,10.212.16.122@tcp1123,10.212.32.122@tcp1131,10.212.32.122@tcp1132,10.212.32.122@tcp1133,10.212.48.122@tcp1141,10.212.48.122@tcp1142,10.212.64.122@tcp1151,10.212.64.122@tcp1152,10.212.80.122@tcp1161,10.212.96.122@tcp1171,10.212.112.122@tcp1181 mgsnode=10.207.208.102@tcp1001:10.207.208.101@tcp1001
            

            When I tried to find the limiting number of NIDs, it was around 45: the 46th NID was truncated on the MGS, and when the client tried to mount the filesystem it failed because the 46th NID was basically garbage. This configuration, for instance, is used as follows:

            • Tenant 1 has configured tcp1101, tcp1102 and mounts fileset1 (tcp1101) and fileset2 (tcp1102)
            • Tenant 2 has configured tcp1111, tcp1112, tcp1113, tcp1114 and tcp1115 and mounts fileset3, fileset4, fileset2 and fileset5. It could additionally mount another fileset on the remaining tcp network.
            • Tenant 3 has configured tcp1171 and just mounts fileset7.

            This is just an example, but you can find a more detailed explanation in this presentation from LAD'18: Multi-tenancy at ETHZ

            In practical terms, the more complex the configuration is, the more we need Lustre routers to help simplify it. That's also what we'll do in the future as soon as we have more tenants. But even with Lustre routers there is still the need for one LNet per fileset and per client <- router -> server mapping. That would mean that in a configuration with 4 failover OSSes there would be a hard limit of 11 filesets that a tenant can mount at the same time (this comes from the 45-NID limitation: each of the 4 service nodes needs one NID per fileset LNet, and 4 x 11 = 44 is the most that fits under 45). It may seem extreme, but I guess our current configuration was also considered "extreme" at the time the limit on the number of NIDs was introduced...

            Another alternative could be to manage the MGS configuration in a different way and not hard-code the NIDs on the OSTs/MDTs. I don't know whether there is any feature request or development in that regard.
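            (As a side note, one way to inspect what the MGS actually stored, a sketch assuming shell access to the MGS node and the fs1 filesystem name from the output above; the exact record layout differs between Lustre versions:)

              # List the configuration llogs held by the MGS, then dump the client
              # log for fs1; truncated or garbled NID strings should be visible in
              # the add_uuid/setup records.
              lctl --device MGS llog_catlist
              lctl --device MGS llog_print fs1-client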


            ashehata Amir Shehata (Inactive) added a comment -

            Can you please share the configuration that you're using?

            People

              Assignee: simmonsja James A Simmons
              Reporter: moreno Diego Moreno
              Votes: 0
              Watchers: 8
