Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 17076

    Description

      While investigating the number of maximum possible NI's that can be added with DLC, I found that the structure lnet_ping_info_t is only expected to have a maximum of 16 entries in pi_ni.

      ./lnet/include/lnet/lib-types.h:460

      typedef struct {
      __u32 pi_magic;
      __u32 pi_features;
      lnet_pid_t pi_pid;
      __u32 pi_nnis;
      lnet_ni_status_t pi_ni[0];
      } WIRE_ATTR lnet_ping_info_t;

      /* router checker data, per router */
      #define LNET_MAX_RTR_NIS 16
      #define LNET_PINGINFO_SIZE offsetof(lnet_ping_info_t, pi_ni[LNET_MAX_RTR_NIS])

      Following the call chain from adding a new NI with DLC....

      ./lnet/utils/lnetconfig/liblnetconfig.c:
      l_ioctl(LNET_DEV_ID, IOC_LIBCFS_ADD_NET, &data)

      ./lnet/lnet/module.c:
      l_ioctl() -> lnet_dyn_configure() -> lnet_dyn_add_ni()

      lnet/lnet/api-ni.c:
      lnet_dyn_add_ni() -> lnet_ping_info_setup() -> lnet_ping_info_create()

      This is where the pi_ni array mentioned above is used:

      lnet_ping_info_create(int num_ni)
      ...
      infosz = offsetof(lnet_ping_info_t, pi_ni[num_ni]);

      This limit would appear designed to apply to routers, but what about an OSS with an arbitrary number (N) of NI's (e.g. o2ib0…o2ibN)?

      Attachments

        Issue Links

          Activity

            [LU-6130] Number of LNET NI's limited

            Currently there is a default limit of 64K for the lnetctl net show. I don't know what that works out too but that limit could easily be increased to 4MB message which would cover 25,000+ of NIs.

            simmonsja James A Simmons added a comment - Currently there is a default limit of 64K for the lnetctl net show. I don't know what that works out too but that limit could easily be increased to 4MB message which would cover 25,000+ of NIs.

            hornc, simmonsja, Is this still an issue or what is the current limit of NIs on a single system?

            adilger Andreas Dilger added a comment - hornc , simmonsja , Is this still an issue or what is the current limit of NIs on a single system?
            jhammond John Hammond added a comment -

            More discussion from Andreas on LU-7562.

            jhammond John Hammond added a comment - More discussion from Andreas on LU-7562 .

            I'll create a ticket to track this enhancement. I do agree that we should be stepping away from hard coded static values.

            ashehata Amir Shehata (Inactive) added a comment - I'll create a ticket to track this enhancement. I do agree that we should be stepping away from hard coded static values.

            Yes, that was a clear explanation. Thank you, Amir. I understand now. So the number of LNET NI's should not be limited, which strictly speaking, was the original concern of my ticket. Creating more than 16 "flat" NI's is desirable. However, as James alluded to, this limitation could be a problem as soon as a router is needed in this configuration (o2ib to tcp, or segmented IB networks).

            blakecaldwell Blake Caldwell added a comment - Yes, that was a clear explanation. Thank you, Amir. I understand now. So the number of LNET NI's should not be limited, which strictly speaking, was the original concern of my ticket. Creating more than 16 "flat" NI's is desirable. However, as James alluded to, this limitation could be a problem as soon as a router is needed in this configuration (o2ib to tcp, or segmented IB networks).

            Today's corner cases will be tomorrows default settings. Perhaps we should look at fixing this. Besides LNET_MAX_RTR_NIS also LNET_PINGINFO_SIZE would have to be handled dynamically.

            simmonsja James A Simmons added a comment - Today's corner cases will be tomorrows default settings. Perhaps we should look at fixing this. Besides LNET_MAX_RTR_NIS also LNET_PINGINFO_SIZE would have to be handled dynamically.
            yujian Jian Yu added a comment -

            Hi Blake,

            Did Amir answer your question? If not, could you please explain your concern further?

            yujian Jian Yu added a comment - Hi Blake, Did Amir answer your question? If not, could you please explain your concern further?

            When creating/adjusting the ping info when adding an NI, there is no limit on the size of the ping info. So it can expand to include all the NIs on the node.

            However, when pinging a node the current code is only interested in the first 16 NIDs of the router. This could be an issue in the following scenario:

            Router A has 17 NIs, 0 to 16
            Node B has a route to Router A using the NID-16.
            Node B is configured to ping routers to check if they are alive
            Node B pings Router A NID-16
            Router A responds with NIDs 0-15
            Node B iterates all Router A's NIDs but doesn't find NID-16, so it considers that Router dead.

            This is a corner scenario, since it requires a router with > 16 NIDs and other nodes have routes to the router using NIDs > 16 as a route.

            This also affect lctl ping, but this is not critical, since lctl ping is only used by sysadmins to ping nodes. I don't think it's a problem since ping response will still be displayed, but only the first 16 NIDs will be displayed.

            The reason for the number 16, is because we have to allocate a specific size to receive ping info. When we ping a node we don't know how many NIDs it has so we need to select a size for the buffer to receive the ping info. 16 NIDs seemed like a reasonable number to go with.

            I'm not sure about your concern with regards to the OSS number of NIs. Could you please explain your concern further.

            ashehata Amir Shehata (Inactive) added a comment - When creating/adjusting the ping info when adding an NI, there is no limit on the size of the ping info. So it can expand to include all the NIs on the node. However, when pinging a node the current code is only interested in the first 16 NIDs of the router. This could be an issue in the following scenario: Router A has 17 NIs, 0 to 16 Node B has a route to Router A using the NID-16. Node B is configured to ping routers to check if they are alive Node B pings Router A NID-16 Router A responds with NIDs 0-15 Node B iterates all Router A's NIDs but doesn't find NID-16, so it considers that Router dead. This is a corner scenario, since it requires a router with > 16 NIDs and other nodes have routes to the router using NIDs > 16 as a route. This also affect lctl ping, but this is not critical, since lctl ping is only used by sysadmins to ping nodes. I don't think it's a problem since ping response will still be displayed, but only the first 16 NIDs will be displayed. The reason for the number 16, is because we have to allocate a specific size to receive ping info. When we ping a node we don't know how many NIDs it has so we need to select a size for the buffer to receive the ping info. 16 NIDs seemed like a reasonable number to go with. I'm not sure about your concern with regards to the OSS number of NIs. Could you please explain your concern further.
            yujian Jian Yu added a comment -

            Hi Isaac,

            I saw that "#define LNET_MAX_RTR_NIS 16" was created by you in commit 5e1e6a6756d3b4ca19a0d7e0defcf974dbfed13c. Could you please comment on this ticket?

            yujian Jian Yu added a comment - Hi Isaac, I saw that "#define LNET_MAX_RTR_NIS 16" was created by you in commit 5e1e6a6756d3b4ca19a0d7e0defcf974dbfed13c. Could you please comment on this ticket?

            People

              ssmirnov Serguei Smirnov
              blakecaldwell Blake Caldwell
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: