[LU-6130] Number of LNET NI's limited Created: 15/Jan/15 Updated: 15/Oct/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Blake Caldwell | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 17076 |
| Description |
|
While investigating the maximum number of NI's that can be added with DLC, I found that the structure lnet_ping_info_t is expected to hold at most 16 entries in pi_ni.

./lnet/include/lnet/lib-types.h:460 typedef struct { /* router checker data, per router */

Following the call chain from adding a new NI with DLC:

./lnet/utils/lnetconfig/liblnetconfig.c:
./lnet/lnet/module.c:
lnet/lnet/api-ni.c:

This is where the pi_ni array mentioned above is used: lnet_ping_info_create(int num_ni)

This limit appears to be designed for routers, but what about an OSS with an arbitrary number (N) of NI's (e.g. o2ib0…o2ibN)? |
| Comments |
| Comment by Jian Yu [ 15/Jan/15 ] |
|
Hi Isaac, I saw that "#define LNET_MAX_RTR_NIS 16" was created by you in commit 5e1e6a6756d3b4ca19a0d7e0defcf974dbfed13c. Could you please comment on this ticket? |
| Comment by Amir Shehata (Inactive) [ 29/Jan/15 ] |
|
When the ping info is created or adjusted as an NI is added, there is no limit on its size, so it can expand to include all the NIs on the node. However, when pinging a node, the current code is only interested in the first 16 NIDs of the router. This could be an issue in the following scenario:

Router A has 17 NIs, 0 to 16

This is a corner case, since it requires a router with more than 16 NIDs and other nodes whose routes to that router use a NID beyond the first 16.

This also affects lctl ping, but that is not critical, since lctl ping is only used by sysadmins to ping nodes. I don't think it's a problem: the ping response will still be displayed, but only the first 16 NIDs will be shown.

The reason for the number 16 is that we have to allocate a specific size to receive ping info. When we ping a node we don't know how many NIDs it has, so we need to select a size for the buffer that receives the ping info. 16 NIDs seemed like a reasonable number to go with.

I'm not sure about your concern regarding the number of NIs on an OSS. Could you please explain your concern further? |
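The asymmetry described in this comment (a ping info that grows with the node's NIs, read by a receiver with a fixed 16-entry buffer) can be illustrated with a minimal, self-contained sketch. This is not the LNet source: the `sketch_*` names, the reduced struct layouts, and the in-memory "receive" are assumptions made for illustration only. The constant simply mirrors the value 16 quoted in the ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the value of LNET_MAX_RTR_NIS quoted in this ticket. */
#define SKETCH_MAX_RTR_NIS 16

/* Hypothetical, reduced stand-ins for the real LNet structures. */
struct sketch_ni_status {
	uint64_t ns_nid;
	uint32_t ns_status;
};

struct sketch_ping_info {
	uint32_t pi_nnis;                 /* number of NIs the node really has */
	struct sketch_ni_status pi_ni[];  /* one entry per NI */
};

/* Bytes needed to hold a ping info with num_ni entries. */
static size_t sketch_ping_info_size(unsigned int num_ni)
{
	return offsetof(struct sketch_ping_info, pi_ni) +
	       (size_t)num_ni * sizeof(struct sketch_ni_status);
}

/* Local side: sized to the actual NI count, so it has no upper limit. */
static struct sketch_ping_info *sketch_ping_info_create(unsigned int num_ni)
{
	struct sketch_ping_info *pi = calloc(1, sketch_ping_info_size(num_ni));
	unsigned int i;

	if (pi == NULL)
		return NULL;
	pi->pi_nnis = num_ni;
	for (i = 0; i < num_ni; i++)
		pi->pi_ni[i].ns_nid = i;  /* placeholder NIDs */
	return pi;
}

/*
 * Pinging side: buf must have room for SKETCH_MAX_RTR_NIS entries.
 * Only that many entries are copied; returns how many are visible.
 */
static unsigned int sketch_ping_receive(const struct sketch_ping_info *remote,
					struct sketch_ping_info *buf)
{
	unsigned int visible = remote->pi_nnis < SKETCH_MAX_RTR_NIS ?
			       remote->pi_nnis : SKETCH_MAX_RTR_NIS;

	assert(buf != NULL);
	memcpy(buf, remote, sketch_ping_info_size(visible));
	return visible;
}
```

In this model, a node created with 17 NIs is fully described on its own side, while the pinging side sees only entries 0 to 15 even though `pi_nnis` still reports 17.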
| Comment by Jian Yu [ 23/Apr/15 ] |
|
Hi Blake, Did Amir answer your question? If not, could you please explain your concern further? |
| Comment by James A Simmons [ 23/Apr/15 ] |
|
Today's corner cases will be tomorrow's default settings. Perhaps we should look at fixing this. Besides LNET_MAX_RTR_NIS, LNET_PINGINFO_SIZE would also have to be handled dynamically. |
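One way to picture "handled dynamically" is a receiver that starts with a guessed capacity and retries with an exact-sized buffer when the reported NI count exceeds it. The sketch below is only an illustration of that idea under stated assumptions: the `demo_*` names, the reduced struct, and the in-memory `demo_fetch` transport are all hypothetical, and nothing here is taken from the LNet code or from any fix proposed in this ticket.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced ping info: a count followed by that many NIDs. */
struct demo_ping_info {
	uint32_t pi_nnis;
	uint64_t pi_nid[];
};

/* Bytes needed for a ping info holding n NIDs. */
static size_t demo_size(uint32_t n)
{
	return offsetof(struct demo_ping_info, pi_nid) +
	       (size_t)n * sizeof(uint64_t);
}

/*
 * Hypothetical transport: copies as many entries as fit in cap,
 * but always carries the true count in pi_nnis.
 */
static void demo_fetch(const struct demo_ping_info *remote,
		       struct demo_ping_info *buf, uint32_t cap)
{
	uint32_t n = remote->pi_nnis < cap ? remote->pi_nnis : cap;

	memcpy(buf, remote, demo_size(n));
}

/* Ping with an initial guess; retry once with the exact size if truncated. */
static struct demo_ping_info *
demo_ping_dynamic(const struct demo_ping_info *remote, uint32_t initial_cap)
{
	struct demo_ping_info *buf = malloc(demo_size(initial_cap));

	if (buf == NULL)
		return NULL;
	demo_fetch(remote, buf, initial_cap);

	if (buf->pi_nnis > initial_cap) {
		uint32_t needed = buf->pi_nnis;
		struct demo_ping_info *bigger = realloc(buf, demo_size(needed));

		if (bigger == NULL) {
			free(buf);
			return NULL;
		}
		buf = bigger;
		demo_fetch(remote, buf, needed);
	}
	return buf;  /* caller frees */
}
```

The trade-off in this sketch is an extra round trip for nodes with more NIs than the initial guess, in exchange for removing the fixed upper bound on what the pinging side can see.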
| Comment by Blake Caldwell [ 23/Apr/15 ] |
|
Yes, that was a clear explanation. Thank you, Amir. I understand now. So the number of LNET NI's should not be limited, which, strictly speaking, was the original concern of my ticket. Creating more than 16 "flat" NI's is desirable. However, as James alluded to, this limitation could become a problem as soon as a router is needed in this configuration (o2ib to tcp, or segmented IB networks). |
| Comment by Amir Shehata (Inactive) [ 23/Apr/15 ] |
|
I'll create a ticket to track this enhancement. I do agree that we should be stepping away from hard-coded static values. |
| Comment by John Hammond [ 15/Oct/20 ] |
|
More discussion from Andreas on |