[LU-8937] Dual IB port connect Router Created: 14/Dec/16  Updated: 13/Sep/18  Resolved: 13/Sep/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Amir Shehata (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

To increase bandwidth and provide failover, we would like to connect 2 IB ports on each side of a router:

o2ib ---- | HCA1 - ROUTER - HCA3|-- o2ib313

o2ib ---- | HCA2 - ROUTER - HCA4|-- o2ib313

I know we can do this if we had 2 different NIDs on each side. Can it be done using the same NID on 2 interfaces?



 Comments   
Comment by Amir Shehata (Inactive) [ 14/Dec/16 ]

Currently LNet doesn't support having two ports with the same NID, or even on the same LNet network. The only way you can do this is by configuring each IB port on a separate LNet network, so the configuration would look something like this:

o2ib1 ---- | HCA1 - ROUTER - HCA3 | -- o2ib313
o2ib2 ---- | HCA2 - ROUTER - HCA4 | -- o2ib314
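
A minimal sketch of the router side of this layout, using lnetctl. The interface names ib0 through ib3 (standing in for HCA1 through HCA4) are assumptions, not taken from this ticket, and the commands are only printed unless LNETCTL is set to the real binary:

```shell
# Sketch only: interface names ib0..ib3 are assumed, not taken from this ticket.
# Commands are printed by default; set LNETCTL=lnetctl to actually apply them.
LNETCTL="${LNETCTL:-echo lnetctl}"

router_separate_nets() {
    $LNETCTL lnet configure
    # One LNet network per IB port on the o2ib side (HCA1, HCA2).
    $LNETCTL net add --net o2ib1 --if ib0
    $LNETCTL net add --net o2ib2 --if ib1
    # One LNet network per IB port on the far side (HCA3, HCA4).
    $LNETCTL net add --net o2ib313 --if ib2
    $LNETCTL net add --net o2ib314 --if ib3
    # Turn on forwarding so this node acts as a router.
    $LNETCTL set routing 1
}

router_separate_nets
```

Each server or client would then sit on only one of the networks on its side, which is what splits the traffic across the two ports.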

However, we now have a feature called Multi-Rail. It's not in master yet, but on its own branch. It allows LNet to use multiple interfaces on the same LNet network, which will let you do what you want here, even without a router if one is not needed for other purposes. So one possible configuration for what you have is:

o2ib(hca1, hca2) and o2ib(hca3, hca4). This will allow all interfaces to live on the same network. LNet will do weighted round robin over the interfaces, and you should see an aggregate of the performance of both interfaces.

If you need the routers you'd do something like

o2ib(hca1, hca2) on the nodes on the left side of the routers
o2ib(hca3, hca4) on the routers
o2ib313(hca5, hca6) on the routers (this assumes that the router has 2 interfaces on each side of the network)
o2ib313(hca7, hca8) on the right side of the routers.
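
As a rough sketch, the router part of this Multi-Rail layout could be expressed with lnetctl as follows. The interface names are assumptions (ib0,ib1 for hca3,hca4 and ib2,ib3 for hca5,hca6), and a Multi-Rail capable build is assumed; the commands are only printed unless LNETCTL is set to the real binary:

```shell
# Sketch only: interface names are assumed, and a Multi-Rail capable LNet is required.
# Commands are printed by default; set LNETCTL=lnetctl to actually apply them.
LNETCTL="${LNETCTL:-echo lnetctl}"

router_multirail() {
    $LNETCTL lnet configure
    # Both left-side ports share the single o2ib network.
    $LNETCTL net add --net o2ib --if ib0,ib1
    # Both right-side ports share the single o2ib313 network.
    $LNETCTL net add --net o2ib313 --if ib2,ib3
    # Turn on forwarding so this node acts as a router.
    $LNETCTL set routing 1
}

router_multirail
```

The end nodes would use the same "net add" form with their own two interfaces and no routing.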

This will increase performance, but the routers will have an impact on this performance gain, because there is still a hop for the messages to go over.

Let me know if you need more details

Comment by Mahmoud Hanafi [ 14/Dec/16 ]

What if the nodes/servers on each side of the routers only have 1 interface?

 

servers(o2ib)--- o2ib(hca1,hca2)[router]o2ib313(hca3,hca4)--- nodes(o2ib313)

 

Comment by Amir Shehata (Inactive) [ 14/Dec/16 ]

I'm assuming you're asking about MR.

What you have outlined would work.

You would configure the servers and the nodes with the router's interfaces as shown below:

on the servers:
lnetctl peer add --nids router@hca1-nid,router@hca2-nid

on the nodes
lnetctl peer add --nids router@hca3-nid,router@hca4-nid

this will enable both the nodes and the servers to use both interfaces on the routers, which should increase the performance of the router.
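
For illustration, here is the same peer setup with concrete values. All NIDs are hypothetical. The released lnetctl versions I am aware of spell this as --prim_nid plus --nid rather than the --nids form used on the branch at the time, so check the lnetctl man page for your version. The commands are only printed unless LNETCTL is set to the real binary:

```shell
# Sketch only: all NIDs below are hypothetical placeholders.
# Commands are printed by default; set LNETCTL=lnetctl to actually apply them.
LNETCTL="${LNETCTL:-echo lnetctl}"

# $1 = router's primary NID, $2 = router's second NID on the same network
add_router_peer() {
    $LNETCTL peer add --prim_nid "$1" --nid "$2"
}

# On the servers: the router's two o2ib interfaces (hca1, hca2).
add_router_peer 10.0.0.1@o2ib 10.0.0.2@o2ib

# On the nodes: the router's two o2ib313 interfaces (hca3, hca4).
add_router_peer 10.1.0.1@o2ib313 10.1.0.2@o2ib313
```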

Without MR you would need to add each of the router's interfaces on a different LNet network (o2ib[1,2,3,4]) and then configure some of the servers to use one LNet network and other servers to use the other LNet network. With MR the configuration is simplified quite a bit.

Comment by Mahmoud Hanafi [ 26/Oct/17 ]

We can close this case. We have dual-rail routers working in production.

Comment by Mahmoud Hanafi [ 13/Sep/18 ]

Question was answered.

Generated at Sat Feb 10 02:21:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.