[LU-3679] /proc/sys/lnet/routes should accurately reflect routing with ARF when LNet router has one or more down NIs Created: 31/Jul/13 Updated: 04/Feb/14 Resolved: 04/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Rank (Obsolete): | 9498 |
| Description |
|
On a system where an LNet router has more than one NI, ARF is configured on clients and servers, and one or more of the LNet router's NIs goes "down", /proc/sys/lnet/routes on clients/servers should show routes for that router as "down" rather than "up".

The story: A site was doing some tests of FGR where LNet routers had two IB interfaces. After seeing wide variations in packet counts between ib0 and ib1, they noticed that some NIs were down on the routers

> lnet6: nid status alive refs peer rtr max tx min

but were up for IPoIB. This caused some confusion, which was compounded by the fact that clients showed these routes as still functional:

cat /proc/sys/lnet/routes | grep 454

This led people to believe that clients were still trying to use routes that were actually down, resulting in performance problems. Since ARF was configured, we know this wasn't actually the case. Clients will not use a router if that router has one or more down NIs. This should be reflected in the output of /proc/sys/lnet/routes. |
| Comments |
| Comment by Isaac Huang (Inactive) [ 06/Aug/13 ] |
|
Yes, a route should be considered "down" if the router is down or the router NI for the target network is down. |
| Comment by Chris Horn [ 07/Aug/13 ] |
|
Correct me if I'm wrong, but if the NI for the target network is up and an NI for a different target network is down, the router still won't be used due to ARF, right? |
| Comment by Isaac Huang (Inactive) [ 07/Aug/13 ] |
|
In that case the route will still be used. For example, if router 454@gni has its @o2ib1000 NI down but its @o2ib1002 NI up, there is no reason why 454@gni can't be used as a route to @o2ib1002. Note that route != router (a router can serve as the next hop in multiple routes). In the example, the route to @o2ib1000 via 454@gni is down, but the route to @o2ib1002 via 454@gni is up. |
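The route-versus-router distinction in this comment can be illustrated with a small sketch. This is not the LNet code: the `Router` class, the `route_state` function, and the treatment of a network with no known NI as "down" are all hypothetical simplifications chosen for illustration. It only models the rule stated in this thread, that a route is down if the router is down or the router's NI on the target network is down.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical model of a router: its NID, liveness, and per-network NI state."""
    nid: str
    alive: bool = True
    # Maps a network name (e.g. "o2ib1000") to True if the router's NI on it is up.
    ni_up: dict = field(default_factory=dict)


def route_state(router: Router, target_net: str) -> str:
    """Return "up" or "down" for the route to target_net via router."""
    if not router.alive:
        return "down"
    # Only the NI on the target network matters; down NIs on other
    # networks are ignored.
    # Assumption (not from the ticket): a network with no recorded NI
    # is treated as down.
    if not router.ni_up.get(target_net, False):
        return "down"
    return "up"


# The example from the comment: same router, two routes, different states.
router = Router(nid="454@gni", ni_up={"o2ib1000": False, "o2ib1002": True})
print(route_state(router, "o2ib1000"))  # down
print(route_state(router, "o2ib1002"))  # up
```

Keying the state on the (router, target network) pair rather than on the router alone is what lets one router appear as "up" in one line of the routes listing and "down" in another.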
| Comment by Chris Horn [ 08/Aug/13 ] |
|
Ah right. I had missed the bit of code in lnet_parse_rc_info() that ignored other down NIs on a router if the NI for the destination network was up. |
| Comment by Chris Horn [ 01/Oct/13 ] |
|
FYI, I have a patch for this awaiting testing and a push into Gerrit for review. Just don't want anyone to duplicate effort here. |
| Comment by Chris Horn [ 04/Oct/13 ] |
|
For your review: http://review.whamcloud.com/#/c/7857/ |
| Comment by Peter Jones [ 27/Oct/13 ] |
|
Landed for 2.6 |
| Comment by Chris Horn [ 06/Nov/13 ] |
|
Can we get this on b2_5? |
| Comment by Dmitry Eremin (Inactive) [ 06/Nov/13 ] |
|
patch for b2_5 is http://review.whamcloud.com/8195 |
| Comment by Dmitry Eremin (Inactive) [ 04/Feb/14 ] |
|
Landed to b2_5 |