[LU-6060] ARF doesn't detect lack of interface on a router Created: 19/Dec/14 Updated: 14/Jul/15 Resolved: 20/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | John Lewis (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 16875 |
| Description |
|
When using asymmetric router failure (ARF) detection, the system appears unable to detect that an expected interface is missing. A defined but non-functional interface is detected, but clients do not seem to notice when they have a route to a network via a router that has no means of getting the traffic there.

Take for example a few nodes: login1, rtr5, rtr6, and mgs. This was demonstrated on live hardware, although the following example is abstracted, with changed addresses and names.

Host: interfaces (routes)

In other words, we have two routers with two interfaces each sitting between LNET1 and GNI1.

Reproduction steps:
1. Show the missing interface on rtr5 via lctl list_nids.
2. On login1, ping mgs.
3. Show routes.
4. Look for down_ni.

In other words, there is no way to get to o2ib1 via rtr5, but ARF does not detect this. Presumably, at least in a non-multihop configuration, clients should be concerned not with whether the router has defined routes that aren't working, but with whether the client has a defined route that a router cannot handle because of a down or missing interface. |
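The gap described in this report can be illustrated with a small model. This is only a sketch, not LNet code: the original addresses and route table were not preserved in this ticket, so the network labels `gni1` and `o2ib1`, the dictionary layout, and the helper `unusable_routes` are all assumptions made for illustration.

```python
# Hypothetical model of the scenario; names and layout are illustrative only.

# Networks each router actually has an interface on.
router_interfaces = {
    "rtr5": {"gni1"},            # rtr5 is missing its o2ib1 interface
    "rtr6": {"gni1", "o2ib1"},
}

# Routes configured on the client (login1): (remote network, gateway router).
client_routes = [
    ("o2ib1", "rtr5"),
    ("o2ib1", "rtr6"),
]


def unusable_routes(routes, interfaces):
    """Return routes whose gateway has no interface on the remote network."""
    return [
        (net, gw)
        for net, gw in routes
        if net not in interfaces.get(gw, set())
    ]


broken = unusable_routes(client_routes, router_interfaces)
```

In this model the route to `o2ib1` via rtr5 is the one a client would ideally stop using, which is the condition the report says goes undetected.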
| Comments |
| Comment by Liang Zhen (Inactive) [ 20/Dec/14 ] |
|
Hi John, do you have patch on |
| Comment by James A Simmons [ 20/Dec/14 ] |
|
Yes the patch for |
| Comment by Gerrit Updater [ 21/Dec/14 ] |
|
Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/13162 |
| Comment by Liang Zhen (Inactive) [ 21/Dec/14 ] |
|
James, I think the issue here is that we do not record down NIs (downis) if there is no NI for the target network; the above patch should fix this problem. Also, I'm wondering if this is the same problem as |
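The counting rule described in this comment might look roughly like the following. It is a hedged sketch under stated assumptions, not the contents of the patch: the function `count_down_nis` and the `(network, is_up)` representation of a router's reported interfaces are hypothetical, chosen only to show why a router with no NI on the target network needs a non-zero count.

```python
# Hypothetical sketch of the counting rule; not the actual LNet code.

def count_down_nis(target_net, router_nis):
    """Count down NIs for a route to target_net.

    router_nis is a list of (network, is_up) pairs reported by the router.
    Returns 0 if the router has a working interface on target_net,
    otherwise a non-zero count so the route can be treated as unusable.
    """
    down = 0
    for net, is_up in router_nis:
        if is_up and net == target_net:
            return 0          # a live NI reaches the target network
        if not is_up:
            down += 1         # defined but non-functional interface
    # No live NI on the target network. If the router reported no down NI
    # either, still return 1: this is the missing-interface case from the
    # ticket, which a plain count of down NIs would report as 0.
    return max(down, 1)
```

With a plain count of down interfaces, a router that only reports a healthy interface on some other network would yield 0; forcing the result to at least 1 is what makes the missing interface visible in this sketch.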
| Comment by James A Simmons [ 15/Jan/15 ] |
|
Can you make a patch for master as well? Testing looks good for the patch you provided. |
| Comment by Gerrit Updater [ 15/Jan/15 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/13417 |
| Comment by Gerrit Updater [ 19/Jan/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13417/ |
| Comment by Peter Jones [ 20/Jan/15 ] |
|
Landed for 2.7 |