Details
Bug
Resolution: Fixed
Major
Description
An LNet router hit an odd problem. The router completed a reboot at 20:41:12
[Mon Jul 13 20:41:12 2020] Sending ec_node_info with boot code 8 (NODE_INFO_OS_BOOT_SUCCEEDED) for nid 602
but its ib0 interface didn't come up until 21:11:59
[Mon Jul 13 20:39:17 2020] ib0: enabling connected mode will cause multicast packet drops
[Mon Jul 13 20:39:17 2020] ib0: mtu > 4092 will cause multicast packet drops.
[Mon Jul 13 20:39:17 2020] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
...
[Mon Jul 13 21:11:59 2020] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Because of this change:
commit 28324781942780cc149555ccfd3dcf9a8d2ffdfb
Author: Amir Shehata <ashehata@whamcloud.com>
Date:   Thu Nov 28 15:44:27 2019 -0800

    LU-13029 lnet: fix asym routing with multi-hop
the gni clients classified the router as "multi-hop" and continued to use it. It should have been considered "down" (because of avoid_asym_router_failure). This led to a bunch of evictions.
We can keep the detection code, because it is useful to spot when things go awry, but when we actually determine route aliveness we should use the configured hop count.
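
To make the proposal concrete, here is a minimal sketch in C of the intended split between detection and the aliveness decision. It is not the actual LNet code: the struct, its fields, and both function names are hypothetical, and it assumes a route configured with a hop count of 1 is the single-hop case. Only avoid_asym_router_failure comes from the report above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a route entry (not the real LNet struct). */
struct route_sketch {
	unsigned int configured_hops; /* hop count from the route configuration */
	bool detected_single_hop;     /* result of hop detection; diagnostic only */
	bool gateway_alive;           /* the gateway itself is responding */
	bool remote_ni_up;            /* gateway reports its far-side interface up */
};

/* Detection is kept, but only to flag disagreement with the configuration. */
static bool route_hop_mismatch(const struct route_sketch *r)
{
	bool configured_single = (r->configured_hops == 1);

	return configured_single != r->detected_single_hop;
}

/* The aliveness decision consults the configured hop count, never the detected one. */
static bool route_is_alive(const struct route_sketch *r,
			   bool avoid_asym_router_failure)
{
	if (!r->gateway_alive)
		return false;

	/* Configured single hop: the far-side interface must be up as well. */
	if (avoid_asym_router_failure && r->configured_hops == 1)
		return r->remote_ni_up;

	/* Configured multi-hop: far-side interface state is not visible from here. */
	return true;
}
```

Under these assumptions, a router in the state described above (gateway responding, ib0 still down, detection reporting multi-hop, route configured with one hop) is reported as a mismatch and is treated as down when avoid_asym_router_failure is enabled.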