Hi,
I reviewed the current related documentation; the recommended changes are listed below:
The lnetctl section of the Lustre manual and the lnetctl man page should be updated to mention that the hop count defaults to 1 if it is not specified when adding a route with lnetctl.
Also, the manual should be updated to clarify that the "avoid_asym_router_failure" module parameter applies only to single-hop routers.
Also, the following passage from 34.3.7. LNet Peer Health should be modified:
"A router is considered down if any of its NIDs are down. For example, router X has three NIDs: Xnid1, Xnid2, and Xnid3. A client is connected to the router via Xnid1. The client has router checker enabled. The router checker periodically sends a ping to the router via Xnid1. The router responds to the ping with the status of each of its NIDs. In this case, it responds with Xnid1=up, Xnid2=up, Xnid3=down. If avoid_asym_router_failure==1, the router is considered down if any of its NIDs are down, so router X is considered down and will not be used for routing messages. If avoid_asym_router_failure==0, router X will continue to be used for routing messages."
The above sounds incorrect to me now, because the router shouldn't be considered down unless it cannot reach the remote net.
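To make the difference concrete, here is a rough Python sketch of the two rules. It is only an illustration and not LNet code: the function names, the NID-to-net mapping (Xnid1 on tcp0, Xnid2 on tcp1, Xnid3 on tcp2), and the reading of "cannot reach the remote net" as "no NID on that net is up" are all assumptions made for the example. It also assumes the ping over Xnid1 itself succeeds.

```python
# Illustrative model only; not LNet code. All names and the NID-to-net
# mapping below are hypothetical.

def router_down_per_quoted_passage(nid_status, avoid_asym_router_failure):
    """Rule as worded in the quoted passage: with the parameter set to 1,
    any down NID marks the whole router as down."""
    if avoid_asym_router_failure == 0:
        return False
    return any(state == "down" for state in nid_status.values())


def router_down_per_suggestion(nid_status, nid_net, remote_net):
    """Suggested rule (one possible reading): the router is down only if
    none of its NIDs on the remote net is up."""
    nids_on_remote = [nid for nid, net in nid_net.items() if net == remote_net]
    return not any(nid_status[nid] == "up" for nid in nids_on_remote)


# Status reported by router X in the example from the quoted passage.
status = {"Xnid1": "up", "Xnid2": "up", "Xnid3": "down"}
# Assumed placement of each NID on a network (not stated in the passage).
nets = {"Xnid1": "tcp0", "Xnid2": "tcp1", "Xnid3": "tcp2"}

down_quoted = router_down_per_quoted_passage(status, 1)
down_for_tcp1 = router_down_per_suggestion(status, nets, "tcp1")
down_for_tcp2 = router_down_per_suggestion(status, nets, "tcp2")
```

Under these assumptions, the quoted wording marks router X as down for all traffic, while the suggested rule keeps it usable for tcp1 and only treats it as down for routes to tcp2.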
Thanks,
Serguei.
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/44916/
Subject: LUDOC-494 lnet: clarify use of route hopcount
Project: doc/manual
Branch: master
Current Patch Set:
Commit: f7da09ba79b2522ca51d001c59ab1212d051309c