[LUDOC-494] Clarify when setting lnet route hops is required for Lustre 2.12 and Lustre 2.14 Created: 29/Jul/21 Updated: 08/Aug/23 |
|
| Status: | Open |
| Project: | Lustre Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Olaf Faaland | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
In a discussion on https://review.whamcloud.com/#/c/43127/, in response to:
Chris said: "Yes, good point. I think there was always an implicit requirement that hop count be set for multi-hop routes if the avoid_asym_router_failure feature was enabled, but we should make that explicit." However, this is not yet reflected in the manual or in lnetctl(8).
|
| Comments |
| Comment by Olaf Faaland [ 29/Jul/21 ] |
|
Related to https://jira.whamcloud.com/browse/LU-14555 |
| Comment by Peter Jones [ 07/Aug/21 ] |
|
Serguei, could you please advise on what changes should be made to the manual here? Thanks, Peter |
| Comment by Serguei Smirnov [ 10/Aug/21 ] |
|
Hi, I reviewed the current related documentation. The recommended changes are listed below.
The lnetctl section of the Lustre manual and the lnetctl man page should be updated to mention that the hop count defaults to 1 if it is not specified when adding a route with lnetctl.
Also, the manual should be updated to clarify that the "avoid_asym_router_failure" module parameter applies only to single-hop routers.
Also, the following passage from 34.3.7. LNet Peer Health should be modified:
"A router is considered down if any of its NIDs are down. For example, router X has three NIDs: Xnid1, Xnid2, and Xnid3. A client is connected to the router via Xnid1. The client has router checker enabled. The router checker periodically sends a ping to the router via Xnid1. The router responds to the ping with the status of each of its NIDs. In this case, it responds with Xnid1=up, Xnid2=up, Xnid3=down. If avoid_asym_router_failure==1, the router is considered down if any of its NIDs are down, so router X is considered down and will not be used for routing messages. If avoid_asym_router_failure==0, router X will continue to be used for routing messages."
The passage above now sounds incorrect to me, because the router should not be considered down unless it cannot reach the remote net.
Thanks, Serguei.
|
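The difference between the quoted manual passage and the objection raised in the comment above can be sketched as a small model. This is an illustration only, not LNet code: the class `RouterPing`, both functions, the net name `tcp2`, and the idea that a ping reply carries per-net reachability are all hypothetical names and assumptions introduced here. Only the Xnid1/Xnid2/Xnid3 states and the `avoid_asym_router_failure` values come from the ticket.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RouterPing:
    """Hypothetical model of what a client learns from pinging a router."""
    # NID name -> True if the router reports that NID as up.
    nid_up: Dict[str, bool]
    # Assumption for illustration: remote net name -> True if still reachable.
    remote_net_reachable: Dict[str, bool]


def down_per_manual_passage(reply: RouterPing, avoid_asym_router_failure: int) -> bool:
    """Reading of the quoted manual text: any down NID marks the router down,
    but only when avoid_asym_router_failure is 1."""
    if avoid_asym_router_failure == 0:
        return False
    return not all(reply.nid_up.values())


def down_per_comment(reply: RouterPing, remote_net: str) -> bool:
    """Reading suggested in the comment: the router is down only if it
    cannot reach the remote net (modelled here as a simple lookup)."""
    return not reply.remote_net_reachable.get(remote_net, False)


# The example from the manual passage: Xnid3 is down, the others are up.
# The remote net "tcp2" is assumed to remain reachable through another NID.
reply = RouterPing(
    nid_up={"Xnid1": True, "Xnid2": True, "Xnid3": False},
    remote_net_reachable={"tcp2": True},
)

print(down_per_manual_passage(reply, 1))  # True: router X is not used
print(down_per_manual_passage(reply, 0))  # False: router X stays in use
print(down_per_comment(reply, "tcp2"))    # False: remote net is still reachable
```

Under these assumptions the two readings disagree for the same ping reply, which is the case the comment asks the manual to address.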
| Comment by Olaf Faaland [ 13/Aug/21 ] |
|
Hi Serguei, We'll be happy to review the patches. Thanks |
| Comment by Gerrit Updater [ 14/Sep/21 ] |
|
"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44916 |
| Comment by Gerrit Updater [ 08/Aug/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/44916/ |