[LU-14386] LNet: select reachable remote peer nid Created: 29/Jan/21 Updated: 15/Mar/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | lnet, lnet-router |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The following setup results in a problem (in LNet, "tcp" and "tcp0" name the same network):
NodeA --tcp0-- GW --tcp1-- NodeB
------ NodeA ------
lnetctl net show
- net type: tcp9
  local NI(s):
    - nid: 192.168.122.10@tcp9
- net type: tcp
  local NI(s):
    - nid: 192.168.122.142@tcp
------ NodeB ------
lnetctl net show
net:
    - net type: tcp1
      local NI(s):
        - nid: 192.168.122.40@tcp1
------ NodeB ------
lnetctl peer show
peer:
    - primary nid: 192.168.122.10@tcp9
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.142@tcp
          state: NA
        - nid: 192.168.122.10@tcp9
          state: NA
Note that NodeB lists NodeA under the primary nid 192.168.122.10@tcp9, which is unreachable from NodeB. NodeB also knows NodeA's reachable nid, 192.168.122.142@tcp, but it does not fall back to it when the peer is addressed by its primary nid, so the ping fails:
------ NodeB ------
lnetctl ping 192.168.122.10@tcp9
manage:
    - ping:
          errno: -1
          descr: failed to ping 192.168.122.10@tcp9: Input/output error
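The behavior requested in the title can be illustrated with a small sketch. This is not LNet code and not the patch referenced in the comments; it is a simplified Python model under stated assumptions. The function names (`normalize_net`, `nid_net`, `select_reachable_nid`) are hypothetical, and the set of networks NodeB can reach (its local tcp1 plus tcp0 routed through GW) is assumed from the topology above rather than taken from any route configuration shown in this ticket.

```python
def normalize_net(net):
    """Normalize an LNet network name: 'tcp' and 'tcp0' are the same network."""
    if not net:
        raise ValueError("empty network name")
    return net if net[-1].isdigit() else net + "0"


def nid_net(nid):
    """Return the normalized network part of a nid such as '192.168.122.142@tcp'."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr:
        raise ValueError("malformed nid: %r" % nid)
    return normalize_net(net)


def select_reachable_nid(primary_nid, peer_nids, reachable_nets):
    """Pick a nid of the peer that lies on a reachable network.

    The primary nid is tried first; if its network is not reachable,
    the remaining peer nids are tried in order. Returns None when no
    nid of the peer is on a reachable network.
    """
    reachable = {normalize_net(n) for n in reachable_nets}
    candidates = [primary_nid] + [n for n in peer_nids if n != primary_nid]
    for nid in candidates:
        if nid_net(nid) in reachable:
            return nid
    return None


# NodeB's view of NodeA, taken from the 'lnetctl peer show' output above.
primary = "192.168.122.10@tcp9"
peer_nis = ["192.168.122.142@tcp", "192.168.122.10@tcp9"]

# Assumption: NodeB reaches tcp1 locally and tcp0 through GW.
nodeb_reachable = {"tcp1", "tcp0"}

print(select_reachable_nid(primary, peer_nis, nodeb_reachable))
```

With these inputs the sketch selects 192.168.122.142@tcp instead of failing on the tcp9 primary nid, which is the outcome the ping above would need.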
|
| Comments |
| Comment by Serguei Smirnov [ 29/Jan/21 ] |
|
This can be resolved by porting the changes from the following patch in the MRR (Multi-Rail Routing) series: https://review.whamcloud.com/#/c/34625/17
|
| Comment by Gerrit Updater [ 29/Jan/21 ] |
|
Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41369 |