socklnd needs improved interface selection and configuration
(LU-14064)
|
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
sihara found this issue. When NAT is used in a setup, the wrong NID appears to be used:
peer1: 192.168.122.135 (NAT: 10.128.13.120)
(lib-move.c:1790:lnet_handle_send()) TRACE: 192.168.122.135@tcp(192.168.122.135@tcp:<?>) -> 10.128.13.130@tcp(10.128.13.130@tcp:10.128.13.130@tcp) : GET try# 0
peer2: 10.128.13.130
(lib-move.c:4302:lnet_parse()) TRACE: 10.128.13.130@tcp(10.128.13.130@tcp) <- 10.128.13.120@tcp : GET - for me
(lib-move.c:1858:lnet_handle_send()) TRACE: 10.128.13.130@tcp(10.128.13.130@tcp:10.128.13.130@tcp) -> 10.128.13.120@tcp(10.128.13.120@tcp:10.128.13.120@tcp)
peer1:
(lib-move.c:4236:lnet_parse()) TRACE: 10.128.13.120@tcp(192.168.122.135@tcp) <- 10.128.13.130@tcp : REPLY - routed
It appears that the NID of the node changes somewhere along the line. |
| Comments |
| Comment by Amir Shehata (Inactive) [ 29/May/20 ] |
|
The problem is that socklnd on the passive side uses the IP address taken from the socket. When a connection is established, the passive side looks up the peer's IP address from the socket, but that address is the NATed one. The local peer structure on the passive side is therefore created with a NID built from the NATed IP address of the active node. When a response is eventually sent to the active node, the NID in the message contains the NATed IP address rather than the private IP address that LNet on the active node was configured with, so the message is dropped. What we need is a mapping between private and public IP addresses in socklnd, so that the correct IP address ends up being used. |
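
To illustrate the mapping idea above, here is a minimal user-space sketch in C. It is not socklnd code and does not reflect any existing Lustre API: the names `nat_map`, `nat_map_entry`, `nat_map_add`, `nat_map_resolve` and `parse_ipv4`, as well as the fixed-size table, are assumptions made purely for illustration. The sketch only shows the lookup the passive side would need: given the address read from the socket, return the private address the peer was configured with, or the socket address unchanged when no mapping exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define NAT_MAP_MAX 16 /* arbitrary capacity chosen for this sketch */

/* Hypothetical entry: one public (NATed) address and its private address. */
struct nat_map_entry {
	uint32_t public_ip;  /* address seen on the socket, host byte order */
	uint32_t private_ip; /* address the peer's LNet was configured with */
};

/* Hypothetical table of mappings; a real implementation would differ. */
struct nat_map {
	struct nat_map_entry entries[NAT_MAP_MAX];
	size_t count;
};

/* Parse a dotted-quad IPv4 string into host byte order; 0 on success. */
int parse_ipv4(const char *text, uint32_t *out)
{
	struct in_addr addr;

	if (inet_pton(AF_INET, text, &addr) != 1)
		return -1;
	*out = ntohl(addr.s_addr);
	return 0;
}

/* Record that public_ip is the NATed form of private_ip; 0 on success. */
int nat_map_add(struct nat_map *map, const char *public_ip,
		const char *private_ip)
{
	uint32_t pub, priv;

	assert(map != NULL);
	if (map->count >= NAT_MAP_MAX)
		return -1;
	if (parse_ipv4(public_ip, &pub) != 0 ||
	    parse_ipv4(private_ip, &priv) != 0)
		return -1;
	map->entries[map->count].public_ip = pub;
	map->entries[map->count].private_ip = priv;
	map->count++;
	return 0;
}

/*
 * Return the private address mapped to socket_ip, or socket_ip itself
 * when the address is not behind a known NAT.
 */
uint32_t nat_map_resolve(const struct nat_map *map, uint32_t socket_ip)
{
	size_t i;

	assert(map != NULL);
	for (i = 0; i < map->count; i++) {
		if (map->entries[i].public_ip == socket_ip)
			return map->entries[i].private_ip;
	}
	return socket_ip;
}
```

With the addresses from the description, adding the pair ("10.128.13.120", "192.168.122.135") would make a lookup of 10.128.13.120 return 192.168.122.135, while an unmapped address such as 10.128.13.130 would be returned unchanged. In this sketch the resolved address would only be used to build the peer NID; the connection itself would presumably still have to target the public address.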