Details

    • Technical task
    • Resolution: Unresolved
    • Minor

    Description

      sihara found this issue.

      When NAT is used in a setup, it appears that the wrong NID is used:

      peer 1: 192.168.122.135 (NAT: 10.128.13.120)

      (lib-move.c:1790:lnet_handle_send()) TRACE: 192.168.122.135@tcp(192.168.122.135@tcp:<?>) -> 10.128.13.130@tcp(10.128.13.130@tcp:10.128.13.130@tcp) : GET try# 0
      

      peer2: 10.128.13.130

      (lib-move.c:4302:lnet_parse()) TRACE: 10.128.13.130@tcp(10.128.13.130@tcp) <- 10.128.13.120@tcp : GET - for me
      
      (lib-move.c:1858:lnet_handle_send()) TRACE: 10.128.13.130@tcp(10.128.13.130@tcp:10.128.13.130@tcp) -> 10.128.13.120@tcp(10.128.13.120@tcp:10.128.13.120@tcp)
      

      peer1:

      (lib-move.c:4236:lnet_parse()) TRACE: 10.128.13.120@tcp(192.168.122.135@tcp) <- 10.128.13.130@tcp : REPLY - routed
      

      It appears that the NID of the node changes somewhere along the line.
      LNet shouldn't care about NAT in this case and should work.
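
      The exchange in the traces above can be modelled with a small toy sketch. This is not LNet code: the `Message` type, the `NAT` table, and the `classify` helper are hypothetical names made up for illustration, and the only assumption is that the receiver compares a message's destination NID with its own configured NID.

```python
from dataclasses import dataclass

# Toy model of the traced exchange; all names here are illustrative, not LNet APIs.

@dataclass(frozen=True)
class Message:
    kind: str      # "GET" or "REPLY"
    src_nid: str
    dst_nid: str

def nid(ip: str, net: str = "tcp") -> str:
    """Build a NID string such as '10.128.13.130@tcp'."""
    return f"{ip}@{net}"

def observed_source_ip(ip: str, nat_table: dict) -> str:
    """Return the source IP that the remote end of the socket sees."""
    return nat_table.get(ip, ip)

def classify(msg: Message, local_nid: str) -> str:
    """Assumed receive check: a message is local only if dst_nid matches."""
    return "for me" if msg.dst_nid == local_nid else "not for me"

PEER1_IP = "192.168.122.135"      # private address peer 1 is configured with
PEER1_NAT_IP = "10.128.13.120"    # address peer 2 sees after NAT
PEER2_IP = "10.128.13.130"
NAT = {PEER1_IP: PEER1_NAT_IP}

# peer 1 sends a GET using its configured (private) NID as the source.
get = Message("GET", nid(PEER1_IP), nid(PEER2_IP))
get_verdict = classify(get, nid(PEER2_IP))

# peer 2 derives the peer NID from the socket address, i.e. the NATed IP,
# and addresses the REPLY to that NID.
reply = Message("REPLY", nid(PEER2_IP), nid(observed_source_ip(PEER1_IP, NAT)))
reply_verdict = classify(reply, nid(PEER1_IP))

print(get_verdict)    # GET is accepted by peer 2
print(reply_verdict)  # REPLY does not match peer 1's configured NID
```

      In this model the REPLY carries 10.128.13.120@tcp as its destination, which does not match the 192.168.122.135@tcp NID that peer 1 was configured with. That corresponds to the last trace, where peer 1 treats the REPLY as "routed" rather than "for me".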

        Activity

          [LU-13565] LNet socklnd with NAT is not working properly

          ashehata Amir Shehata (Inactive) added a comment:

          The problem is that socklnd on the passive side uses the IP address of the socket. When a connection is established, the passive side looks up the peer's IP address from the socket, but that address is the NATed IP address. The local peer structure on the passive side is therefore created with a NID built from the NATed IP address of the active node. When a response is finally sent to the active node, the NID in the message contains the NATed IP address rather than the private IP address that LNet on the active node was configured with, so the message is dropped.

          What we need to do is keep a mapping between private and public IP addresses in socklnd, so that the correct IP address ends up being used.
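
          A minimal sketch of the kind of mapping proposed in the comment above follows. It is an illustration only, not the socklnd implementation: `NatAddressMap`, `learn`, `private_ip`, and `peer_nid` are hypothetical names, and the sketch assumes the passive side has some way to learn the private address the active node is configured with (for example, because the active node advertises it during connection setup).

```python
from typing import Dict

class NatAddressMap:
    """Hypothetical table from socket-observed (public) IP to configured (private) IP."""

    def __init__(self) -> None:
        self._public_to_private: Dict[str, str] = {}

    def learn(self, socket_ip: str, advertised_ip: str) -> None:
        """Record a mapping only when the two addresses differ (NAT in the path)."""
        if socket_ip != advertised_ip:
            self._public_to_private[socket_ip] = advertised_ip

    def private_ip(self, socket_ip: str) -> str:
        """Return the configured IP for a socket IP, or the socket IP if unmapped."""
        return self._public_to_private.get(socket_ip, socket_ip)

def peer_nid(addr_map: NatAddressMap, socket_ip: str, net: str = "tcp") -> str:
    """Build the NID to address a peer with, preferring its configured IP."""
    return f"{addr_map.private_ip(socket_ip)}@{net}"

addr_map = NatAddressMap()

# Assumption: on accept, the passive side sees 10.128.13.120 on the socket
# and is told that the peer is configured as 192.168.122.135.
addr_map.learn("10.128.13.120", "192.168.122.135")

reply_dst = peer_nid(addr_map, "10.128.13.120")    # NATed peer
direct_dst = peer_nid(addr_map, "10.128.13.130")   # peer without NAT

print(reply_dst)
print(direct_dst)
```

          With such a table, the reply to the NATed peer would be addressed to the NID built from its private address, while peers that are not behind NAT are unaffected because unmapped addresses pass through unchanged.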

          People

            ashehata Amir Shehata (Inactive)
            ashehata Amir Shehata (Inactive)