Lustre / LU-15192

socklnd: using typed_conns=0 disables communication


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Switching to "untyped" socklnd connections by setting the following module option:

      options ksocklnd typed_conns=0
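
One way to confirm that the option took effect once the module is loaded is to read the parameter back. This is a minimal sketch, assuming ksocklnd exposes its parameters under /sys/module/ksocklnd/parameters/ (the usual sysfs location for kernel module parameters). The helper name typed_conns_value is made up for illustration and is not part of Lustre.

```shell
# Print the typed_conns value that the loaded ksocklnd module is using.
# The default sysfs path is an assumption; pass another path as $1 if the
# parameter is exposed elsewhere on your system.
typed_conns_value() {
    param_file="${1:-/sys/module/ksocklnd/parameters/typed_conns}"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "ksocklnd parameter not readable: $param_file" >&2
        return 1
    fi
}
```

If the option above was picked up, calling typed_conns_value with no arguments would be expected to print 0. Any other value suggests the modprobe configuration was not applied when the module was loaded.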

      appears to leave socklnd unable to communicate. Even a self-ping fails:

      lnetctl ping 192.168.122.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 192.168.122.123@tcp: Input/output error

      A typical net debug trace:

      00000400:00000200:0.0:1635961675.271877:0:9092:0:(lib-move.c:4834:LNetGet()) LNetGet -> 12345-192.168.122.123@tcp
      00000400:00000200:0.0:1635961675.271885:0:9092:0:(lib-move.c:2450:lnet_handle_send_case_locked()) Source ANY to NMR:  192.168.122.123@tcp local destination
      00000400:00000200:0.0:1635961675.271892:0:9092:0:(lib-move.c:1714:lnet_handle_send()) rspt_next_hop_nid = 192.168.122.123@tcp
      00000400:00000200:0.0:1635961675.271899:0:9092:0:(lib-move.c:1728:lnet_handle_send()) TRACE: 192.168.122.123@tcp(192.168.122.123@tcp:<?>) -> 192.168.122.123@tcp(192.168.122.123@tcp:192.168.122.123@tcp) : GET try# 0
      00000800:00000200:0.0:1635961675.271905:0:9092:0:(socklnd_cb.c:1003:ksocknal_send()) sending 0 bytes in 0 frags to 12345-192.168.122.123@tcp
      00000800:00000200:0.0:1635961675.271912:0:9092:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cacc153ae00] -> 12345-192.168.122.123@tcp (4)
      00000800:00000200:0.1F:1635961675.271919:0:9092:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cacc153ae00] -> 12345-192.168.122.123@tcp (4)
      00000800:00000100:0.0:1635961675.271924:0:9092:0:(socklnd_cb.c:979:ksocknal_launch_packet()) No usable routes to 12345-192.168.122.123@tcp
      00000400:00000200:0.0:1635961675.271926:0:9092:0:(lib-msg.c:816:lnet_is_health_check()) health check = 1, status = -5, hstatus = 7
      00000400:00000200:0.0:1635961675.271932:0:9092:0:(lib-msg.c:630:lnet_health_check()) health check: 192.168.122.123@tcp->192.168.122.123@tcp: GET: REMOTE_ERROR
      00000400:00000200:0.0:1635961675.271937:0:9092:0:(api-ni.c:4096:lnet_ping()) poll 1(5 -5)
      00000400:00000200:0.0:1635961675.271940:0:9092:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md ffff9cad1077c110
      00000400:00000200:0.0:1635961675.271942:0:9092:0:(api-ni.c:4096:lnet_ping()) poll 1(6 0) unlinked
      00000800:00000200:0.0:1635961678.781862:0:8854:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cacc153ae00] -> 12345-192.168.122.123@tcp (4)
      00000800:00000200:0.1:1635961678.781869:0:8854:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cacc153ae00] -> 12345-192.168.122.123@tcp (4)
      00000800:00000100:0.0:1635961678.781873:0:8854:0:(socklnd_cb.c:979:ksocknal_launch_packet()) No usable routes to 12345-192.168.122.123@tcp
      00000800:00000200:0.0:1635961678.781878:0:8854:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cad3a278300] -> 12345-192.168.122.137@tcp (4)
      00000800:00000200:0.1:1635961678.781881:0:8854:0:(socklnd.c:195:ksocknal_find_peer_locked()) got peer_ni [ffff9cad3a278300] -> 12345-192.168.122.137@tcp (4)
      00000800:00000100:0.0:1635961678.781884:0:8854:0:(socklnd_cb.c:979:ksocknal_launch_packet()) No usable routes to 12345-192.168.122.137@tcp
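
In the trace above, the failure surfaces at the "No usable routes" message from ksocknal_launch_packet(), and it affects more than one peer. A small sketch for listing the affected peers from a saved debug log follows. It assumes the log has already been dumped to a text file in the format shown above, and the function name no_route_peers is hypothetical.

```shell
# Read a Lustre debug log on stdin and print each peer for which
# socklnd reported "No usable routes", one per line, without duplicates.
no_route_peers() {
    grep -o 'No usable routes to [^ ]*' |   # keep only the matching message
        awk '{ print $NF }' |               # last field is the peer id
        sort -u                             # collapse repeated reports
}
```

For example, `no_route_peers < debug.log` run on the trace above would list both 12345-192.168.122.123@tcp and 12345-192.168.122.137@tcp, which shows that the problem is not limited to the self-ping target.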

          People

            ssmirnov Serguei Smirnov