LU-18819 (project: Lustre)

allow direct IB GUID addressing without IPoIB

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Lustre 2.16.1
    • 3

    Description

      Currently, IB networks must have IPoIB configured so that an initial connection can be established to the IB peer via its IP address, which is how the peer's IB GUID is fetched. This also simplifies configuration somewhat, because IP dotted-quad addresses are easier to manage than IB GUIDs. It is also possible to use DNS to perform hostname-to-IP lookups (e.g. for the MGS), but the current Lustre configuration logs require IP addresses to be stored, because the kernel does not (or at least did not) have the ability to look up hostnames directly (though that may have changed in newer kernels).

      After that initial IP connection, o2iblnd does not use IP for any traffic.

      It would be desirable to eliminate the requirement for IPoIB in networks where it is not otherwise needed, without requiring administrators to manage IB GUIDs directly (GUIDs are long and complex, and do not necessarily follow a consistent pattern).
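
For illustration, LNet names each peer with a NID of the form address@network, for example 192.168.1.10@o2ib. The minimal sketch below splits a NID and reports whether its address part is an IPv4 literal (the form the configuration logs store today) or a hostname that something would have to resolve first. It is not LNet's own NID parser, and the function names are hypothetical.

```python
import ipaddress


def parse_nid(nid: str) -> tuple[str, str]:
    """Split an LNet-style NID like '192.168.1.10@o2ib' into (address, network)."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return addr, net


def is_ipv4_literal(addr: str) -> bool:
    """True for a dotted-quad IPv4 address, False for a hostname."""
    try:
        ipaddress.IPv4Address(addr)
    except ipaddress.AddressValueError:
        return False
    return True


for nid in ["192.168.1.10@o2ib", "mgsnode@o2ib"]:
    addr, net = parse_nid(nid)
    kind = "IP literal" if is_ipv4_literal(addr) else "hostname, must be resolved first"
    print(f"{nid}: address={addr} network={net} ({kind})")
```

With IPoIB, the IP literal is what lets o2iblnd reach the peer and learn its GUID; without IPoIB, the address part would instead have to carry, or be resolvable to, a GUID, which is the gap this issue describes.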

          Activity

            Andreas Dilger (adilger) added a comment:

            Sure, but that's a chicken-and-egg problem then. How do you know which GUID to query in the first place, without having an out-of-band communication mechanism?

            James A Simmons (simmonsja) added a comment:

            I believe the GUID can be queried by the InfiniBand layer itself.
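
On a Linux node, the local side of this is straightforward: sysfs exposes each HCA port's GIDs under /sys/class/infiniband/<device>/ports/<port>/gids/, and on IB the default GID (index 0) is the 64-bit subnet prefix followed by the 64-bit port GUID, written like an IPv6 address. The sketch below splits such a GID into its two halves. The device name mlx5_0 and the helper names are assumptions for illustration only.

```python
import ipaddress
from pathlib import Path

GUID_MASK = (1 << 64) - 1


def split_gid(gid_text: str) -> tuple[int, int]:
    """Split a 128-bit IB GID string into (subnet_prefix, port_guid)."""
    gid = int(ipaddress.IPv6Address(gid_text.strip()))
    return gid >> 64, gid & GUID_MASK


def format_guid(guid: int) -> str:
    """Render a 64-bit GUID as 0x followed by 16 hex digits."""
    return f"0x{guid:016x}"


def local_port_guid(device: str = "mlx5_0", port: int = 1) -> str:
    """Read GID index 0 of a local HCA port from Linux sysfs.

    'mlx5_0' is only an example device name; list /sys/class/infiniband
    to see the devices actually present.
    """
    path = Path("/sys/class/infiniband", device, "ports", str(port), "gids", "0")
    _prefix, guid = split_gid(path.read_text())
    return format_guid(guid)


prefix, guid = split_gid("fe80:0000:0000:0000:506b:4b03:00c1:d2e4")
print(hex(prefix), format_guid(guid))  # prints 0xfe80000000000000 0x506b4b0300c1d2e4
```

This only yields the node's own GUID. As the reply points out, a node still needs some out-of-band way to learn a remote peer's GUID before it can connect.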

            Andreas Dilger (adilger) added a comment:

            One option would be to add DNS TXT records that report the IB GUID for a peer node, so that it is possible to look up e.g. "hostname@ib" to get the GUID record and use it to establish the peer connection directly. This has the advantage of using existing infrastructure. However, I think cluster administrators face some obstacles to, or have some reluctance about, using DNS. It is already possible to use DNS for MGS hostname resolution (e.g. mgsnode@tcp or mgsnode@o2ib) instead of specifying the MGS IP addresses, but in my experience MGS IP addresses are overwhelmingly used instead of hostnames.

            Possibly this could be improved by running a DNS server on the MGS itself, so that it could be configured and administered by the cluster admin directly. We could potentially even embed, or depend on, a very simple DNS server with the MGS, but it isn't clear whether that has any advantage over just documenting/automating the configuration of an existing DNS server package on the MGS (e.g. via a Recommends: entry in the lustre.spec file).

            Another option that had been proposed in the past was for the MGS to provide its own (non-DNS) name lookup or configuration service, but it isn't clear that this is a win over just configuring a dedicated DNS service with TXT records for the GUIDs.

            We could use something like an IB-based rendezvous protocol to advertise the OSTs and MDTs to the MGS and then register them in the Imperative Recovery (IR) target list for clients to use. However, peer discovery over a large network with many segments may be slow, difficult, or impossible (though perhaps LNet routers could act as forwarders?).
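
One way to picture the DNS TXT option from the comment above is a resolver that maps a symbolic NID to a GUID. Everything specific here is a hypothetical assumption, not part of Lustre: the record name (<host>.ib.example.com), the lnet-guid=0x... value format (modeled on the attribute=value convention of RFC 1464), and the function names. The DNS query is passed in as a callable so the sketch runs without a network; in practice it could be backed by a DNS client library.

```python
from typing import Callable, Iterable

# Hypothetical: returns the TXT strings published for a DNS name.
TxtLookup = Callable[[str], Iterable[str]]

GUID_KEY = "lnet-guid"  # hypothetical TXT attribute name, not defined by Lustre


def resolve_ib_guid(nid: str, lookup_txt: TxtLookup,
                    zone: str = "ib.example.com") -> int:
    """Map a symbolic NID such as 'oss01@o2ib' to a 64-bit IB GUID.

    Queries TXT records for '<host>.<zone>' and returns the first
    'lnet-guid=0x...' value found.
    """
    host, sep, net = nid.rpartition("@")
    if not sep or not host or not net:
        raise ValueError(f"not a NID: {nid!r}")
    for record in lookup_txt(f"{host}.{zone}"):
        key, _, value = record.partition("=")
        if key.strip() != GUID_KEY:
            continue  # ignore unrelated TXT records on the same name
        guid = int(value.strip(), 16)  # base 16 accepts an optional 0x prefix
        if not 0 <= guid < 1 << 64:
            raise ValueError(f"GUID out of 64-bit range: {record!r}")
        return guid
    raise LookupError(f"no {GUID_KEY} TXT record for {host}.{zone}")


# Usage with an in-memory stand-in for DNS:
fake_dns = {"oss01.ib.example.com": ["v=spf1 -all", "lnet-guid=0x506b4b0300c1d2e4"]}
guid = resolve_ib_guid("oss01@o2ib", lambda name: fake_dns.get(name, []))
print(f"0x{guid:016x}")  # prints 0x506b4b0300c1d2e4
```

Keeping the lookup injectable means the same calling code could sit in front of an external DNS server, a DNS server on the MGS, or an MGS-specific (non-DNS) name service, which are the alternatives the comment weighs.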

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes: 0
              Watchers: 2
