Lustre / LU-18164

MGS should dynamically track server NIDs

Details

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor

    Description

      (HPE LUS-12254)

      Motivation

      Changing server NIDs in Lustre has historically been challenging, originally requiring the scary writeconf. lctl replace_nids is a slight improvement in that the entire config isn't wiped out, but according to the manual it still requires a full system shutdown, and in any case it is one more step after futzing with cabling and LNet configuration.

      Also, fixed addressing is generally a challenge for more dynamic environments: cloud, virtual, etc.

      Proposal

      When servers start up, they contact the MGS and self-report their NIDs. The MGS updates its in-memory config with these settings and notifies all other nodes of the new values via imperative recovery.
      But: remove all NID information from persistent Lustre config files. The MGS should remember the existence of the servers (by UUID), but not know their NIDs when starting up. For a server that hasn't registered yet, the MGS should make up an address, either something like "nothing@lo0" or perhaps the MGS address if that helps compatibility with old clients. Clients will fail to contact anything at this address and will retry forever until they get an update from the MGS with the new addresses.
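
      To make the proposed flow concrete, here is a minimal Python sketch of the bookkeeping, not Lustre code. The class MgsNidTable, its methods, the listener callbacks (standing in for imperative recovery notifications), and the example UUID and NID are all hypothetical names chosen for illustration; only the "nothing@lo0" placeholder comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Placeholder handed out for servers that have not registered yet
# (the value suggested in the proposal; purely illustrative).
PLACEHOLDER_NID = "nothing@lo0"


@dataclass
class MgsNidTable:
    """Toy model of the proposed MGS state (hypothetical, not Lustre code)."""

    # Persisted across MGS restarts: only the existence of each server.
    known_uuids: Set[str]
    # In memory only: rebuilt from server self-reports after every MGS start.
    nids: Dict[str, List[str]] = field(default_factory=dict)
    # Stand-in for imperative recovery notifications to other nodes.
    listeners: List[Callable[[str, List[str]], None]] = field(default_factory=list)

    def lookup(self, uuid: str) -> List[str]:
        """Return the reported NIDs, or the placeholder if none were reported."""
        if uuid not in self.known_uuids:
            raise KeyError(f"unknown server {uuid}")
        return self.nids.get(uuid, [PLACEHOLDER_NID])

    def register(self, uuid: str, reported: List[str]) -> bool:
        """Record a server's self-reported NIDs; notify listeners on change."""
        self.known_uuids.add(uuid)  # also covers a brand-new server
        if self.nids.get(uuid) == reported:
            return False  # nothing changed, no notification needed
        self.nids[uuid] = list(reported)
        for notify in self.listeners:
            notify(uuid, list(reported))
        return True


# The MGS starts knowing only that the server exists.
table = MgsNidTable(known_uuids={"lustre-OST0000_UUID"})
updates = []
table.listeners.append(lambda uuid, nids: updates.append((uuid, nids)))

before = table.lookup("lustre-OST0000_UUID")   # placeholder address
changed = table.register("lustre-OST0000_UUID", ["10.0.0.5@tcp"])
after = table.lookup("lustre-OST0000_UUID")    # self-reported address
```

      In this sketch a client that looks up a server before it registers gets the placeholder and would keep retrying; the listener call marks the point where the MGS would push the real address.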

      Heading off objections: 

      For people/sites that hate change or are worried about server imposter attacks, we could perhaps keep both methods. Change mkfs.lustre to set a new flag, "dynamic_nid", by default. If the flag is not present when a server first registers with the MGS, the MGS stores its NIDs persistently as today. If it is present, the MGS does not store NIDs for that server.
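
      The two-mode behaviour could look roughly like the Python sketch below. This is an illustration under assumptions, not an implementation: MgsConfig, first_registration, restart, and the example UUIDs and NIDs are hypothetical, and "dynamic_nid" is the flag proposed above rather than an existing mkfs.lustre option.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MgsConfig:
    """Toy split between persistent and in-memory MGS state (hypothetical)."""

    # Survives an MGS restart.
    persistent_uuids: Set[str] = field(default_factory=set)
    persistent_nids: Dict[str, List[str]] = field(default_factory=dict)
    # Lost on MGS restart; holds NIDs for every registered server.
    memory_nids: Dict[str, List[str]] = field(default_factory=dict)

    def first_registration(self, uuid: str, nids: List[str], dynamic_nid: bool) -> None:
        """Handle a server's first contact with the MGS."""
        self.persistent_uuids.add(uuid)  # existence is always remembered
        self.memory_nids[uuid] = list(nids)
        if not dynamic_nid:
            # Flag absent: store NIDs persistently, as happens today.
            self.persistent_nids[uuid] = list(nids)

    def restart(self) -> None:
        """Simulate an MGS restart: only persistent state survives."""
        self.memory_nids = {u: list(n) for u, n in self.persistent_nids.items()}


cfg = MgsConfig()
cfg.first_registration("fs-OST0000_UUID", ["10.0.0.5@tcp"], dynamic_nid=True)
cfg.first_registration("fs-OST0001_UUID", ["10.0.0.6@tcp"], dynamic_nid=False)
cfg.restart()
```

      After the simulated restart, the dynamic server is still known by UUID but has no NID until it re-registers, while the static server keeps its stored NID.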

            People

              wc-triage WC Triage
              nrutman Nathan Rutman