Lustre / LU-18164

MGS should dynamically track server NIDs

Details

    • Improvement
    • Resolution: Duplicate
    • Minor

    Description

      (HPE LUS-12254)

      Motivation

      Changing server NIDs in Lustre has historically been challenging, originally requiring the scary writeconf. lctl replace_nids is a slight improvement in that the entire config isn't wiped out, but according to the manual it still requires a full system shutdown, and in any case it is another step after futzing with cabling and LNet configuration.

      Also, fixed addressing is generally a challenge for more dynamic environments: cloud, virtual, etc.

      Proposal

      When servers start up, they contact the MGS and self-report their NIDs. The MGS updates its in-memory config with these NIDs and notifies all other nodes of the new values via imperative recovery (IR).
      But: remove all NID information from the persistent Lustre config files. The MGS should remember the existence of the servers (uuid), but not know their NIDs when it starts up. For a server that hasn't registered yet, the MGS should make up an address, either something like "nothing@lo0" or perhaps the MGS address if that helps compatibility with old clients. Clients will fail to contact anything at this address, and will retry forever until they get an update from the MGS with the new addresses.
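To make the proposed behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Lustre code: the class name `MgsRegistry`, the target name `testfs-OST0000`, the NID `10.0.0.5@tcp`, and the use of callbacks to stand in for imperative recovery notifications are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Made-up address handed out for targets that have not registered yet
# (taken from the proposal text; not an existing Lustre constant).
PLACEHOLDER_NID = "nothing@lo0"


@dataclass
class MgsRegistry:
    """Toy MGS: target uuids are persistent, NIDs live only in memory."""
    known_targets: set                                # survives a restart
    live_nids: dict = field(default_factory=dict)     # lost on restart
    subscribers: list = field(default_factory=list)   # stand-in for IR

    def lookup(self, uuid):
        """Return the current NIDs, or the placeholder if unregistered."""
        if uuid not in self.known_targets:
            raise KeyError(uuid)
        return self.live_nids.get(uuid, [PLACEHOLDER_NID])

    def register(self, uuid, nids):
        """A server self-reports its NIDs at startup."""
        self.known_targets.add(uuid)
        self.live_nids[uuid] = list(nids)
        for notify in self.subscribers:
            notify(uuid, list(nids))

    def restart(self):
        """Simulate an MGS restart: NIDs are forgotten, targets are not."""
        self.live_nids.clear()


mgs = MgsRegistry(known_targets={"testfs-OST0000"})
updates = []
mgs.subscribers.append(lambda uuid, nids: updates.append((uuid, nids)))

before = mgs.lookup("testfs-OST0000")          # placeholder address
mgs.register("testfs-OST0000", ["10.0.0.5@tcp"])
after = mgs.lookup("testfs-OST0000")           # self-reported NID
mgs.restart()
after_restart = mgs.lookup("testfs-OST0000")   # placeholder again
```

In this model a client that looked up the target before registration would get the placeholder, fail to connect, and keep retrying until the notification with the real NID arrives.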

      Heading off objections: 

      For those people/sites that hate change/are worried about server imposter attacks, we could perhaps keep both methods. Change mkfs.lustre to include a new flag "dynamic_nid" by default. If the flag is not present when a server first registers with the MGS, then the MGS stores its NIDs persistently as today. If the flag is present, then the MGS does not store NIDs for this server.
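The two-mode behaviour can be sketched as a single decision at first registration. This is a hypothetical illustration only: the function `nids_to_persist` and the representation of target flags as a Python set are assumptions, and "dynamic_nid" is the flag name proposed above, not an existing mkfs.lustre option.

```python
def nids_to_persist(target_flags, reported_nids):
    """Return the NIDs the MGS would write to its persistent config.

    With the proposed "dynamic_nid" flag the MGS stores nothing and relies
    on self-reporting; without it, NIDs are stored persistently as today.
    """
    if "dynamic_nid" in target_flags:
        return []
    return list(reported_nids)


dynamic_case = nids_to_persist({"dynamic_nid"}, ["10.0.0.5@tcp"])
static_case = nids_to_persist(set(), ["10.0.0.5@tcp"])
```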

          Activity

            adilger Andreas Dilger added a comment:

            Now we just need someone to test/fix/confirm that this functionality is working.

            It looks like the initial patch https://review.whamcloud.com/39613 "LU-10360 mgc: Use IR for client->MDS/OST connections" was landed in commit v2_13_55-106-g37be05eca3, so it should be available in all recent clients, but I don't think it was really finished before Amir moved on to another project. Hacking the MGS client config log to delete the NID records might be able to quickly show what the state of affairs is for the functionality. Ideally, the client config log wouldn't need to have any target configuration records in it, just know "there is an MDT in the IR log, I know how to set up the MDC for it, proceed". Not only does this avoid the complexity of managing static NIDs, it also reduces issues if the MGS config log becomes corrupt, etc.

            nrutman Nathan Rutman added a comment:

            Well cool beans.

            adilger Andreas Dilger added a comment:

              For those people/sites that hate change/are worried about server imposter attacks, we could perhaps keep both methods. Change mkfs.lustre to include a new flag "dynamic_nid" by default.

            Note that this already exists (to some extent), with "mgc.*.dynamic_nids" which enables/disables the use of MGS IR NIDs for connection. It defaults to "0/off" so maybe just turning this on by default is (almost?) enough to make this all work today. Having stronger identification (let alone authentication) of MDTs and OSTs in the filesystem would of course be great (e.g. proper UUIDs for each target, so that they cannot be easily confused in an environment with many different filesystems).

            adilger Andreas Dilger added a comment:

            As James pointed out, this is exactly a duplicate of LU-10360 to use the MGS IR logs to track the current server NIDs, and LU-10359 to remove the hard-coded server NIDs from the config logs (there would just be a generic list of "targets" left that clients ask the MGS for addresses to connect to).

            Some work was done on that code, and IIRC the client will "discover" target NIDs via the IR log after mounting, but it still prefers to use the config log for the initial mount.

            What remains to be done is to finish off the discovery portion (if anything) and then test/fix clients so that they can mount using only the MGS IR NIDs (i.e. no "add_nid" records in the client config llog, or it ignores them).

            Having redundant MGS services (LU-16722) would also go a long way to making mounting and IR more robust in the face of an MGS reboot.

            So I definitely think this is achievable, but needs someone to have time to focus on this task and move it forward.
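The client-side behaviour described in the comment above (prefer the config log today, versus mounting with IR NIDs only and ignoring "add_nid" records) can be sketched roughly as follows. This is a simplified model, not the real llog format: the `(op, target, nid)` tuples, the function `resolve_target_nids`, and the example NIDs are assumptions; only the `dynamic_nids` switch and the "add_nid" record name come from the discussion.

```python
def resolve_target_nids(config_records, ir_nids, dynamic_nids=False):
    """Pick the NIDs a client would use for each target.

    config_records: list of (op, target, nid) tuples standing in for the
                    client config llog (simplified, hypothetical layout).
    ir_nids:        dict of target -> NIDs learned from the MGS IR log.
    dynamic_nids:   models mgc.*.dynamic_nids; when on, static "add_nid"
                    records are ignored and only IR NIDs are used.
    """
    targets = {}
    for op, target, nid in config_records:
        nids = targets.setdefault(target, [])
        if op == "add_nid" and not dynamic_nids:
            nids.append(nid)
    if dynamic_nids:
        for target in targets:
            # An empty list means the client has to wait for an IR update.
            targets[target] = list(ir_nids.get(target, []))
    return targets


records = [
    ("setup", "testfs-MDT0000", None),
    ("add_nid", "testfs-MDT0000", "10.0.0.1@tcp"),  # stale static NID
]
ir = {"testfs-MDT0000": ["10.0.0.9@tcp"]}

static_view = resolve_target_nids(records, ir)
dynamic_view = resolve_target_nids(records, ir, dynamic_nids=True)
```

With the switch off the client ends up with the stale static NID; with it on, the config log only tells the client that the target exists and the address comes from the IR data.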

            simmonsja James A Simmons added a comment:

            Take a look at LU-10360.

            People

              wc-triage WC Triage
              nrutman Nathan Rutman
              Votes: 0
              Watchers: 5
