Project: Lustre

[LU-4966] MGS target registration should use proper UUID

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • 13736

    Description

      If a server registers successfully with the MGS but receives an error in the registration reply (for example, because the MGS timed out revoking config locks, or because of other network problems), then every later registration attempt from that server fails with -EADDRINUSE, because the server's index was already allocated on the MGS during the first registration.

      The current workaround is to use the writeconf option to force registration.

      This needs to be improved so that the MGS can handle this situation gracefully.
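
      To make the failure mode concrete, here is a minimal C sketch of an MGS that records only whether each index is taken, not which server took it. The names mgs_claim_index() and MGS_MAX_INDEX are hypothetical and are not the real Lustre MGS code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the behaviour described above:
 * the MGS tracks which target indices are in use, but not WHO owns
 * each one. */
#define MGS_MAX_INDEX 64

static bool mgs_index_used[MGS_MAX_INDEX];

/* Claim 'index' for a registering target.  Returns 0 on success,
 * -EINVAL for an out-of-range index, or -EADDRINUSE if the index has
 * already been claimed by any target (including the caller itself). */
static int mgs_claim_index(unsigned int index)
{
	if (index >= MGS_MAX_INDEX)
		return -EINVAL;
	if (mgs_index_used[index])
		return -EADDRINUSE;
	mgs_index_used[index] = true;
	return 0;
}
```

      If the reply to the first mgs_claim_index(3) call is lost, the server's retry of mgs_claim_index(3) returns -EADDRINUSE, because nothing distinguishes the original owner from a different server asking for the same index.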


          Activity


            Sebastien Piechurski (spiechurski) added a comment:
            Was this issue addressed in later versions? We are still seeing this problem when installing configurations with a large number of OSTs (or when modifying a configuration after a writeconf), and we have to work around it manually by starting the OSTs one after another.

            Andreas Dilger (adilger) added a comment:
            There is space in the last_rcvd file to store the UUID, but that has the potential problem that this file may be deleted if there are problems with recovery.

            As for the OST detection in the connection code, it would be possible to store the target type in the last byte of the UUID or similar (e.g. the ASCII "O" or "M") and still keep the rest of the UUID random.
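
            A rough sketch of the "type marker in the last byte" idea, assuming a textual 36-character UUID. make_typed_uuid(), typed_uuid_type() and UUID_STR_LEN are hypothetical helpers, and rand() stands in for a real random source only to keep the example self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Length of a textual UUID such as "a53bc5ba-687b-4091-fb0b-61489785f247"
 * (36 characters plus the terminating NUL). */
#define UUID_STR_LEN 37

/* Hypothetical helper: build a random UUID-style string whose last
 * character encodes the target type ('O' for OST, 'M' for MDT).
 * rand() is used only for brevity; real code needs a proper RNG. */
static void make_typed_uuid(char out[UUID_STR_LEN], char type)
{
	static const char hex[] = "0123456789abcdef";
	int i;

	for (i = 0; i < UUID_STR_LEN - 1; i++) {
		/* hyphens at the standard 8-4-4-4-12 group boundaries */
		if (i == 8 || i == 13 || i == 18 || i == 23)
			out[i] = '-';
		else
			out[i] = hex[rand() % 16];
	}
	out[UUID_STR_LEN - 2] = type;	/* overwrite the last character */
	out[UUID_STR_LEN - 1] = '\0';
}

/* Recover the target type from the last character.  Returns 0 if the
 * string is not 36 characters long or carries no 'O'/'M' marker. */
static char typed_uuid_type(const char *uuid)
{
	char c;

	if (strlen(uuid) != UUID_STR_LEN - 1)
		return 0;
	c = uuid[UUID_STR_LEN - 2];
	return (c == 'O' || c == 'M') ? c : 0;
}
```

            Because uppercase 'O' and 'M' are not hexadecimal digits, a marked UUID cannot be confused with an ordinary random one; the trade-off is that the marked string is no longer a strictly valid hex UUID.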

            Li Wei (liwei, inactive) added a comment:
            Niu's idea makes sense to me. I spent some time experimenting with it.

            A real UUID (e.g., "a53bc5ba-687b-4091-fb0b-61489785f247") could easily be generated by the back-end-independent mkfs.lustre code and stored in a back-end-specific way (e.g., in the ldiskfs "mountdata" or as a ZFS dataset property). (ZFS has pool IDs, but those are pool properties and are only 64 bits.)

            Current master code always passes empty strings in mti_uuid. It would be nice if real UUIDs could be packed into that field. However, experiments showed:

            • OSPs on MDTs that connect to OSTs would send MDS_CONNECT requests to the OSTs, because client_obd_setup() relies on "OST" appearing in the UUID to determine whether an OSP is for an OST or an MDT.
            • Clients would not be able to connect to MDTs, because the MGS would put fake UUIDs (e.g., "lustre-MDT0000_UUID") into the MDT logs but real UUIDs into the client log.

            A possible solution is:

            • Start generating real UUIDs for any newly formatted targets.
            • Send real UUIDs via mti_uuid.
            • The MGS checks and stores the real UUIDs somewhere, but keeps putting fake UUIDs into the logs.
            • Newly formatted targets must talk to the new MGS code.
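
            The first experimental result can be illustrated with a simplified type check. uuid_looks_like_ost() is a hypothetical stand-in modeled on the description of client_obd_setup() above (it is not the actual function), and make_log_uuid() builds the kind of name the MGS currently writes into the logs; the four-hex-digit index format is an assumption based on names such as "lustre-MDT0000_UUID".

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a name-based type check: treat a target
 * as an OST if "OST" appears anywhere in its UUID string.
 * This is NOT the actual client_obd_setup() code. */
static bool uuid_looks_like_ost(const char *uuid)
{
	return strstr(uuid, "OST") != NULL;
}

/* Build a log-style ("fake") UUID such as "lustre-OST0001_UUID" from
 * the filesystem name, target type and index.  Returns 0 on success
 * or -1 if the buffer is too small. */
static int make_log_uuid(char *buf, size_t len, const char *fsname,
			 const char *type, unsigned int index)
{
	int n = snprintf(buf, len, "%s-%s%04x_UUID", fsname, type, index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

            A log-style name such as "lustre-OST0001_UUID" passes the check, while a real UUID such as "a53bc5ba-687b-4091-fb0b-61489785f247" does not, so an OSP configured with it would be treated as pointing at an MDT. This is why the proposal keeps fake UUIDs in the logs and stores real UUIDs separately.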

            Andreas Dilger (adilger) added a comment:
            Alex, I think Niu was asking for ideas on how this might best be fixed.
            Alex Zhuravlev (bzzz) added a comment (edited):
            This issue was mentioned by Chris in LU-1257, and at the moment it's not clear whether this specific issue is a major one for LLNL. Given the amount of changes involved, won't it be hard to get this fixed in 2.6?

            Niu Yawei (niu, inactive) added a comment:
            Andreas/Alex, any suggestions? Thanks.

            Niu Yawei (niu, inactive) added a comment:
            I think if the MGS saved the server UUID along with the server index, then it could tell whether a registration (a request for an already-occupied index) comes from the same server.
            Also, it looks like the MGS currently keeps the server index bitmap in memory only; it needs to be saved on disk as well.
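
            A minimal sketch of Niu's suggestion, assuming the MGS keeps, for each index, the UUID of the target that owns it and persists that table. struct index_registry, registry_claim(), registry_save() and registry_load() are hypothetical names, and the flat-file fwrite() format is only for illustration; it is not how the MGS stores its configuration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define REG_MAX_INDEX 64
#define REG_UUID_LEN  40	/* room for a 36-char UUID plus NUL */

/* Hypothetical table: for each index, the UUID of the target that
 * owns it (an empty string means the index is free). */
struct index_registry {
	char owner[REG_MAX_INDEX][REG_UUID_LEN];
};

/* Claim 'index' for the target identified by 'uuid'.  Returns:
 *   0            newly assigned, or already owned by this same UUID
 *                (e.g. a retry after a lost registration reply)
 *   -EADDRINUSE  owned by a different UUID
 *   -EINVAL      bad index or UUID */
static int registry_claim(struct index_registry *reg, unsigned int index,
			  const char *uuid)
{
	char *slot;

	if (index >= REG_MAX_INDEX || uuid == NULL || uuid[0] == '\0' ||
	    strlen(uuid) >= REG_UUID_LEN)
		return -EINVAL;

	slot = reg->owner[index];
	if (slot[0] == '\0') {
		strcpy(slot, uuid);	/* length checked above */
		return 0;
	}
	return strcmp(slot, uuid) == 0 ? 0 : -EADDRINUSE;
}

/* Persist the whole table so ownership survives a restart.
 * Returns 0 on success or -EIO on any failure. */
static int registry_save(const struct index_registry *reg, const char *path)
{
	FILE *f = fopen(path, "wb");
	size_t n;

	if (f == NULL)
		return -EIO;
	n = fwrite(reg, sizeof(*reg), 1, f);
	if (fclose(f) != 0 || n != 1)
		return -EIO;
	return 0;
}

/* Reload a table written by registry_save(). */
static int registry_load(struct index_registry *reg, const char *path)
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (f == NULL)
		return -EIO;
	n = fread(reg, sizeof(*reg), 1, f);
	fclose(f);
	return n == 1 ? 0 : -EIO;
}
```

            With this table, a retry from the same UUID after a lost reply succeeds instead of failing with -EADDRINUSE, while a different target asking for the same index is still rejected. Saving the table also addresses the point that an in-memory-only bitmap would not survive an MGS restart.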

            People

              wc-triage WC Triage
              niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated: