Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • 9223372036854775807

    Description

      For reliability, it would be desirable to replicate the FLDB file across multiple MDTs in case the FLDB file on MDT0000 is lost or corrupted. Since the FLDB itself changes very rarely (only when new MDTs or OSTs are added to the filesystem, or after one target has allocated 4 billion SEQ numbers), there should not be any noticeable performance overhead from having multiple mirrors.

      Since the FLDB will almost always be in sync across MDTs, clients and servers could contact any MDT that holds an FLDB replica, and only query the MDT0000 copy if the requested SEQ number cannot be found on that MDT.
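A minimal Python sketch of that lookup order, under stated assumptions: each FLDB copy is modeled as a list of half-open SEQ ranges [start, end) mapped to a target index, loosely following the start/end/index fields of an FLDB entry, and the RPC to each MDT is replaced by a dictionary lookup. The names `SeqRange`, `fld_lookup_local`, and `fld_lookup` are hypothetical, not Lustre functions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SeqRange:
    """Hypothetical FLDB entry: SEQ numbers in [start, end) live on target `index`."""
    start: int
    end: int
    index: int


def fld_lookup_local(fldb: List[SeqRange], seq: int) -> Optional[int]:
    """Search one FLDB copy; return the owning target index, or None on a miss."""
    for r in fldb:
        if r.start <= seq < r.end:
            return r.index
    return None


def fld_lookup(copies: Dict[int, List[SeqRange]], replica_mdt: int,
               seq: int) -> Optional[int]:
    """Ask the nearby replica first; fall back to the MDT0000 copy only if
    the replica is absent or does not (yet) know this SEQ number."""
    replica = copies.get(replica_mdt)
    if replica is not None:
        found = fld_lookup_local(replica, seq)
        if found is not None:
            return found
    # MDT0000 holds the authoritative copy; a replica miss may only mean
    # the replica has not received the newest entries yet.
    return fld_lookup_local(copies.get(0, []), seq)


# Illustrative data: MDT0001's replica lacks the most recently added range.
master = [SeqRange(0x200000400, 0x200000800, 0),
          SeqRange(0x200000800, 0x200000C00, 1)]
copies = {0: master, 1: master[:1]}
print(fld_lookup(copies, 1, 0x200000500))  # -> 0 (answered by the replica)
print(fld_lookup(copies, 1, 0x200000900))  # -> 1 (falls back to MDT0000)
```

Because a stale replica can only be missing entries, not contradict MDT0000, a miss is always safe to retry against MDT0000.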

      When there are many MDTs, it may be impractical to have an FLDB copy on every MDT in the filesystem, so it makes sense to (deterministically) have FLDB copies only on a subset of MDTs, such as backups on MDT0001, MDT0003, MDT0005, MDT0007, MDT0009, and in general on every MDT whose index is 3^n, 5^n, or 7^n (like ext4 superblock copies). This would provide one backup for 2 MDTs, two backups for 4 MDTs, four for 8 MDTs, ..., up to 22 backups (23 copies including MDT0000) with 65536 MDTs. One drawback of this mechanism is that the replicas may not be available if the MDTs are not densely numbered, but that is a very uncommon configuration, and MDT0000 and MDT0001 will almost certainly be present. If the expected MDT index is not configured, the client should try the next higher MDT index that is available in the filesystem.
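The placement rule and the "next higher index" fallback can be sketched in a few lines of Python. This is a minimal illustration assuming the ext4-style pattern described above (MDT0001 plus every power of 3, 5, or 7 below the MDT count); the function names are hypothetical. MDT indices are treated as plain integers, so note that Lustre target names are hexadecimal (index 25 would be MDT0019).

```python
from typing import Iterable, List, Optional


def fldb_replica_indices(mdt_count: int) -> List[int]:
    """Backup FLDB locations among MDT indices 0..mdt_count-1: MDT0001
    plus every index that is a power of 3, 5, or 7 (as in ext4's
    sparse superblock backups). MDT0000 holds the primary copy."""
    indices = {1} if mdt_count > 1 else set()
    for base in (3, 5, 7):
        power = base
        while power < mdt_count:
            indices.add(power)
            power *= base
    return sorted(indices)


def pick_replica(expected: int, configured: Iterable[int]) -> Optional[int]:
    """If the expected MDT index is not configured, use the next higher
    index that is; return None if no such MDT exists."""
    higher = [i for i in configured if i >= expected]
    return min(higher) if higher else None


for n in (2, 4, 8, 65536):
    print(n, "MDTs:", len(fldb_replica_indices(n)), "backups")
print(pick_replica(3, [0, 1, 2, 4, 6]))  # -> 4 (MDT0003 is not configured)
```

Since the rule depends only on the MDT index, every client and server can compute the same replica set without any extra coordination.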

          Activity

            [LU-15414] LMR1: FLDB mirroring

            adilger Andreas Dilger added a comment - LU-4232 is intended to add checks on the OSTs and MDTs to verify that FIDs arriving in RPC requests are actually in FID SEQ ranges that the receiving target is handling.
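A minimal Python sketch of such a server-side check, assuming the target knows its own SEQ ranges as half-open [start, end) pairs. `Fid`, `fid_is_local`, and `check_request_fid` are hypothetical names, and raising `ValueError` stands in for whatever error the server would actually return.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fid:
    """A FID modeled as (sequence, object ID within the sequence, version)."""
    seq: int
    oid: int
    ver: int = 0


def fid_is_local(fid: Fid, local_ranges: List[Tuple[int, int]]) -> bool:
    """True if fid.seq falls in one of this target's [start, end) SEQ ranges."""
    return any(start <= fid.seq < end for start, end in local_ranges)


def check_request_fid(fid: Fid, local_ranges: List[Tuple[int, int]]) -> None:
    """Hypothetical check: reject a request whose FID belongs to a SEQ
    range that this target is not handling."""
    if not fid_is_local(fid, local_ranges):
        raise ValueError(
            f"FID [{fid.seq:#x}:{fid.oid:#x}:{fid.ver:#x}] "
            "is not served by this target")


# Illustrative: this target handles a single SEQ range.
ranges = [(0x200000400, 0x200000800)]
check_request_fid(Fid(0x200000401, 0x1), ranges)  # accepted silently
```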

            adilger Andreas Dilger added a comment - One issue that came up is an OST that was assigned the same meta-sequence number as another OST. It makes sense for the MDTs and OSTs to have a full copy of the FLDB from the SEQ controller, so that they can compare their own metaseq allocation against the FLDB and detect any conflict. If the FLDB shows that the metaseq is assigned to a different target, the OST or MDT should discard its metaseq and request a new one.
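The conflict check could look like the following minimal Python sketch. It simplifies the local FLDB copy to a dictionary from metaseq start to owning target name, and `request_new_metaseq` is a hypothetical callback standing in for the request to the SEQ controller; the SEQ values are illustrative only.

```python
from typing import Callable, Dict


def verify_metaseq(my_target: str, my_metaseq: int,
                   fldb_owner: Dict[int, str],
                   request_new_metaseq: Callable[[], int]) -> int:
    """Compare this target's metaseq against its full FLDB copy. If the
    FLDB says a different target owns it, discard it and request a new
    one; otherwise keep the current allocation."""
    owner = fldb_owner.get(my_metaseq)
    if owner is not None and owner != my_target:
        # Duplicate assignment detected: another target already owns it.
        return request_new_metaseq()
    return my_metaseq


# Illustrative: OST0002 was wrongly handed OST0001's metaseq.
fldb = {0x280000400: "OST0000", 0x2C0000400: "OST0001"}
print(hex(verify_metaseq("OST0002", 0x2C0000400, fldb,
                         lambda: 0x300000400)))  # -> 0x300000400
```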

            adilger Andreas Dilger added a comment - It may be enough for each MDT and OST to always fetch a full copy of the FLDB from MDT0000 and store it locally. That would make it trivial to manually copy the "fld" file to a failed MDT0000, or (better) to recover at mount time by doing FLDB lookups on another server.
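The mount-time recovery path might be sketched as follows in Python. This assumes each server keeps a local FLDB copy and can fetch a full copy from peers tried in order (MDT0000 first). `recover_fldb`, the fetch callables, and the `store` callback that persists the local "fld" file are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Entry = Tuple[int, int, str]  # (seq_start, seq_end, target name)


def recover_fldb(local: Optional[List[Entry]],
                 peers: Sequence[Callable[[], Optional[List[Entry]]]],
                 store: Callable[[List[Entry]], None]) -> List[Entry]:
    """Use the local FLDB copy if present; otherwise fetch a full copy
    from the first peer that can supply one and persist it locally.
    An empty list counts as missing, since a valid FLDB always has at
    least one entry."""
    if local:
        return local
    for fetch in peers:
        copy = fetch()
        if copy:
            store(copy)
            return copy
    raise RuntimeError("no FLDB copy available from any server")


# Illustrative: the local copy is lost and MDT0000 is down, so MDT0001 supplies it.
saved: List[List[Entry]] = []
fldb = [(0x200000400, 0x200000800, "MDT0000")]
print(recover_fldb(None, [lambda: None, lambda: fldb], saved.append) == fldb)
```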

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes: 0
              Watchers: 3
