LU-17818

LMR: Lustre Metadata Redundancy


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • None

    Description

      Top-level tracking ticket for Metadata replication.

      Parts of this work could be split into smaller features and implementation phases in order to reduce the amount of effort needed to get something usable out of the development within a reasonable timeframe:

      • improvements to performance of distributed transactions (LU-7426) so that synchronous/ordered disk transactions are not needed. This would be very useful independent of MDT mirroring to improve creation of remote and striped directories, cross-MDT rename/link, etc.
      • improve handling of distributed recovery when an MDT is offline (e.g. save transaction logs, don't block filesystem access for unrelated MDTs) (LU-9206 ++)
      • fault-tolerance for services that run on MDT0000, such as the quota master, FLDB, MGT, flock, etc.
      • scalability of REMOTE_PARENT_DIR to allow handling more disconnected filesystem objects (LU-10329)
      • mirroring of top-level directories in the filesystem (initially ROOT/, and then the first level of subdirectories below it, etc.) so that the filesystem is "more" available if MDT0000 or other MDTs in a top-level striped directory are unavailable. This would not include mirroring of the regular inodes for files, only the directories themselves. Since the top-level directories are changed less often than lower-level subdirectories, some extra overhead creating directories at this level is worthwhile for higher availability.
      • mirrored directories would be similar to striped directories, but each directory entry name could be looked up in at least two different directory shards (e.g. lmv_locate_tgt_by_name(), ...+1, ...+2), depending on replication level, allowing the target to be found even if one MDT is offline (LU-9206)
      • each mirrored directory entry would contain two or more different FIDs referencing inodes on separate MDTs (for subdirectories), or the same FID (for regular files)
      • each mirrored directory inode would have the full layout of all shards in the directory, and client can determine which shard to use for lookup
      • updates to the mirrored directory would always need distributed transactions that inserted or removed the redundant dirents together
      • normal DNE distributed transaction recovery would apply to recover incomplete transactions if an MDT is offline during an update
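
      The mirrored lookup described above (primary shard, then +1, +2, depending on replication level) can be sketched roughly as follows. This is only an illustration of the idea, not Lustre code: candidate_shards() and locate_shard() are hypothetical names, CRC32 stands in for whatever name hash the directory layout actually uses, and each shard index is assumed to map to one MDT.

```python
import zlib
from typing import List, Optional, Set


def candidate_shards(name: str, stripe_count: int, replicas: int) -> List[int]:
    """Return the shard indices that may hold the dirent for `name`.

    The first index is the primary shard chosen by hashing the name; the
    remaining ones are the next shards (+1, +2, ...), wrapping around.
    CRC32 is an assumed stand-in hash, not the one Lustre uses.
    """
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    if not 1 <= replicas <= stripe_count:
        raise ValueError("replicas must be between 1 and stripe_count")
    primary = zlib.crc32(name.encode("utf-8")) % stripe_count
    return [(primary + i) % stripe_count for i in range(replicas)]


def locate_shard(name: str, stripe_count: int, replicas: int,
                 offline: Set[int]) -> Optional[int]:
    """Pick the first candidate shard whose MDT is not known to be offline.

    Returns None when every shard holding a copy of the entry is offline.
    """
    for shard in candidate_shards(name, stripe_count, replicas):
        if shard not in offline:
            return shard
    return None
```

      With two replicas, a name whose primary shard is offline is still found through the next shard, and the lookup only fails when both shards holding the entry are unavailable.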


            People

              wc-triage WC Triage
              adilger Andreas Dilger
