Description
Top-level tracking ticket for Metadata replication.
This work could be split into smaller features and implementation phases, so that something usable comes out of the development within a reasonable timeframe:
- improvements to performance of distributed transactions (LU-7426) so that synchronous/ordered disk transactions are not needed. This would be very useful independent of MDT mirroring to improve creation of remote and striped directories, cross-MDT rename/link, etc.
- improve handling of distributed recovery when an MDT is offline (e.g. save transaction logs, don't block filesystem access for unrelated MDTs (LU-9206++))
- fault-tolerance for services that run on MDT0000, such as the quota master, FLDB, MGT, flock, etc.
- scalability of REMOTE_PARENT_DIR to allow handling more disconnected filesystem objects (LU-10329)
- mirroring of top-level directories in the filesystem (initially ROOT/, then the first level of subdirectories below it, etc.) so that the filesystem is "more" available if MDT0000 or other MDTs in a top-level striped directory are unavailable. This would not include mirroring of the regular inodes for files, only the directories themselves. Since top-level directories change less often than lower-level subdirectories, some extra overhead when creating directories at this level is worthwhile for higher availability.
- mirrored directories would be similar to striped directories, but each directory entry name could be looked up in at least two different directory shards (e.g. lmv_locate_tgt_by_name(), ...+1, ...+2), depending on replication level, allowing the target to be found even if one MDT is offline (LU-9206)
- each mirrored directory entry would contain two or more different FIDs referencing inodes on separate MDTs (for subdirectories), or the same FID (for regular files)
- each mirrored directory inode would have the full layout of all shards in the directory, and client can determine which shard to use for lookup
- updates to the mirrored directory would always need distributed transactions that inserted or removed the redundant dirents together
- normal DNE distributed transaction recovery would apply to recover incomplete transactions if an MDT is offline during an update
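The mirrored-directory lookup and update described in the bullets above can be sketched as a toy model. This is only an illustration under stated assumptions, not Lustre code: the `Shard` class, `candidate_shards()`, `insert_mirrored()`, `lookup()`, the CRC32 name hash, and the FID strings are all hypothetical, and `lmv_locate_tgt_by_name()` itself is not modeled.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One shard of a mirrored directory, held on a single MDT (toy model)."""
    mdt_index: int
    online: bool = True
    dirents: dict = field(default_factory=dict)  # entry name -> FID string


def candidate_shards(name, shard_count, replicas):
    """Shard indices that may hold `name`: the hashed shard, then +1, +2, ..."""
    primary = zlib.crc32(name.encode()) % shard_count  # assumed hash, not Lustre's
    return [(primary + i) % shard_count for i in range(min(replicas, shard_count))]


def insert_mirrored(shards, name, fids, replicas):
    """Insert one dirent per replica shard, either into all of them or none."""
    targets = candidate_shards(name, len(shards), replicas)
    if len(fids) != len(targets):
        raise ValueError("need exactly one FID per replica shard")
    # Validate every target before changing anything, as a stand-in for a
    # distributed transaction. Simplification: an offline shard rejects the
    # update here, whereas the ticket relies on DNE recovery instead.
    for i in targets:
        if not shards[i].online:
            raise OSError(f"MDT{shards[i].mdt_index:04d} is offline")
        if name in shards[i].dirents:
            raise FileExistsError(name)
    for i, fid in zip(targets, fids):
        shards[i].dirents[name] = fid


def lookup(shards, name, replicas):
    """Return the FID from the first online candidate shard holding `name`."""
    for i in candidate_shards(name, len(shards), replicas):
        shard = shards[i]
        if shard.online and name in shard.dirents:
            return shard.dirents[name]
    return None  # not found, or every candidate shard is offline


# Usage: a 4-shard directory with replication level 2 (placeholder FIDs).
shards = [Shard(mdt_index=i) for i in range(4)]
insert_mirrored(shards, "projects", ["fid-on-mdt-a", "fid-on-mdt-b"], replicas=2)
first, second = candidate_shards("projects", len(shards), replicas=2)
shards[first].online = False                   # simulate one MDT going offline
print(lookup(shards, "projects", replicas=2))  # FID stored on the second shard
```

With the first candidate shard offline, the lookup falls through to the next shard and returns the second FID, which is the availability property the mirrored layout is meant to provide. For a regular file the same FID would be passed for every replica.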
Issue Links
- is related to
  - LU-18015 DNE: client-MDT idle disconnection (Open)
- is related to
  - LU-12310 MDT Device-level Replication/Mirroring (Open)
  - LU-10329 DNE3: REMOTE_PARENT_DIR scalability (Open)
  - LU-4215 Some expected improvements for OUT (Open)
  - LU-7319 OUT: continue updates processing upon an error (Open)
  - LU-7427 DNE3: multiple entries for BATCHID (Open)
  - LU-7607 Preserve inode number after MDT migration (Open)
  - LU-9206 DNE - allow partial access to striped dir if one of the MDTs is unavailable (Resolved)