Details
-
Technical task
-
Resolution: Unresolved
-
Minor
Description
Replicate the ROOT/ directory to other MDTs with striped+replicated entries, to allow read-only access to directory trees on other MDTs when MDT0000 is unavailable. Initially it would be acceptable to disallow new directory creation in ROOT/ while MDT0000 is offline, since this is not a frequent operation on most filesystems.
The first step is to produce a more detailed design for Lustre Replicated Metadata. Some of the issues below are not critical for read-only operations, but it is good to understand the longer-term design/implementation issues so that earlier phases move in the right direction:
- detection and handling of primary MDT failure
- handling of DLM locking for secondary FIDs (or does the secondary MDT allow locking on the primary FID?)
This needs a number of independent changes to the code:
- changes to ldiskfs/osd-ldiskfs dirdata and ZFS/osd-zfs to allow multiple target FIDs to be stored in a dirent (up to 7?)
- changes to e2fsck, debugfs, etc. to use provided dirdata size instead of assuming 16 bytes for a single FID
- changes to MDC, LMV, MDD, LOD, llite to allow multiple FIDs to be stored in a single directory entry in struct lu_dirent
- changes to MDC, LMV to allow clients to look up a name in the primary shard or backup shard(s) if the directory is replicated
- changes to MDC, LMV so readdir() returns only a single copy of each name
- store primary FID in OI of replica MDTs (in addition to replica FID) to allow lookup of replica object directly
- use FID_SEQ_ROOT to identify ROOT/ directory stripes on other MDTs (reserve first 65535 OIDs for this)?
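The client-side items above (multiple FIDs per entry, lookup in the primary or a backup shard, a single copy of each name from readdir()) can be sketched roughly as follows. This is only an illustration under stated assumptions, not existing Lustre code: `struct sketch_dirent`, `sketch_pick_fid()`, `sketch_dedup_names()` and the `mdt_idx[]` field are hypothetical names, the limit of 7 FIDs is the tentative number from the list above, and `struct sketch_fid` only mirrors the 16-byte seq/oid/ver layout of a FID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit, taken from the "up to 7?" note in the task list. */
#define SKETCH_MAX_FIDS 7

/* Simplified 16-byte FID: sequence, object ID, version. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Hypothetical directory entry carrying a primary FID plus replica FIDs. */
struct sketch_dirent {
	const char *name;
	uint8_t fid_count;                        /* valid slots in fids[] */
	struct sketch_fid fids[SKETCH_MAX_FIDS];  /* fids[0] is the primary */
	uint16_t mdt_idx[SKETCH_MAX_FIDS];        /* MDT holding each FID (assumed) */
};

/*
 * Return the slot of the first FID whose MDT is online, preferring the
 * primary (slot 0), or -1 if no copy is currently reachable.
 */
int sketch_pick_fid(const struct sketch_dirent *ent,
		    const bool *mdt_online, size_t mdt_count)
{
	assert(ent->fid_count <= SKETCH_MAX_FIDS);

	for (int i = 0; i < ent->fid_count; i++) {
		uint16_t idx = ent->mdt_idx[i];

		if (idx < mdt_count && mdt_online[idx])
			return i;
	}
	return -1;
}

/*
 * Copy names[] into out[], skipping names that were already emitted, so a
 * name seen in both a primary and a replica shard is returned only once.
 * out[] must have room for n pointers. Returns the number of unique names.
 * Quadratic on purpose to keep the sketch short.
 */
size_t sketch_dedup_names(const char *const *names, size_t n,
			  const char **out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < kept; j++) {
			if (strcmp(out[j], names[i]) == 0) {
				seen = true;
				break;
			}
		}
		if (!seen)
			out[kept++] = names[i];
	}
	return kept;
}
```

Trying the primary slot first keeps the normal path unchanged when MDT0000 is up, and the replica slots are only consulted on failure. A real readdir() path would more likely deduplicate by hash order than by string comparison.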
Add commands to add/remove stripes and replica entries on existing directories on a non-LMR system, for example "lfs migrate -m -c 2 -N 2 <dir>" or similar.
Enhance LFSCK to verify/repair consistency between entries in replica directories.
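As a rough illustration of the kind of check this implies (not how LFSCK is implemented), the sketch below counts entries of a primary directory that are missing from a replica or that record a different primary FID. All names here (`struct chk_fid`, `struct chk_entry`, `chk_count_mismatches()`) are hypothetical, and it assumes each replica entry stores the primary FID of its target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 16-byte FID: sequence, object ID, version. */
struct chk_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Hypothetical view of one directory entry: name plus primary FID. */
struct chk_entry {
	const char *name;
	struct chk_fid primary_fid;
};

static bool chk_fid_eq(const struct chk_fid *a, const struct chk_fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/*
 * Count primary entries that have no entry of the same name in the
 * replica, or whose replica entry records a different primary FID.
 * A result of 0 means the replica covers the primary; entries that
 * exist only in the replica are not detected by this one-way pass.
 */
size_t chk_count_mismatches(const struct chk_entry *primary, size_t np,
			    const struct chk_entry *replica, size_t nr)
{
	size_t bad = 0;

	assert(primary != NULL || np == 0);
	assert(replica != NULL || nr == 0);

	for (size_t i = 0; i < np; i++) {
		bool ok = false;

		for (size_t j = 0; j < nr; j++) {
			if (strcmp(primary[i].name, replica[j].name) == 0) {
				ok = chk_fid_eq(&primary[i].primary_fid,
						&replica[j].primary_fid);
				break;
			}
		}
		if (!ok)
			bad++;
	}
	return bad;
}
```

Running the same pass with the arguments swapped would catch stale names left only in the replica. Deciding which side to repair is a design question this sketch does not address.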