Details
-
Improvement
-
Resolution: Unresolved
-
Minor
Description
Directory migration involves multiple MDTs, and a server failure during migration can leave files missing. This fragility is inherent to the current design, not a fault in the implementation:
1. Directory migration doesn't support replay of the client request: all the source file xattrs would need to be packed into the reply and then saved into the request for replay, and the current request format doesn't support this.
2. Because of 1, if inter-MDT recovery is aborted, files may get lost.
3. Even if client replay were supported, files may still get lost if server recovery is aborted.
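The failure mode above can be illustrated with a toy model. This is only a sketch, not Lustre code: `ToyMDT`, its methods, and `demo_loss` are hypothetical names, and the model assumes each MDT keeps updates in memory until a commit, and that an aborted recovery simply discards uncommitted updates.

```python
class ToyMDT:
    """Toy metadata target (hypothetical): updates stay in memory until committed."""

    def __init__(self, name):
        self.name = name
        self.disk = {}      # committed entries: file name -> inode number
        self.pending = {}   # uncommitted updates: file name -> inode, or None for a delete

    def insert(self, fname, inode):
        self.pending[fname] = inode

    def remove(self, fname):
        self.pending[fname] = None

    def commit(self):
        # Apply every pending update to the on-disk state.
        for fname, inode in self.pending.items():
            if inode is None:
                self.disk.pop(fname, None)
            else:
                self.disk[fname] = inode
        self.pending.clear()

    def crash_with_aborted_recovery(self):
        # Assumption: with no replay, uncommitted updates are simply gone.
        self.pending.clear()

    def lookup(self, fname):
        if fname in self.pending:
            return self.pending[fname]
        return self.disk.get(fname)


def demo_loss():
    src, tgt = ToyMDT("MDT0"), ToyMDT("MDT1")
    src.disk["f"] = 42

    # Migrate "f": create it on the target, remove it from the source.
    tgt.insert("f", 42)
    src.remove("f")

    src.commit()                       # the source commits the removal first
    tgt.crash_with_aborted_recovery()  # the target loses its uncommitted insert

    return src.lookup("f"), tgt.lookup("f")
```

In this model `demo_loss()` finds the file on neither MDT: the source has durably removed it, while the target's creation was never committed and cannot be replayed.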
To make directory migration more robust, we need to introduce redundancy for migrated files:
- Retain source files until migration is finished. This includes both the source file dirents and inodes, which may be located on different MDTs. NB: the retained files should be invisible to clients, otherwise accessing the migrating directory may fail.
- Verify target files against source files before finishing migration (i.e. before updating the target directory layout). If there are extra source files, they must already have been migrated, with their target files then lost due to a target MDT failure; the MDT migrates them again.
- Make sure target files are committed to disk before destroying the retained source files.
- This is complicated, so handle it in the MDD layer rather than in LFSCK: the MDD code has a better understanding of migration, and this is friendlier for users, since it is transparent to them and needs no extra lfsck command.
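The ordering proposed above (verify, migrate missing files again, commit the target, and only then destroy the retained sources) can be sketched as follows. This is a minimal illustration, not the MDD implementation: `verify_and_finish`, its dictionary arguments, and the `commit` callback are all hypothetical.

```python
def verify_and_finish(retained, target, commit):
    """Finish a migration using the retained (hidden) source entries.

    retained: dict of file name -> inode, the hidden source entries (hypothetical)
    target:   dict of file name -> inode, entries present on the target
    commit:   callable taking `target`; returns True once the target is durable

    Returns (names_migrated_again, finished).
    """
    # Step 1: verify. Any retained source without a matching target entry
    # was migrated once, but its target copy was lost.
    missing = sorted(name for name in retained if name not in target)

    # Step 2: migrate the missing files again from the retained copies.
    for name in missing:
        target[name] = retained[name]

    # Step 3: the target must be committed before the sources are destroyed.
    if not commit(target):
        return missing, False  # keep the retained sources for another attempt

    # Step 4: only now is it safe to destroy the retained source entries.
    retained.clear()
    return missing, True
```

For example, with retained sources `a`, `b`, `c` and only `a` surviving on the target, the function migrates `b` and `c` again; if the commit fails, the retained sources are left untouched so the verification can be repeated.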
Issue Links
- is related to LU-17307 osd_dirent_count() keeps multiple threads busy (Resolved)