Details
-
Technical task
-
Resolution: Unresolved
-
Minor
Description
In LU-13021, we designed three flush modes for WBC.
In the WBC_FLUSH_AGE_LOCK_HOLD flush mode, the unlink() operation can be optimized by also performing the unlink through the background flush path, ->write_inode(). The design could be as follows:
When unlinking 'f' under the directory 'dir' (both 'f' and 'dir' are protected by a root WBC EX lock):
- Each directory dentry has a list L that tracks its child files and directories which were already flushed to MDT but were later removed from the WBC cache (MemFS);
- If 'f' has not been flushed to MDT (it is not in the Sync(S) state), remove it directly from the cache (currently MemFS);
- Otherwise, add an item containing the name of the file being unlinked to the list L of 'dir', remove the file from the cache, and then mark the 'dir' inode as dirty so that it will be flushed later;
- When the Linux kernel flushes an inode via ->write_inode() and finds that the directory 'dir' has child files or directories which are already synced to MDT but have been unlinked locally in the cache, it must unlink these files or directories on MDT;
- When MDT receives an unlink request and finds that the target is a non-empty directory (nlink > 0?), it must remove the whole subtree. This can be done asynchronously:
  - move this directory into lost+found?, and then reply to the client;
  - launch a daemon thread on MDT dedicated to unlinking such directories under lost+found.
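
The client-side part of the steps above could be sketched roughly as follows. This is a minimal user-space model in C, not Lustre code: the names `wbc_dir`, `wbc_file`, `wbc_removed`, `wbc_unlink`, `wbc_flush_dir`, and `mdt_unlink_stub` are hypothetical and exist only for illustration, and locking, the actual MemFS removal, and the real RPC to MDT are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the proposal; none of these names are Lustre APIs. */
struct wbc_removed {                /* one item of the per-directory list L */
	char *name;
	struct wbc_removed *next;
};

struct wbc_dir {
	struct wbc_removed *removed;    /* list L: synced to MDT, unlinked locally */
	bool dirty;                     /* directory inode needs a flush */
};

struct wbc_file {
	const char *name;
	bool synced;                    /* Sync(S) state: already flushed to MDT */
};

/* Local unlink of 'f' under 'dir'. Returns 0, or -1 if allocation fails. */
int wbc_unlink(struct wbc_dir *dir, const struct wbc_file *f)
{
	if (f->synced) {
		/* MDT knows the file: remember its name and dirty the directory. */
		size_t len = strlen(f->name) + 1;
		struct wbc_removed *r = malloc(sizeof(*r));

		if (r == NULL)
			return -1;
		r->name = malloc(len);
		if (r->name == NULL) {
			free(r);
			return -1;
		}
		memcpy(r->name, f->name, len);
		r->next = dir->removed;
		dir->removed = r;
		dir->dirty = true;
	}
	/* In both cases the entry would be dropped from the cache (MemFS) here. */
	return 0;
}

/* Hypothetical stand-in for the unlink request sent to MDT; it only logs. */
int mdt_unlink_stub(const char *name)
{
	printf("MDT unlink: %s\n", name);
	return 0;
}

/*
 * Stand-in for ->write_inode() on 'dir': send one unlink per item in L.
 * Items whose unlink fails stay in L and the directory stays dirty.
 * Returns the number of unlinks sent successfully.
 */
int wbc_flush_dir(struct wbc_dir *dir, int (*mdt_unlink)(const char *name))
{
	int sent = 0;

	while (dir->removed != NULL) {
		struct wbc_removed *r = dir->removed;

		if (mdt_unlink(r->name) != 0)
			break;
		dir->removed = r->next;
		free(r->name);
		free(r);
		sent++;
	}
	if (dir->removed == NULL)
		dir->dirty = false;         /* this model tracks no other dirty state */
	return sent;
}
```

In this model, unlinking a file that was never synced leaves `dir` untouched, while unlinking a synced file queues its name and marks `dir` dirty; a later `wbc_flush_dir(&dir, mdt_unlink_stub)` drains the list. Keeping only the name in each item mirrors the proposal, which stores "the name of the unlinking file" rather than a reference to the removed inode.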