On the client, WBC borrows heavily from the design and implementation of Linux tmpfs. Our implementation of the Lustre WBC is called MemFS.
WBC needs a cache-aging mechanism so that it automatically starts to flush dirty cache in the background as the cache gets old (normal VFS/VM dirty-inode writeback), instead of the current behavior of performing no writeback to the MDS until the DLM lock is cancelled, which might be minutes or hours later.
Cache aging implementation
This can be implemented via the super block VFS interface ->write_inode(), which the kernel calls in the background once a dirty inode gets old.
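The aging decision itself can be sketched as follows. This is a hypothetical userspace model, not Lustre code: the struct, field, and function names (`wbc_inode`, `dirtied_when`, `wbc_inode_expired`) are illustrative, mirroring the VFS dirty-expire logic that decides when background writeback should kick in.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical sketch: decide whether a dirty WBC inode is old enough
 * for background writeback via ->write_inode().  Names are illustrative,
 * not Lustre's actual API. */

struct wbc_inode {
	bool dirty;          /* inode has dirty metadata cached in MemFS */
	time_t dirtied_when; /* when the inode was first dirtied */
};

/* Returns true when ->write_inode() should be invoked in the background. */
static bool wbc_inode_expired(const struct wbc_inode *wi, time_t now,
			      time_t expire_interval)
{
	return wi->dirty && (now - wi->dirtied_when) >= expire_interval;
}
```

A background flusher thread would periodically scan dirty WBC inodes and flush those for which this predicate holds, rather than waiting for DLM lock cancellation.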
However, when flushing a dirty directory, triggered either by the revocation of the WBC EX lock or by a ->write_inode() call, the client needs to hold the WBC EX lock until it has pushed all of the directory's children (directories and files) to the MDT and acquired WBC EX locks on the child directories; only then can it drop the parent's WBC EX lock.
The reasons are:
- It needs to block any new creation under a directory protected by a WBC EX lock that is being revoked.
- Currently the mechanism adopted by WBC (similar to tmpfs) is to simply scan the directory's in-memory sub-dentries in the dcache to fill the content returned to a readdir call.
- Lustre's new readdir implementation is much more complex: it performs readdir in hash order and uses the hash of a file name as the telldir/seekdir cookie.
How to bridge the two implementations is a challenge.
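The lock-ordering requirement above (push and EX-lock every child before dropping the parent's lock) can be modeled as a small sketch. This is a hypothetical userspace model with illustrative names (`wbc_dentry`, `wbc_flush_dir`), not the Lustre implementation; the assertions encode the ordering invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the flush ordering: while a directory's WBC EX
 * lock is being revoked, every child is first pushed to the MDT and
 * EX-locked before the parent's lock is dropped. */

struct wbc_dentry {
	bool ex_locked; /* holds a WBC EX lock */
	bool on_mdt;    /* has been flushed to the MDT */
};

static void wbc_flush_dir(struct wbc_dentry *parent,
			  struct wbc_dentry *children, size_t nr)
{
	assert(parent->ex_locked);        /* revocation starts under the lock */
	for (size_t i = 0; i < nr; i++) {
		children[i].on_mdt = true;    /* push child to the MDT */
		children[i].ex_locked = true; /* acquire EX lock on the child */
		assert(parent->ex_locked);    /* parent lock still held here */
	}
	parent->on_mdt = true;
	parent->ex_locked = false;        /* finally drop the parent EX lock */
}
```

Holding the parent lock across the whole loop is what blocks new creations under the directory while its children migrate to the MDT.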
Solution: the sub-directories and files under a directory are either all flushed to the MDT or all cached in client-side MemFS.
To distinguish the different states of a dentry and take different actions accordingly, several dentry flags are defined as follows:
- Protect (P): the file or directory is under the protection of the subtree WBC EX lock.
- Sync (S): the file or directory has been flushed to the metadata server (MDT).
  - For a directory: its parent directory must already have been synced to the MDT; the directory itself must obtain the WBC EX lock; once its new sub-directories are exclusively locked, the parent's EX lock can be dropped.
- Complete (C), for a directory:
  - it is cached in the client-side cache or in the Sync (S) state;
  - it is under the protection of the EX lock, i.e. in the Protect (P) state;
  - the dcache (MemFS) holds the complete set of its sub-directories and files;
  - results of readdir() and lookup() operations under the directory can be obtained directly from the client-side cache, and all file operations can be performed on the client-side cache without communicating with the server;
  - otherwise (not in the (C) state), dentries are read from the MDT.
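The states above could be encoded as a flag bitmask on the dentry. This is a hypothetical sketch: the flag names (`WBC_FL_*`) and the helper are illustrative; only the P/S/C semantics follow the text, namely that cache-only readdir/lookup requires a Complete directory that is also Protected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the dentry states as a flag bitmask. */
#define WBC_FL_PROTECT  0x1 /* (P): under the subtree WBC EX lock */
#define WBC_FL_SYNC     0x2 /* (S): flushed to the MDT */
#define WBC_FL_COMPLETE 0x4 /* (C): dcache holds the full directory */

/* readdir()/lookup() can be served purely from the client-side cache
 * only for a Complete directory under EX lock protection; otherwise
 * dentries must be read from the MDT. */
static bool wbc_readdir_from_cache(unsigned int flags)
{
	return (flags & WBC_FL_COMPLETE) && (flags & WBC_FL_PROTECT);
}
```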
When flushing the data of a regular file under the protection of the WBC EX lock, the client needs to write the dirty cached pages in MemFS into Lustre clio (the data assimilation phase). Before that point, the metadata object has already been created on the MDT, and the layout of the file has been instantiated and returned to the client.
There are two methods to assimilate the file data:
- Before data assimilation, pin the WBC EX lock by increasing its reference count, and then acquire all the extent locks needed for the data I/O. After that, it is safe to unpin the WBC EX lock that guards the layout by decreasing its reference count, and then perform the data assimilation.
- The client does not acquire any extent locks for the data I/O, and instead holds the WBC EX lock that guards the layout until it has 1) finished the data assimilation, 2) synced the data to the OSTs, and 3) discarded the cached data on the client. If the file size is small, this method should be more efficient.
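The first method's pin/unpin ordering can be sketched as follows. This is a hypothetical model with illustrative names (`wbc_ex_lock`, `wbc_lock_pin`): the point is that the EX lock's refcount is held only across extent-lock acquisition, after which the extent locks alone protect the data path.

```c
#include <assert.h>

/* Hypothetical model of method 1: the WBC EX lock guarding the layout
 * is pinned (refcount incremented) only while the extent locks for the
 * data I/O are acquired, then unpinned before assimilation proper. */

struct wbc_ex_lock {
	int refcount;      /* pin count; lock may be cancelled only at 0 */
	int extent_locked; /* extent locks for data I/O are held */
};

static void wbc_lock_pin(struct wbc_ex_lock *l)   { l->refcount++; }
static void wbc_lock_unpin(struct wbc_ex_lock *l) { l->refcount--; }

static void wbc_assimilate_method1(struct wbc_ex_lock *l)
{
	wbc_lock_pin(l);      /* 1. pin the WBC EX lock */
	l->extent_locked = 1; /* 2. acquire extent locks for data I/O */
	wbc_lock_unpin(l);    /* 3. safe to unpin: the extent locks now
	                       *    protect the data path */
	assert(l->refcount == 0);
	/* 4. perform data assimilation under the extent locks ... */
}
```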
During data assimilation, the client must ensure that new incoming generic I/O, which needs to switch from MemFS to the Lustre clio engine (to be solved in JIRA ticket LU-13010: reopen the files when the EX WBC lock is cancelled), is blocked until the data assimilation has finished.
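The blocking requirement can be modeled as a simple gate. This is a hypothetical sketch with illustrative names (`wbc_io_gate`, `wbc_io_enter`); a real implementation would sleep on a waitqueue until assimilation completes, whereas this model simply rejects the I/O with a busy error so the invariant can be tested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gate blocking new generic I/O while a file switches
 * from MemFS to the Lustre clio engine during data assimilation. */

#define MODEL_EBUSY 16 /* stand-in errno for "would block" */

struct wbc_io_gate {
	bool assimilating; /* data assimilation in progress */
};

static void wbc_assim_begin(struct wbc_io_gate *g) { g->assimilating = true; }
static void wbc_assim_end(struct wbc_io_gate *g)   { g->assimilating = false; }

/* New I/O must wait (here: fail) until assimilation has finished. */
static int wbc_io_enter(const struct wbc_io_gate *g)
{
	return g->assimilating ? -MODEL_EBUSY : 0;
}
```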
- Is related to: LU-10938 Metadata writeback cache support