Metadata writeback cache support (LU-10938)

[LU-13011] WBC2: Integrate PCC with Metadata Writeback Caching Created: 26/Nov/19  Updated: 18/May/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10938 Metadata writeback cache support Open

 Description   

Memory on a client is limited compared with local persistent storage such as SSD/NVMe devices. For a WBC subtree, the metadata can reasonably be cached entirely in MemFS, but the data of regular files may grow too large to cache in client-side MemFS.

Data on PCC

PCC can be used as a client-side persistent cache for the data of regular files under the protection of the WBC EX lock.

The PCC copy stub can be created according to the FID when the regular file is created on MemFS, and all data I/O is then directed to PCC.

Or, when the regular file grows too large for the client memory, create the corresponding PCC copy stub and attach the file to PCC.

Or, at the time the data is assimilated from MemFS into Lustre clio, create the PCC copy stub and write the data into PCC.

None of these operations requires interaction with the MDS until a flush is needed. When flushing a regular file, the client only needs to mark the file as HSM exists, archived and released on the MDT. The resync of the file data cached on PCC to the Lustre OSTs can be deferred, or the data can be evicted from PCC when PCC is nearly full.
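
The three attach points above can be read as alternative policies for when the PCC copy stub is created. The following is only a sketch under stated assumptions: AttachPolicy, HsmState, should_attach and the mem_limit threshold are hypothetical names chosen for illustration, not Lustre identifiers, and the flag values are not the real Lustre constants.

```python
from enum import Enum, Flag, auto


class AttachPolicy(Enum):
    """Hypothetical names for the three attach points described above."""
    AT_CREATE = auto()      # stub created when the file is created on MemFS
    ON_GROWTH = auto()      # stub created once the file outgrows client memory
    AT_ASSIMILATE = auto()  # stub created when data moves from MemFS into clio


class HsmState(Flag):
    """Illustrative flags only; values are not the Lustre on-disk constants."""
    EXISTS = auto()
    ARCHIVED = auto()
    RELEASED = auto()


# State the description says is set on the MDT when a regular file is flushed.
FLUSHED_STATE = HsmState.EXISTS | HsmState.ARCHIVED | HsmState.RELEASED


def should_attach(policy, event, size=0, mem_limit=0):
    """Return True if the PCC copy stub should be created on this event.

    `event` is one of "create", "write" or "assimilate". `mem_limit` is an
    assumed per-file threshold in bytes, not a real Lustre tunable.
    """
    if policy is AttachPolicy.AT_CREATE:
        return event == "create"
    if policy is AttachPolicy.ON_GROWTH:
        return event == "write" and size > mem_limit
    return event == "assimilate"
```

For example, should_attach(AttachPolicy.ON_GROWTH, "write", size=200, mem_limit=100) returns True, while the same call with size=10 returns False.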

Metadata and Data all on PCC

It is desirable if metadata and data can both be cached on PCC, not only for larger capacity compared with the memory size of the client, but also for persistence and recoverability.

The proposed design is as follows:

  1. There is a PCCPATH/WBCROOT directory under the PCC device;
  2. Once a new directory "dir" is created with a WBC EX lock returned from the MDT, create a corresponding metadata stub (also a directory) on PCC at "PCCPATH/WBCROOT/FID(dir)", where "FID(dir)" is the FID of the directory "dir".
  3. All I/O operations under this directory can then be performed directly in the directory "PCCPATH/WBCROOT/FID(dir)" on PCC:
    • Create a directory "dir1" under the directory "dir", Lustre: "dir/dir1" -> PCC: "PCCPATH/WBCROOT/FID(dir)/dir1";
    • Create a directory "dir2" under the directory "dir1", Lustre: "dir/dir1/dir2" -> PCC: "PCCPATH/WBCROOT/FID(dir)/dir1/dir2";
    • Create a regular file "file1" under the directory "dir", Lustre: "dir/file1" -> PCC: "PCCPATH/WBCROOT/FID(dir)/file1";
    • Data written to "dir/file1" is directed to "PCCPATH/WBCROOT/FID(dir)/file1" on PCC;
    • Other operations are similar: read(), setattr(), stat(), truncate(), readdir()...
  4. When the WBC EX lock is being revoked:
    • For a child directory "dir1" under this directory, after it has been created on the MDT and the client has acquired the corresponding WBC EX lock that protects it, move and rename it to PCCPATH/WBCROOT/FID(dir1):
      mv PCCPATH/WBCROOT/FID(dir)/dir1 PCCPATH/WBCROOT/FID(dir1)
    • For a child regular file "file1" under this directory, the client first moves and renames it according to the HSM naming structure, as follows:
      "%04x/%04x/%04x/%04x/%04x/%04x/" DFID_NOBRACE FID(file1)
      It then creates a metadata object on the MDT with the HSM exists, archived and released flags set;
  5. After all child directories and files have been flushed, the directory "PCCPATH/WBCROOT/FID(dir)" should be empty and can be deleted directly.
  6. After that, the WBC EX lock for "dir" can be released safely.
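
The path mapping in step 3 and the HSM-style renaming in step 4 can be sketched as below. This is an illustration under assumptions, not the actual implementation: the helper names are hypothetical, the FID is assumed to have a 64-bit sequence, a 32-bit object ID and a 32-bit version, DFID_NOBRACE is assumed to expand to "%#llx:0x%x:0x%x", and the order in which the 16-bit FID slices fill the six path components is assumed. Using the brace-less FID string as the stub directory name is also an assumption.

```python
import posixpath

PCC_WBC_ROOT = "PCCPATH/WBCROOT"  # placeholder path taken from the description


def fid_str(seq, oid, ver):
    """Format a FID without braces (assumed DFID_NOBRACE layout).

    Note: C prints a zero value with "%#llx" as "0", Python prints "0x0".
    """
    return "%#x:0x%x:0x%x" % (seq, oid, ver)


def stub_path(root_fid, rel_path=""):
    """Map a path relative to the root WBC directory to its stub on PCC."""
    base = posixpath.join(PCC_WBC_ROOT, root_fid)
    return posixpath.join(base, rel_path) if rel_path else base


def hsm_path(seq, oid, ver):
    """Build the HSM-style name for a flushed regular file.

    The slice order (object ID first, then the sequence from its low 16 bits
    upward) is an assumption made for this sketch.
    """
    parts = (
        oid & 0xFFFF,
        (oid >> 16) & 0xFFFF,
        seq & 0xFFFF,
        (seq >> 16) & 0xFFFF,
        (seq >> 32) & 0xFFFF,
        (seq >> 48) & 0xFFFF,
    )
    return "%04x/%04x/%04x/%04x/%04x/%04x/" % parts + fid_str(seq, oid, ver)


if __name__ == "__main__":
    root = fid_str(0x200000401, 0x1, 0x0)
    print(stub_path(root, "dir1/dir2"))
    print(hsm_path(0x200000401, 0x2, 0x0))
```

With these assumptions, "dir/dir1/dir2" maps to "PCCPATH/WBCROOT/0x200000401:0x1:0x0/dir1/dir2" when FID(dir) is 0x200000401:0x1:0x0.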

 

FID maintenance:

Once the metadata is also cached on PCC, directories and files under the root WBC directory may be evicted from the VFS memory caches (dcache, page cache, inode cache). Therefore, when the metadata stub is created on PCC, or when the inode is evicted from the cache (the FID has already been allocated for the file on the client), the client needs to store the FID in an extended attribute (EA) of the metadata stub on PCC.
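
Storing the FID in an EA of the stub could look roughly like the sketch below. The attribute name "user.wbc.fid" and the little-endian 16-byte encoding are assumptions made for illustration, not a format defined by this ticket. os.setxattr() and os.getxattr() are Linux-only, and the PCC backend file system must support user extended attributes.

```python
import os
import struct

FID_XATTR = "user.wbc.fid"  # hypothetical EA name, not a real Lustre attribute
_FID_FMT = "<QII"           # 64-bit sequence, 32-bit object ID, 32-bit version


def pack_fid(seq, oid, ver):
    """Serialize a FID into 16 bytes (assumed little-endian encoding)."""
    return struct.pack(_FID_FMT, seq, oid, ver)


def unpack_fid(raw):
    """Inverse of pack_fid(); returns a (seq, oid, ver) tuple."""
    return struct.unpack(_FID_FMT, raw)


def save_fid(stub, seq, oid, ver):
    """Store the FID on the metadata stub before the inode can be evicted."""
    os.setxattr(stub, FID_XATTR, pack_fid(seq, oid, ver))


def load_fid(stub):
    """Read the FID back after the stub has been looked up again on PCC."""
    return unpack_fid(os.getxattr(stub, FID_XATTR))
```

Keeping the FID on the stub means a later lookup on PCC can recover the same FID that was allocated on the client, without relying on the in-memory inode.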

If the consistency of the FID for a file does not matter, maybe we could allocate a new FID for the file when flushing it to the MDT?

We could also use the strategy in LU-13024 (Lustre HSM support for a subtree) to organize and name the root WBC directory on PCC.

Advantages:

  • Much larger capacity;
  • fsync() can become a client local operation;
  • No cache limits are needed for WBC;
  • When the client crashes, it can still recover data from PCC;

Disadvantages:

  • It may hurt metadata performance compared with MemFS, as an extra metadata stub must be created on PCC.
  • As the dentries do not need to be pinned in memory, subdirectories and files under the root WBC directory (which is directly protected by the WBC EX lock) may be synced to PCC and evicted from the VFS memory caches (dcache, page cache). Thus, when flushing a root WBC directory due to WBC EX lock revocation, the client may need to fetch all of its child dentries via a readdir() call on the directory stub on PCC, and then flush the child directories and files.
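
The readdir()-based flush described in the last point can be sketched as follows. This is only an illustration: plan_flush and flush_root are hypothetical names, the flush_dir and flush_file callbacks stand in for the real flush to the MDT, and flushing directories before files is an arbitrary choice made here.

```python
import os


def plan_flush(stub_dir):
    """List the children of a directory stub, split into dirs and files.

    os.scandir() performs a readdir() pass over the stub on PCC, which is
    needed because the child dentries may no longer be in the dcache.
    """
    dirs, files = [], []
    with os.scandir(stub_dir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)


def flush_root(stub_dir, flush_dir, flush_file):
    """Flush every child of a root WBC directory stub, then remove the stub.

    The callbacks are expected to move each child out of the stub (rename to
    its own FID directory or to its HSM-style name). os.rmdir() then succeeds
    only if the stub is really empty, matching the design's expectation.
    """
    dirs, files = plan_flush(stub_dir)
    for name in dirs:
        flush_dir(os.path.join(stub_dir, name))
    for name in files:
        flush_file(os.path.join(stub_dir, name))
    os.rmdir(stub_dir)
```

The cost of this walk grows with the number of children in the stub, which is the performance concern raised above.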

 



 Comments   
Comment by Andreas Dilger [ 27/Nov/19 ]

I think this is a WBC v2 target. Getting basic WBC functionality working is the goal for 2.14.

Comment by Andreas Dilger [ 28/Nov/19 ]

If the consistency of the FID for a file does not matter, maybe we could allocate a new FID for the file when flushing it to the MDT?

The client can already allocate the FIDs itself, so why would it need to allocate a new FID when flushing the file to the MDT? The only time I think this might be needed is if the client selected a FID on one MDT, but that MDT is full when it comes time to flush the WBC files/directories. However, as part of the WBC v1 implementation there needs to be a "grant" request for inodes/blocks on each MDT (as is done with OSTs today) to ensure that the client does not create more data/files than can fit into the filesystem. Otherwise the client data would be lost/discarded with ENOSPC long after it is written instead of returning an error to the application, which makes users unhappy.

Comment by Gerrit Updater [ 23/Jul/21 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/44392
Subject: LU-13011 wbc: data on PCC (DOP) for lock drop flush mode
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fafaeda4ae56672b0ed140d8052031db8460b35d

Comment by Gerrit Updater [ 30/Jul/21 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/44427
Subject: LU-13011 wbc: create PCC copy when file grows too large
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5f71b8950db9d690347a3006559d7dc04a133d56

Comment by Gerrit Updater [ 23/Nov/21 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45639
Subject: LU-13011 wbc: data on PCC for lock keep flush mode
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0e15e14373dba34de4888988cea95836e53bae04

Comment by Gerrit Updater [ 07/May/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47246
Subject: LU-13011 wbc: print PCC file path for data on PCC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc8e04b25855954cd28c56e5acd03af0f3f80805

Comment by Gerrit Updater [ 18/May/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47386
Subject: LU-13011 wbc: obtain attr from PCC once data on PCC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e2cc46392cb2ca21cd5f1abe7ae8571747b4f5d7
