  1. Lustre
  2. LU-10938 Metadata writeback cache support
  3. LU-13011

WBC2: Integrate PCC with Metadata Writeback Caching



    • Technical task
    • Resolution: Unresolved
    • Minor


      Client memory is limited compared with local persistent storage such as SSDs/NVMe. For a WBC subtree, the metadata can reasonably be cached entirely in MemFS, but the data of regular files may grow too large to cache in client-side MemFS.

      Data on PCC

      PCC can be used as a client-side persistent cache for the data of regular files under the protection of the EX WBC lock.

      The PCC copy stub can be created according to the FID when the regular file is created on MemFS, after which all data I/O is directed to PCC.

      Or, when the regular file grows too large for client memory, create the corresponding PCC copy stub and attach the file to PCC.

      Or, at the time data is assimilated from MemFS into Lustre clio, create the PCC copy stub and write the data into PCC.

      None of these operations requires interaction with the MDS until a flush is needed. When flushing a regular file, the client only needs to mark the file as HSM exists, archived and released on the MDT. The file data cached on PCC can be resynced to the Lustre OSTs later, or evicted from PCC when PCC is nearly full.
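
      The three possible points for creating the PCC copy stub, and the HSM state set at flush time, can be sketched as follows. This is only an illustration: the names AttachTrigger, should_attach and FLUSH_STATE are hypothetical, and the flag values are those of enum hsm_states in lustre_user.h as best recalled, so they should be checked against the headers.

```python
from enum import Enum, IntFlag


class AttachTrigger(Enum):
    """Hypothetical names for the three options described above."""
    AT_CREATE = "create"                    # stub created together with the file
    ON_MEMORY_PRESSURE = "memory_pressure"  # stub created when the file outgrows memory
    AT_ASSIMILATION = "assimilation"        # stub created when data leaves MemFS


class HsmState(IntFlag):
    # Assumed to match enum hsm_states in lustre_user.h.
    EXISTS = 0x1
    RELEASED = 0x4
    ARCHIVED = 0x8


# State the flush sets on the MDT object: exists, archived and released.
FLUSH_STATE = HsmState.EXISTS | HsmState.ARCHIVED | HsmState.RELEASED


def should_attach(trigger, event, cached_bytes=0, mem_limit_bytes=0):
    """Return True if the PCC copy stub should be created at this event."""
    if trigger is AttachTrigger.AT_CREATE:
        return event == "create"
    if trigger is AttachTrigger.ON_MEMORY_PRESSURE:
        return event == "write" and cached_bytes > mem_limit_bytes
    return event == "assimilate"
```

      With the memory-pressure option, for example, should_attach(AttachTrigger.ON_MEMORY_PRESSURE, "write", cached_bytes, limit) stays False until the cached data exceeds the limit.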

      Metadata and Data all on PCC

      It is desirable to cache both metadata and data on PCC, not only for the larger capacity compared with client memory, but also for persistence and recoverability.

      The proposed design is as follows:

      1. There is a PCCPATH/WBCROOT directory under the PCC device.
      2. Once a new directory "dir" is created with a WBC EX lock returned from the MDT, create a corresponding metadata stub (also a directory) on PCC at "PCCPATH/WBCROOT/FID(dir)", where "FID(dir)" is the FID of the directory "dir".
      3. All I/O operations under this directory can be performed directly on the equivalent path in the directory "PCCPATH/WBCROOT/FID(dir)" on PCC.
        • Create a directory "dir1" under the directory "dir", Lustre: "dir/dir1" -> PCC: "PCCPATH/WBCROOT/FID(dir)/dir1";
        • Create a directory "dir2" under the directory "dir1", Lustre: "dir/dir1/dir2" -> PCC: "PCCPATH/WBCROOT/FID(dir)/dir1/dir2";
        • Create a regular file "file1" under the directory "dir", Lustre: "dir/file1" -> PCC: "PCCPATH/WBCROOT/FID(dir)/file1";
        • Data written to "dir/file1" is directed to "PCCPATH/WBCROOT/FID(dir)/file1" on PCC;
        • Other operations are similar: read(), setattr(), stat(), truncate(), readdir()...
      4. When the WBC EX lock is being revoked:
        • For a child directory "dir1" under this directory, after it has been created on the MDT and the client has acquired the corresponding WBC EX lock protecting it, move and rename it to PCCPATH/WBCROOT/FID(dir1):
          mv PCCPATH/WBCROOT/FID(dir)/dir1 PCCPATH/WBCROOT/FID(dir1)
        • For a child regular file "file1" under this directory, the client first moves and renames it according to the HSM naming structure as follows:
          "%04x/%04x/%04x/%04x/%04x/%04x/" DFID_NOBRACE FID(file1)
          and then creates a metadata object on the MDT marked as HSM exists, archived and released.
      5. After all child directories and files have been flushed, the directory "PCCPATH/WBCROOT/FID(dir)" should be empty and can be deleted directly.
      6. After that, the WBC EX lock for "dir" can be released safely.
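
      The path mapping and the flush steps above can be sketched on an ordinary local directory standing in for the PCC device. This is only a sketch under stated assumptions: the helper names are hypothetical, the child FIDs are passed in rather than obtained from the MDT, the HSM-style path is assumed to be rooted at PCCPATH, and the order of the six "%04x" components is assumed to follow the POSIX copytool (low and high 16 bits of the OID, then four 16-bit slices of the sequence).

```python
import os
import tempfile


def fid_str(seq, oid, ver=0):
    """Format a FID without braces, e.g. 0x200000401:0x1:0x0."""
    return f"{seq:#x}:{oid:#x}:{ver:#x}"


def stub_path(pcc_root, root_fid, rel_path=""):
    """Map a path relative to the WBC root directory onto its PCC stub."""
    return os.path.normpath(
        os.path.join(pcc_root, "WBCROOT", fid_str(*root_fid), rel_path))


def hsm_path(pcc_root, fid):
    """HSM-style location for a flushed regular file (assumed field order)."""
    seq, oid, _ver = fid
    parts = [oid & 0xFFFF, (oid >> 16) & 0xFFFF,
             seq & 0xFFFF, (seq >> 16) & 0xFFFF,
             (seq >> 32) & 0xFFFF, (seq >> 48) & 0xFFFF]
    return os.path.join(pcc_root, *(f"{p:04x}" for p in parts), fid_str(*fid))


def flush_root(pcc_root, root_fid, child_fids):
    """Steps 4 and 5: relocate every child, then remove the empty stub.

    child_fids maps a child name to the FID it has on the MDT.
    """
    stub = stub_path(pcc_root, root_fid)
    for name in sorted(os.listdir(stub)):
        src = os.path.join(stub, name)
        if os.path.isdir(src):
            dst = stub_path(pcc_root, child_fids[name])   # new WBC root stub
        else:
            dst = hsm_path(pcc_root, child_fids[name])    # HSM naming
            os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(src, dst)
    os.rmdir(stub)  # fails if anything was left behind


def demo():
    """Build dir/{dir1/dir2, file1} on a temporary 'PCC' and flush it."""
    pcc = tempfile.mkdtemp()
    dir_fid = (0x200000401, 0x1, 0x0)
    kids = {"dir1": (0x200000401, 0x2, 0x0), "file1": (0x200000401, 0x3, 0x0)}
    os.makedirs(stub_path(pcc, dir_fid, "dir1/dir2"))
    with open(stub_path(pcc, dir_fid, "file1"), "w") as f:
        f.write("data")
    flush_root(pcc, dir_fid, kids)
    return pcc, dir_fid, kids
```

      After demo() returns, "dir1" lives at WBCROOT/FID(dir1) with "dir2" still inside it, "file1" lives under the HSM-style path, and the stub for "dir" no longer exists.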


      FID maintenance:

      Once the metadata is also cached on PCC, directories and files under the root WBC directory may be evicted from the VFS memory caches (dcache, page cache, inode cache). So when the metadata stub is created on PCC, or when the inode is evicted from the cache (a FID has already been allocated for this file on the client), the client needs to store the FID in an extended attribute (EA) of the metadata stub on PCC.

      If the consistency of the FID for the file does not matter, maybe we could allocate a new FID for the file when flushing it to the MDT?

      We could also use the strategy in LU-13024 (Lustre HSM support for a subtree) to organize and name the root WBC directory on PCC.
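
      Storing the FID in an EA of the stub could look roughly like the sketch below. The EA name "user.wbc_fid" and the helper names are hypothetical, the 16-byte little-endian layout (64-bit sequence, 32-bit OID, 32-bit version) is an assumption modelled on struct lu_fid, and os.setxattr/os.getxattr are only available on Linux and require a PCC backend file system with xattr support.

```python
import os
import struct

FID_XATTR = "user.wbc_fid"   # hypothetical EA name
_FID_FMT = "<QII"            # seq (u64), oid (u32), ver (u32), little-endian


def pack_fid(seq, oid, ver=0):
    """Serialize a FID into the 16-byte value stored in the EA."""
    return struct.pack(_FID_FMT, seq, oid, ver)


def unpack_fid(raw):
    """Inverse of pack_fid(); returns (seq, oid, ver)."""
    return struct.unpack(_FID_FMT, raw)


def save_fid(stub, fid):
    """Persist the FID on the metadata stub before the inode can be evicted."""
    os.setxattr(stub, FID_XATTR, pack_fid(*fid))


def load_fid(stub):
    """Recover the FID from the stub, e.g. after eviction or a client crash."""
    return unpack_fid(os.getxattr(stub, FID_XATTR))
```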


      Advantages:

      • Much larger capacity;
      • fsync() can become a client-local operation;
      • No cache limits are needed for WBC;
      • When the client crashes, it can still recover data from PCC.


      Disadvantages:

      • Metadata performance may suffer compared with MemFS, because an extra metadata stub must be created on PCC.
      • Because dentries no longer need to be pinned in memory, subdirectories and files under the root WBC directory (directly protected by the WBC EX lock) may be synced to PCC and evicted from the VFS memory caches (dcache, page cache). Thus, when flushing a root WBC directory due to WBC EX lock revocation, the client may need to obtain all of its child dentries via a readdir() call on the directory stub on PCC, and then flush the child directories and files.
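
      The readdir()-based enumeration in the last point can be sketched in user space as below. The function name is hypothetical and the real implementation would run in the kernel client. The sketch only shows that the directory stub on PCC is sufficient to rediscover the children once their dentries are gone from memory.

```python
import os


def children_to_flush(stub_dir):
    """List the children of a directory stub, split into dirs and files.

    Uses os.scandir(), which wraps readdir(), so no cached dentry is needed.
    """
    dirs, files = [], []
    with os.scandir(stub_dir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```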






              qian_wc Qian Yingjin