[LU-10092] PCC: Lustre Persistent Client Cache Created: 06/Oct/17  Updated: 25/Aug/20  Resolved: 30/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: New Feature Priority: Minor
Reporter: Li Xi (Inactive) Assignee: Li Xi
Resolution: Fixed Votes: 0
Labels: patch

Attachments: PDF File LUG2018-Lustre_Persistent_Client_Cache-Xi.pdf    
Issue Links:
Blocker
is blocked by LU-10499 Readonly Persistent Client Cache support Closed
Gantt End to End
has to be finished together with LUDOC-432 Create documentation for PCC Feature Resolved
Related
is related to LU-7207 HSM: Add Archive UUID to delete chang... Open
is related to LU-13137 User process segfaults since 2.13 cli... Resolved
is related to LU-10602 Add file heat support for Persistent ... Open
is related to LU-10606 HSM info as part of LOV layout xattr Open
is related to LU-11333 Using PCC to cache whole "virtual" di... Open
is related to LU-10918 Configurable rule based auto PCC cach... Open
is related to LU-13924 Document PCC "lfs pcc add" options Open
is related to LU-10114 Feasibility of increasing upper limit... Resolved
Sub-Tasks:
Key
Summary
Type
Status
Assignee
LU-11908 Create PCC Phase 1 test plan Technical task Resolved Qian Yingjin  
Rank (Obsolete): 9223372036854775807

 Description   

PCC is a new framework which provides a group of local caches on the Lustre client side. No global namespace is provided by PCC. Each client uses its own SSD as a local cache for itself. A local file system on the SSD manages the data in the local cache. Cached I/O is directed to the local file system, while normal I/O is directed to the OSTs.

PCC uses HSM for data synchronization, relying on the HSM copytool to restore files from the local cache to the Lustre OSTs. Each PCC has a copytool instance running with a unique archive number. Any remote access from another Lustre client triggers data synchronization. If a client with PCC goes offline, the cached data becomes temporarily inaccessible to other clients; after the PCC client reboots and the copytool restarts, the data becomes accessible again.
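
As a hedged sketch of the setup this implies, using the administrative commands that landed with the feature (filesystem name, paths, project ID, and archive number are illustrative):

    # enable the HSM coordinator on the MDS
    mds# lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

    # register a local filesystem mounted at /mnt/pcc as an RW-PCC backend
    # on this client; rwid=2 is this cache's unique HSM archive number
    client# lctl pcc add /mnt/lustre /mnt/pcc --param "projid={1000} rwid=2"

    # list the PCC backends configured on this client
    client# lctl pcc list /mnt/lustre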

The following describes what happens under different conditions, with an illustrative command example after each list:

When a file is created on PCC

  • A normal file is created on the MDT;
  • An empty mirror file is created on the local cache;
  • The HSM status of the Lustre file is set to archived and released;
  • The archive number is set to the proper value.
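
For example, the resulting HSM state of a newly created file can be checked with the existing HSM commands (a hedged illustration; the exact flag value is elided):

    client# lfs hsm_state /mnt/lustre/newfile
    /mnt/lustre/newfile: (0x...) released exists archived, archive_id:2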

When a file is prefetched to PCC

  • A mirror file is copied to the local cache;
  • The HSM status of the Lustre file is set to archived and released;
  • The archive number is set to the proper value.
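
A hedged example of the prefetch step with the user-space interface that landed in 2.13 (archive ID and path illustrative):

    # explicitly attach (prefetch) an existing Lustre file into the PCC
    # backend registered with archive ID 2
    client# lfs pcc attach -i 2 /mnt/lustre/dataset/input.bin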

When a file is accessed from PCC

  • Data is read directly from the local cache;
  • Metadata is read from the MDT, except for the file size;
  • The file size is obtained from the local cache.
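
On the caching client, access stays transparent, as a hedged illustration (paths illustrative):

    # reads are served from the local PCC copy while the Lustre file
    # itself remains released on the OSTs
    client# dd if=/mnt/lustre/dataset/input.bin of=/dev/null bs=1M

    # query the PCC caching state of the file
    client# lfs pcc state /mnt/lustre/dataset/input.bin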

PCC should be able to accelerate some applications with certain I/O patterns.

For more information, please check the presentation during LUG'18 (http://wiki.lustre.org/images/0/04/LUG2018-Lustre_Persistent_Client_Cache-Xi.pdf).



 Comments   
Comment by Gerrit Updater [ 06/Oct/17 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/29347
Subject: LU-10092 llite: Add cache on client support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a79a5d8115bfcf240479b363695feae3bc06756c

Comment by Jinshan Xiong (Inactive) [ 06/Oct/17 ]

The benchmark seems to be problematic because the bottleneck should be on the gigabit network. This is an alternative to the in-kernel cachefs. Good work.

Comment by Andreas Dilger [ 11/Oct/17 ]

I'd long thought of implementing something similar using the cachefs in the kernel, but there was always the problem of how to manage cache coherency and data migration for files written into the client cache.  Using the HSM infrastructure to do this makes a lot of sense, since this correctly handles the case of files written into the client cache first, and only migrated into the filesystem afterward.  Using regular DLM locking is potentially problematic, since the client may be evicted before the cached file is flushed, but this is not a problem with "restoring" a file from the HSM archive.

This could potentially also integrate nicely with composite files and FLR if we enhanced the Lustre layout to include an "HSM layout" component (equivalent to LOV_MAGIC_V1).  The "LOV_MAGIC_HSM" component would describe a file in an HSM archive, storing the HSM archive number, the "UUID" of the file within the archive, and other parameters (e.g. archive timestamp) needed to identify the file.  The archive timestamp could be useful for storing multiple replicas of the file in HSM and using it for file versioning, along with the FLR mirror_io equivalent to open a specific component to access an older version of the file.

 

A note on naming, the "LCOC" acronym might have some bad connotations in English, so it probably makes sense to rename this feature to something like "Lustre Client Cache (LCC)" or "Persistent Client Cache (PCC)" or similar.

Comment by James A Simmons [ 11/Oct/17 ]

Doesn't this overlap with the linux kernel DAX api?

Comment by Li Xi (Inactive) [ 12/Oct/17 ]

Hi Andreas,

Thanks for your advice. We will change the name to PCC, which describes it better.

And integrating FLR with PCC sounds great.

Comment by Li Xi (Inactive) [ 12/Oct/17 ]

Hi James,

 

I have limited knowledge of Linux kernel DAX, but I think DAX requires a specific kind of block device that supports DAX? We are thinking of using any kind of SSD/NVMe/disk to accelerate Lustre.

Comment by Andreas Dilger [ 18/Oct/17 ]

It seems possible that the same PCC mechanism could be used as a way of managing data stored in NVRAM and accessed via DAX. This wouldn't work for SSD/NVMe storage on the client, but that is OK as well. Lustre would have to prestage/migrate data from the backend storage into the PCC (lfs hsm archive) so that it can be accessed via low-latency IO operations (DAX/mmap), then flushed back to the backing storage when it is no longer in use.
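
A hedged sketch of that prestage/flush cycle using existing HSM commands (archive ID and paths illustrative):

    # prestage: migrate the file from backend storage into the client-side archive
    client# lfs hsm_archive --archive 2 /mnt/lustre/data/file
    client# lfs hsm_release /mnt/lustre/data/file

    # ... low-latency local access (e.g. DAX/mmap) happens here ...

    # flush back: restore the data to the backing OSTs when no longer in use
    client# lfs hsm_restore /mnt/lustre/data/file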

Comment by Li Xi (Inactive) [ 24/Nov/17 ]

Hi Andreas,

Thanks for reviewing the patch! It is really helpful!

To discuss on your comments in the patch, I feel it would be better to post some of my thoughts here.

> do we want to go directly to mounting whole filesystem images that are accessed directly on the client, or should the "automatic mount of ext4 filesystem image" be a separate (optional) feature of files stored in Lustre (regardless of PCC or remote)

I feel that should be in a separate patch. Without "automatic mount of ext4 filesystem image", the current implementation can already accelerate some applications that do a lot of single-client writes/reads.

Another thing I would like to do is implement a patch that improves stat()/getattr() performance using PCC. Currently, the patch does not cache any file attributes in Lustre, which means file stat performance will not be accelerated. In the next patch, attribute updates will be applied first to the Lustre MDT and then to the local file system, so attributes can be read directly from the cache, which should accelerate metadata read performance.

But indeed "automatic mount of ext4 filesystem image"  would be a fancy feature which will accelerates all data and metadata operations, which should be our final goal.

> is PCC duplicating fscache/cachefs/cachefiles too much, and we should look at using those to avoid code duplication and ease upstream acceptance, or is PCC faster and more flexible for our use?

I agree it would be a concern when pushing the patch to the upstream Linux kernel. But I don't think PCC duplicates fscache/cachefs/cachefiles. The PCC code is only about 1500 lines, including the user-space tool. By contrast, fs-cache is almost 6000 lines, and integrating NFS with fs-cache needs another 1000 lines. Most of the PCC code is Lustre-specific, and PCC introduces no cache management or indexing mechanism of its own. It only connects the existing Lustre HSM interfaces with the existing file system methods (struct file_operations etc.) of the local file system, using a very limited extension of the in-memory inode. I think the current implementation is close to the simplest possible implementation of a local cache. That means that if we combined PCC with fscache or bcache or any other such mechanism, the implementation would be much more complicated and likely less efficient, and thus might be more difficult to get accepted into the upstream Linux kernel. It is also hard to integrate PCC with fs-cache (or other cache mechanisms like bcache), because PCC manages the cache at file granularity, while fs-cache manages it in pages.

I am not an expert on fs-cache. But judging from an introduction (http://people.redhat.com/~dhowells/fscache/FS-Cache.pdf) and the code, I feel fs-cache is less efficient than PCC. PCC writes and reads (and, in later versions, getattr()) directly to/from the local file system, which can be highly optimized. Fs-cache instead manages cached data in pages, which means that looking up the local cache of file data requires page indexing, and searching those indexes (through a radix tree) will certainly introduce overhead. Also, PCC can be used on top of any kind of local file system, which means PCC can take advantage of the newest local file system technologies for NVMe/SSD; I think that is another advantage over fs-cache.

Comment by James A Simmons [ 27/Nov/17 ]

I don't think it's a fair comparison for the line count between PCC and fscache, since PCC depends on a lot of Lustre infrastructure that is already present. A fairer comparison would be a functional PCC layer that could work with any file system; in that case I suspect more code would be involved. If PCC is truly a more efficient solution than fs-cache, I would recommend making a general implementation that works with ext4 as an example and pushing it upstream to the VFS maintainers. From your last sentence it appears to be the case that PCC can be truly independent of Lustre. This would boost DDN + Lustre standing with the Linux kernel community, as well as bring back valuable feedback. Also, by having it as part of the core kernel code, it would be properly maintained, instead of you managing an out-of-tree implementation until the end of time. Please consider this approach. I would gladly help with the process of pushing it upstream, especially since I test newer kernels all the time.

Comment by Wang Shilong (Inactive) [ 29/Nov/17 ]

Here are some thoughts of mine. Generally speaking, I think PCC is different from FSCache, and I think it is hard
to make PCC truly independent of Lustre.

I implemented a simple version of FSCache support for Lustre; see:
FScache for Lustre

I haven't benchmarked it yet, but here is what I have learned so far:
FSCache can work on page ranges, which means it can cache part of a file, a clear advantage over PCC. Generally, FSCache uses the backend filesystem layout and hooks read/write; it uses bmap to check whether a given page is cached, etc.

Here are some limitations of using FSCache:
1) Only buffered reads are supported.
2) Any write invalidates the file cache.
3) The cache doubles the space usage, and writes to cachefiles happen on the first cache miss.

FSCache is designed to be generic, but its limitations are obvious. Compared to that, the core design of PCC
relies on Lustre HSM and hooks read/write directly into the Lustre read/write functions. Compared to
FSCache, the advantages are:

1) Both read and write, whether buffered or direct I/O, can easily be supported.
2) Only one copy of the data consumes space.
3) Read performance will be a little better than FSCache, due to the shorter I/O path.

But PCC also has limitations:
1) It does not support partial-file caching, unlike FSCache.
2) The number of clients is limited to 32 due to an HSM limitation.
3) Different Lustre clients cannot do parallel cached reads of one file?

Comment by Andreas Dilger [ 29/Nov/17 ]

I think the 32-archive limit could be removed from Lustre fairly easily. Instead, there would probably be a limit of 2^32-1 archives.

I also discussed the multi-client cache with Li Xi in Denver.  One of the things I've been thinking about for HSM for a while is to move the "hsm" xattr (see LU-7207) to be part of a composite file layout as a new type of component (e.g. LOV_MAGIC_HSM or similar).  That would allow a file to have many different read-only mirrors of the data in different archives, though at most one would be writable.  When the client is doing the read, it could compare the Archive ID with the local Archive ID to see which of the mirrors it should be reading from.

In a similar way, it would be possible to have an HSM component that didn't cover the whole file, if the file was very large.  The HSM component could have a specific extent_start and extent_end like any other component.  This would also be useful for regular HSM archives if e.g. the file is larger than what could fit onto a single tape, or if it is desirable to archive different parts of the file in parallel.  It wouldn't allow page-granular cache of files in PCC, but that wouldn't make much sense for Lustre anyway - it would be better to have each PCC handle a large chunk of the file and use collective IO to map the IO processes to the appropriate client node.

Comment by Jinshan Xiong (Inactive) [ 13/Dec/17 ]

Hi Li Xi,

Is the I/O pattern detector mentioned in your slides open-sourced?

Comment by Li Xi (Inactive) [ 12/Jan/18 ]

Hi Jinshan,

Sorry for my late reply. The I/O pattern detector is still under design and development. We are trying to implement some kind of file heat detector in Lustre client...

Comment by Gerrit Updater [ 12/Jan/18 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/30844
Subject: LU-10092 llite: improve stat using pcc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 42f212ac303917959c6cb1bd4c4c7e4a230741cb

Comment by John Hammond [ 21/Feb/18 ]

How will clients avoid choosing conflicting archive ids?

Comment by Andreas Dilger [ 22/Feb/18 ]

I think the archive ID should be assigned by the MDS initially (lctl interface to generate a new ID), and then be assigned to the archive permanently. It should not relate to the NID of the client. With 32-bit IDs that shouldn't be too limiting, and they can be re-used if needed (round-robin).

Comment by John Hammond [ 23/Feb/18 ]

Sounds like kind of a headache for this use case.

Comment by Gerrit Updater [ 05/Jul/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/32787
Subject: LU-10092 pcc: Non-blocking PCC caching
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f8f1af1a6e60bee60d9f3561baa3d477ba18a39f

Comment by Qian Yingjin (Inactive) [ 05/Jul/18 ]

Current PCC uses the refcount of the PCC inode to determine whether a previously PCC-attached file can be detached. If the file is in use (refcount > 1), detaching returns -EBUSY.
Each open of a PCC-cached file increases the refcount of the PCC inode; each close decreases it.
When another client accesses the PCC-cached file, it triggers the restore process, since the file is HSM released. During restore, the agent needs to detach the PCC-cached file.
Thus, if a PCC-attached file is kept open but not closed for a long time, the restore request will always fail, which is unacceptable for some applications.
In this patch, we implement a non-blocking PCC caching mechanism for Lustre. After attaching the file into PCC, the client acquires the layout lock for the file, and the layout generation is maintained in the PCC inode. While the layout lock is held, the PCC caching state is valid and all I/O is directed into the PCC cache. When the layout lock is revoked, the blocking AST invalidates the PCC caching state and detaches the file automatically.
This patch also helps handle ENOSPC errors on PCC writes by falling back to the normal I/O path, which restores the file data to the OSTs (as the file is in the HSM released state) and redoes the write.
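
A hedged illustration of the pre-patch failure mode described above (paths and error output illustrative, not copied from the tool):

    # a long-running process keeps the PCC-attached file open
    client1$ tail -f /mnt/lustre/cached.log &

    # an explicit detach (or the agent's detach during restore) then fails
    # because the PCC inode refcount is elevated
    client1$ lfs pcc detach /mnt/lustre/cached.log
    cannot detach '/mnt/lustre/cached.log': Device or resource busy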

Comment by Gerrit Updater [ 09/Aug/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/32963
Subject: LU-10092 llite: Add persistent cache on client
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 3c5dacbf218146df42cf822c90d911c7b08e3cfa

Comment by Gerrit Updater [ 09/Aug/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/32965
Subject: LU-10092 pcc: Use lease lock with intent to attach a file
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 20739c4806e7f43f538b3cf0e1aa0a86849a70b8

Comment by Gerrit Updater [ 09/Aug/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/32966
Subject: LU-10092 pcc: Non-blocking PCC caching
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: ab2e4429eb2f035d5ad30825ee3c548430ff0e4f

Comment by James A Simmons [ 09/Aug/18 ]

Since you are going to land this even with its collision with other kernel features, I suggest that you only enable this when CONFIG_FSCACHE and CONFIG_FS_DAX are both turned off. Otherwise this could cause all kinds of havoc on deployed systems. Sadly, you can't force those features off when out of tree.

Comment by Patrick Farrell (Inactive) [ 09/Aug/18 ]

James,

Would you explain how this conflicts with those?  Unless I've really missed something, it has no runtime conflicts, and while it's got some similarities/overlap, it also offers significantly different functionality.

Given that it doesn't appear to have runtime conflicts, only enabling this when those are turned off seems pointlessly restrictive.

Comment by James A Simmons [ 09/Aug/18 ]

My concern is that Lustre may run in an environment with other file systems that do use fscache or DAX. Since fscache/DAX are external subsystems, they should be managing the node-local storage between the different classes of file systems so they don't stomp on each other. I can see a case where PCC comes in and just smashes all over the local storage device. Does that make sense to you, or maybe you have a far greater understanding of fscache and DAX and can ease my concern about potential conflicts with other file systems?

Second, pushing this into the Linux kernel when other solutions exist might be a hard sell. Will DDN live with this work never going mainstream? Maybe they will never care.

Comment by Patrick Farrell (Inactive) [ 09/Aug/18 ]

Huh?  There are plenty of ways to configure your file systems so one stomps on another.  DAX and fscache add some new ones, but there are plenty already.  Setting up PCC to use a storage device that it shouldn't is no more (or less) dangerous than doing that with any other file system...?  PCC requires formatting the local storage device anyway!  So it's exactly as dangerous as anything else.

That second issue is a separate one.  To determine whether or not it's upstreamable as-is would require a detailed analysis of the offered functionality.  Certainly, PCC has a lot of Lustre integration that fscache and DAX obviously don't have.  In fact, other than being usable as a cache, I don't see that much in common...?

Comment by James A Simmons [ 09/Aug/18 ]

Excellent, you seem to have a much better grasp than me of those potential conflicts. Since that is the case, could you provide detailed documentation on what potential conflicts could exist? That way sites can avoid potential data corruption. They should also be made aware that no solutions to avoid these conflicts are on the table at this time.

Also, since you seem to have a very good understanding of DAX, FSCache, and this new PCC, it would be nice if you took up the mantle and did the detailed analysis.

Comment by Patrick Farrell (Inactive) [ 09/Aug/18 ]

There is no need - unless I've badly misunderstood, PCC requires reformatting the block device it will use, in a manner exactly like any other file system.  The implications of this are well understood and shared with any other file system.  It has no special interaction with DAX or fscache.  They're just other things that could potentially be used - well, fscache, anyway; I don't think DAX is, as I believe it's not a caching layer - to achieve some of the same goals.  But they have no direct interaction with PCC.

There is no complex relationship to unpack here.

Comment by Gerrit Updater [ 21/Nov/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33698
Subject: LU-10092 pcc: Non-blocking PCC caching
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: db4610b8734e33e0d52d5d024799e8e6d0504b83

Comment by Gerrit Updater [ 05/Dec/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33787
Subject: LU-10092 pcc: auto attach during open for valid cache
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 6125ce9802fee1866c81670b7a7f41de4d2e1088

Comment by Gerrit Updater [ 13/Dec/18 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33844
Subject: LU-10092 pcc: Add manual and remove options for detach
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 211495b07fe808a1b832e1d66354f6d50b2d4510

Comment by Gerrit Updater [ 08/Jan/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33982
Subject: LU-10092 pcc: Add RW-PCC support for non-root user
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 588cc597a9260c933b3750cb2b58ef740f69db3f

Comment by Gerrit Updater [ 28/Feb/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/34341
Subject: LU-10092 pcc: Add a new connect flag for PCC
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 57689020c3b68666274e5528ef74f69519595366

Comment by Gerrit Updater [ 01/Mar/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/34356
Subject: LU-10092 pcc: Reserve a new connection flag for PCC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 26dd48459dbe3c108ead5964065d4f33ea262427

Comment by Gerrit Updater [ 11/Apr/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/34637
Subject: LU-10092 pcc: security and permission for non-root user access
Project: fs/lustre-release
Branch: pcc
Current Patch Set: 1
Commit: 07867e40b8dbd242146c304049e72f87e3bab664

Comment by Gerrit Updater [ 18/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34356/
Subject: LU-10092 pcc: Reserve a new connection flag for PCC
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 93aa684046699a1d8802524003115ebaf07758ca

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32963/
Subject: LU-10092 llite: Add persistent cache on client
Project: fs/lustre-release
Branch: pcc
Current Patch Set:
Commit: f172b116885753d0f316549a2fb9d451e9b4bd2e

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32966/
Subject: LU-10092 pcc: Non-blocking PCC caching
Project: fs/lustre-release
Branch: pcc
Current Patch Set:
Commit: 58d744e3eaab358ef346e51ff4aa17e9f08efbb3

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34637/
Subject: LU-10092 pcc: security and permission for non-root user access
Project: fs/lustre-release
Branch: pcc
Current Patch Set:
Commit: 2102c86e0d0ae735aed9ee8c1c6a77b63eda6037

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33787/
Subject: LU-10092 pcc: auto attach during open for valid cache
Project: fs/lustre-release
Branch: pcc
Current Patch Set:
Commit: e29ecb659e51dd67758c5b0adb542210e7aeddb1

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33844/
Subject: LU-10092 pcc: change detach behavior and add keep option
Project: fs/lustre-release
Branch: pcc
Current Patch Set:
Commit: 2dadefb4148f753dd93ee1dbebb3aac49bda2f8d

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35214
Subject: LU-10092 First phase of persistent client cache project merging
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 478f97b212714bc3af9a9a104efab314ca942758

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35214/
Subject: LU-10092 First phase of persistent client cache project merging
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 478f97b212714bc3af9a9a104efab314ca942758

Comment by James A Simmons [ 13/Jun/19 ]

Is it ready for testing yet?

Comment by Joseph Gmitter (Inactive) [ 30/Aug/19 ]

All patches targeted for V1 have landed. Patches being worked on for V2 have moved to LU-12714.
