Metadata writeback cache support
(LU-10938)
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Technical task |
| Priority: | Minor |
| Reporter: | Qian Yingjin |
| Assignee: | Qian Yingjin |
| Resolution: | Unresolved |
| Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
It would be better to design a reclaim mechanism that frees up reserved inodes for new file creation, or cached pages for later I/O, in case of cache saturation.
The cache shrinker starts to work once the cache allocation has grown above an upper watermark, and it evicts files until the allocation drops below a lower watermark. |
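A minimal sketch of the watermark-driven reclaim described above. This is illustrative only; the structure and helper names (wbc_cache, wbc_evict_one_file, the watermark fields) are hypothetical and not taken from the patches under review.

```c
/*
 * Illustrative sketch: names are hypothetical, not the code under review.
 */
struct wbc_cache {
	unsigned long	wc_used;	/* inodes/pages currently cached */
	unsigned long	wc_hiwm;	/* upper watermark: start reclaim */
	unsigned long	wc_lowm;	/* lower watermark: stop reclaim */
};

/* Hypothetical helper: flush one cached file to the MDT/OST and free it. */
int wbc_evict_one_file(struct wbc_cache *cache);

static void wbc_shrink(struct wbc_cache *cache)
{
	/* Reclaim only kicks in once the allocation exceeds the upper watermark. */
	if (cache->wc_used <= cache->wc_hiwm)
		return;

	/* Evict whole files until the allocation drops below the lower watermark. */
	while (cache->wc_used > cache->wc_lowm) {
		if (wbc_evict_one_file(cache) < 0)
			break;	/* nothing left that can be evicted */
	}
}
```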
| Comments |
| Comment by Andreas Dilger [ 15/May/20 ] |
Could we just hook into the existing kernel inode/slab/page shrinkers to manage this? One thing that is important to remember is that these shrinkers are essentially a "notification method" from the kernel about memory pressure, but we should still be free to add/modify the inodes/pages that are being flushed at one time to be more IO/RPC friendly (e.g. selecting contiguous pages to write to the OST, though I'm not sure what would be best for MDT aggregation).

One thing that we have to worry about is delaying writeback to the MDT/OST for too long, as that can cause memory pressure to increase significantly, and we will have wasted tens of seconds not sending RPCs, which could have written GBs of dirty data during that time. I think as much as possible it makes sense to have a "write early, free late" kind of policy that we have for dirty file data so that we don't waste the bandwidth/IOPS just waiting until we are short of memory. |
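A hedged sketch of what hooking into the kernel shrinker framework could look like. wbc_cached_count() and wbc_flush_batch() are hypothetical helpers, and the single-argument register_shrinker() mentioned in the trailing comment matches the kernel API of that era (it has since changed).

```c
#include <linux/shrinker.h>

/* Hypothetical helpers, implemented elsewhere in the WBC code. */
unsigned long wbc_cached_count(void);
unsigned long wbc_flush_batch(unsigned long nr_to_scan);

static unsigned long wbc_shrink_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	/* Report how many cached inodes/pages could be reclaimed. */
	return wbc_cached_count();
}

static unsigned long wbc_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/*
	 * Memory-pressure notification from the kernel: flush a batch of
	 * files, grouping them so the resulting RPCs stay IO-friendly
	 * rather than writing back one page or inode at a time.
	 * Returns the number of objects actually freed.
	 */
	return wbc_flush_batch(sc->nr_to_scan);
}

static struct shrinker wbc_shrinker = {
	.count_objects	= wbc_shrink_count,
	.scan_objects	= wbc_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* At setup: register_shrinker(&wbc_shrinker); on teardown: unregister_shrinker(&wbc_shrinker). */
```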
| Comment by Gerrit Updater [ 22/May/20 ] |
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/38697 |
| Comment by Qian Yingjin [ 22/May/20 ] |
"One thing that we have to worry about is delaying writeback to the MDT/OST for too long, as that can cause memory pressure to increase significantly, and we will have wasted tens of seconds not sending RPCs, which could have written GBs of dirty data during that time. I think as much as possible it makes sense to have a "write early, free late" kind of policy that we have for dirty file data so that we don't waste the bandwidth/IOPS just waiting until we are short of memory."

Can we tune the kernel writeback parameters (the Linux writeback settings) to achieve this goal?

Moreover, for data I/O pages, we can limit the number of cached pages per file in MemFS to control data caching there. If a file exceeds this threshold (e.g. max_pages_per_rpc, i.e. 16MB? or only 1MB, to allow many more small files to be cached), the client will assimilate the cached pages from MemFS into Lustre. After that, all data I/O on this file is directed to the Lustre OSTs via the normal Lustre I/O path. |
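An illustrative sketch of the per-file threshold check described above. All names (wbc_inode, wbc_assimilate_pages, wbc_max_cached_pages) are hypothetical; the 16MB/1MB figures are only the values floated in the comment.

```c
#include <linux/types.h>

/* Hypothetical per-file cache state kept by the WBC layer. */
struct wbc_inode {
	unsigned long	wi_cached_pages;	/* pages currently held in MemFS */
	bool		wi_assimilated;		/* already moved into Lustre? */
};

/* 16MB with 4KB pages; could be 256 (1MB) to favour caching many small files. */
static unsigned long wbc_max_cached_pages = 4096;

/* Hypothetical helper: move this file's MemFS pages into the Lustre page cache. */
int wbc_assimilate_pages(struct wbc_inode *wi);

static int wbc_account_page(struct wbc_inode *wi)
{
	if (wi->wi_assimilated)
		return 0;	/* I/O already goes through the normal OST path */

	if (++wi->wi_cached_pages > wbc_max_cached_pages) {
		/*
		 * Threshold exceeded: assimilate the cached pages from MemFS
		 * into Lustre; later I/O on this file is sent to the OSTs
		 * through the normal client I/O path.
		 */
		wi->wi_assimilated = true;
		return wbc_assimilate_pages(wi);
	}
	return 0;
}
```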
| Comment by Gerrit Updater [ 28/May/20 ] |
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/38739 |
| Comment by Gerrit Updater [ 28/May/20 ] |
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/38749 |
| Comment by Gerrit Updater [ 09/Jun/20 ] |
Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38875 |