[LU-10967] MDT page cache management improvements Created: 30/Apr/18  Updated: 30/Apr/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Nathan Dauchy (Inactive) Assignee: Peter Jones
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Relevant to any configuration where critical MDT metadata may not fit in memory or may get pushed out of DRAM cache due to other IO.


Issue Links:
Related
is related to LU-15 strange slow IO messages and bad perf... Resolved
is related to LU-3631 file stats are slow on filesystem sta... Resolved
is related to LU-1885 Lustre should be able to limit amount... Resolved
is related to LU-10946 add an interface to load ldiskfs bloc... Closed
Rank (Obsolete): 9223372036854775807

 Description   

Opening this as somewhat of a catch-all for improvements to MDT memory management, either through pinning or through more selective use of the page cache.  This will be important for at least two cases:
A) Data On MDT
B) Combining OST and MDT on a single server
MDT performance has historically relied on a "read mostly" workload to the backing storage, with much of the important and frequently used metadata held in DRAM cache.  With the increase in data blocks from either case above, that metadata could quickly get pushed out of cache, slowing down metadata ops.

It may not be practical to try to pin all MDT "real metadata" info in memory, but some features to perhaps release DoM from the page cache faster, or refresh the MDT (and OST) bitmaps and inode tables, would be helpful.



 Comments   
Comment by Nathan Dauchy (Inactive) [ 30/Apr/18 ]

The other tickets I could find that are even marginally related to this issue...
LU-1885 : Lustre should be able to limit amount of memory used for read and write caches
LU-2477 : Poor MDS create performance due to ARC cache growth
LU-10946 : add an interface to load ldiskfs block bitmaps

Comment by Peter Jones [ 30/Apr/18 ]

Nathan

Thanks for opening this ticket. It will be good to get a variety of views on possible approaches here.

Peter

Comment by Andreas Dilger [ 30/Apr/18 ]

The proposal in LU-10946 to pin the block and inode bitmaps would also help the MDS, though to a lesser extent: the MDT is typically on flash, so reading the bitmaps on demand from storage will be much faster than on an OST, where those reads compete with a streaming I/O workload on spinning disks.

The block bitmaps would need 4KB of RAM per 128MB of MDT space, and the inode bitmaps would double that. For a largish MDT, say 8TB (=4B files), this would mean 4KB * (8TB/128MB) = 256MB for the block bitmaps, and an additional 256MB for the inode bitmaps, which is not unreasonable. Pinning the inode tables on the MDT would be totally impractical, since they consume (by default) 1/2 of all the MDT space (i.e. you would need 4TB of RAM to pin all of the MDT inode tables).
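
The arithmetic above can be checked with a short Python sketch. The function names and constants are illustrative only and are not part of Lustre or ldiskfs. The sketch assumes one 4KB bitmap block per 128MB of MDT space, binary units (1TB = 2^40 bytes), and the stated default of inode tables taking half of the MDT.

```python
# Rough sizing sketch for pinning MDT metadata in RAM.
# All names here are hypothetical helpers, not Lustre/ldiskfs APIs.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

BITMAP_BLOCK = 4 * KIB      # one bitmap block, as stated in the comment
SPACE_PER_BITMAP = 128 * MIB  # MDT space covered by one bitmap block


def bitmap_pin_cost(mdt_bytes):
    """Return (block_bitmap_bytes, inode_bitmap_bytes) needed to pin bitmaps."""
    groups = mdt_bytes // SPACE_PER_BITMAP
    block_bitmaps = groups * BITMAP_BLOCK
    # Assumption from the comment: inode bitmaps cost the same again.
    inode_bitmaps = groups * BITMAP_BLOCK
    return block_bitmaps, inode_bitmaps


def inode_table_pin_cost(mdt_bytes):
    """Return bytes needed to pin inode tables, assuming they use 1/2 of the MDT."""
    return mdt_bytes // 2


if __name__ == "__main__":
    mdt = 8 * TIB
    block_bm, inode_bm = bitmap_pin_cost(mdt)
    print("block bitmaps:", block_bm // MIB, "MB")
    print("inode bitmaps:", inode_bm // MIB, "MB")
    print("inode tables: ", inode_table_pin_cost(mdt) // TIB, "TB")
```

For a different MDT size, pass the size in bytes to the same functions; the bitmap cost scales linearly with MDT space, as does the inode table cost.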
