[LU-181] Lustre memory usage Created: 31/Mar/11  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Liang Zhen (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Incomplete Votes: 0
Labels: None

Rank (Obsolete): 4561

 Description   

This is a quote from Andreas:

"Originally my thought was 4kB per inode (about 2kB for the inode itself, and 2kB for the ldlm_lock+ldlm_resource), but then I realized that we have an ldlm_lock (over 400 bytes today) for every client that is caching this resource.

That means on a filesystem with 100k clients, every pointer field in struct ldlm_lock consumes 800kB for each inode cached by all the clients, and the ldlm_lock structures consume 50MB for all of the clients to cache a single inode. That equates to only 20 locks per GB of RAM, which is pretty sad.

Taking a quick look at struct ldlm_lock, there is a ton of memory wastage that could be avoided quite quickly simply by aligning the fields better for 64-bit CPUs. There are a number of other fields, like l_bl_ast that can be made smaller (it is a boolean flag that could at least be shrunk to a single byte, and stuck with the other "byte flags"), and l_readers/l_writers are only > 1 on a client, and it is limited to the number of threads concurrently accessing the lock so 16 bits is already overkill.

There are also fields like l_blocking_ast, l_completion_ast, l_glimpse_ast, and l_weigh_ast that are almost always identical on a client or server, and are determined at compile time, so it would be trivial to replace them with a pointer to a pre-registered or even static struct ldlm_callback_suite, saving 2.4MB per widely-cached inode alone.

There are also fields that are only ever used on the client or the server, and grouping those into a union would not only save memory, I think it would clarify the code somewhat to better understand how the fields in a lock are used."
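
To make the alignment point concrete, here is a minimal, self-contained sketch of how reordering members and packing the one-byte flags together removes padding on a 64-bit ABI. The member names are invented for illustration and are not the real struct ldlm_lock fields:

    #include <stdint.h>
    #include <stdio.h>

    /* Poorly ordered: each 1-byte flag that sits between 8-byte members
     * forces 7 bytes of padding before the next pointer. */
    struct lock_unpacked {
            void    *lu_resource;   /* 8 bytes */
            uint8_t  lu_bl_ast;     /* 1 byte + 7 bytes padding */
            void    *lu_export;     /* 8 bytes */
            uint8_t  lu_granted;    /* 1 byte + 7 bytes padding */
            void    *lu_conn;       /* 8 bytes */
    };

    /* Same members with the pointers grouped first and the byte flags
     * packed together at the end: only the final tail padding remains. */
    struct lock_packed {
            void    *lp_resource;
            void    *lp_export;
            void    *lp_conn;
            uint8_t  lp_bl_ast;
            uint8_t  lp_granted;
    };

    int main(void)
    {
            printf("unpacked: %zu bytes, packed: %zu bytes\n",
                   sizeof(struct lock_unpacked), sizeof(struct lock_packed));
            return 0;   /* typically prints 40 vs. 32 on x86_64 */
    }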
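
The callback-suite idea can be sketched as follows; the types are simplified stand-ins (the real ASTs take ldlm-specific arguments), so treat this as an illustration of the refactoring rather than the actual Lustre definitions:

    /* Stand-in callback type. */
    typedef int (*ast_fn)(void *lock, void *data);

    /* Before: four 8-byte function pointers in every lock. */
    struct lock_before {
            ast_fn  l_blocking_ast;
            ast_fn  l_completion_ast;
            ast_fn  l_glimpse_ast;
            ast_fn  l_weigh_ast;
    };

    /* After: one pointer to a suite shared by every lock on the node,
     * since the callbacks are fixed at compile time for client and server. */
    struct ldlm_callback_suite {
            ast_fn  lcs_blocking;
            ast_fn  lcs_completion;
            ast_fn  lcs_glimpse;
            ast_fn  lcs_weigh;
    };

    struct lock_after {
            const struct ldlm_callback_suite *l_cbs;  /* 8 bytes per lock */
    };

    /* e.g. one static suite registered once for the client side: */
    static int client_blocking_ast(void *lock, void *data)
    {
            (void)lock; (void)data;
            return 0;
    }

    static const struct ldlm_callback_suite client_cbs = {
            .lcs_blocking = client_blocking_ast,
    };

Going from four per-lock pointers to one shared pointer saves 24 bytes per lock; with 100k clients caching the same inode, that is the 2.4MB figure quoted above.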
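
Likewise, a rough sketch of grouping the client-only and server-only members into a union; the member names here are illustrative only, and the 16-bit reader/writer counts follow the remark above that 16 bits is already overkill:

    #include <stdint.h>

    struct lock_split {
            /* members used on both client and server */
            void            *l_resource;
            uint64_t         l_flags;

            /* members only ever used on one side can share storage */
            union {
                    struct {                        /* client-only state */
                            void     *cl_conn;
                            uint16_t  cl_readers;
                            uint16_t  cl_writers;
                    } cl;
                    struct {                        /* server-only state */
                            void     *srv_export;
                            uint64_t  srv_remote_handle;
                    } srv;
            } u;
    };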



 Comments   
Comment by Liang Zhen (Inactive) [ 01/Apr/11 ]

a few more things:

  • I think we can disable LUSTRE_TRACKS_LOCK_EXP_REFS by default, right?
  • we can easily save 24 bytes in the portals handle even with RCU, by reusing h_cookie, removing h_ptr, and adding an ops table to replace the two callback pointers (see the handle sketch after this list)
  • we probably don't want to poison the full chunk of memory on OBD_FREE (i.e. limit the size of the POISON to 64 bytes, which will reduce a lot of data traffic); a bounded-poisoning sketch also follows this list
  • 64-bit align the members of struct dynlock, which will save some memory for every server-side inode
  • we can disable dynlock in the MDD layer, where it is unused, and align the members of mdd_object so its size drops below 128 bytes
  • aligning fields and disabling unused members of lu_object_header/lu_object/mdd_object/mdt_object will also help the metadata stack somewhat
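
Regarding the portals-handle item, here is a rough sketch of the direction; the struct and field names are assumptions, not the actual Lustre code, and the 24-byte figure is the estimate from the comment above:

    #include <stddef.h>
    #include <stdint.h>

    /* One shared, statically defined ops table replaces the per-handle
     * callback pointers; h_ptr goes away because the owning object can be
     * recovered from h_cookie (or container_of on the embedded handle). */
    struct handle_ops {
            void (*hop_addref)(void *object);
            void (*hop_free)(void *object, size_t size);
    };

    struct portals_handle_slim {
            uint64_t                 h_cookie;  /* also serves as the lookup key */
            const struct handle_ops *h_ops;     /* shared table, not two callbacks */
    };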
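
And a minimal sketch of the bounded-poisoning idea; the cap, the helper name, and the 0x5a pattern are illustrative assumptions rather than the real OBD_FREE/POISON macros:

    #include <stdlib.h>
    #include <string.h>

    /* Poison only the first bytes of a freed chunk instead of the whole
     * allocation, so large buffers do not trigger a full memset on free. */
    #define FREE_POISON_MAX 64

    static void poison_and_free(void *ptr, size_t size)
    {
            size_t n = size < FREE_POISON_MAX ? size : FREE_POISON_MAX;

            memset(ptr, 0x5a, n);   /* leading bytes only */
            free(ptr);
    }
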
Comment by Liang Zhen (Inactive) [ 09/Dec/11 ]

I've posted the first patch here: http://review.whamcloud.com/#change,1827
It is just the first step of this work.

Comment by Andreas Dilger [ 29/May/17 ]

Close old bug.
