[LU-2924] shrink ldlm_poold workload Created: 07/Mar/13  Updated: 25/Jul/14  Resolved: 26/Jun/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Improvement Priority: Major
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Attachments: File nopatch-idle-data     File patch-idle-5624-data     File patch-idle-5793-data    
Issue Links:
Related
is related to LU-1376 ldlm_poold noise on clients significa... Open
is related to LU-1128 Complete investigation of the LDLM po... Resolved
is related to LU-5415 High ldlm_poold load on client Resolved
Sub-Tasks:
LU-3429 Create test plan for shrink ldlm_pool... (Technical task, Resolved, Oleg Drokin)
Rank (Obsolete): 7025

 Description   

ldlm_poold currently wakes up every second and walks the list of all namespaces in the system, doing some bookkeeping.
On systems with many servers this client list gets quite large, and it makes no sense to visit every namespace in the list when some of them don't actually hold any locks.

As such, I think it makes sense to have ldlm_poold iterate only over non-empty client namespaces.
Estimates from Fujitsu indicate that on a system with 2000 OSTs, just iterating the list (i.e. over empty namespaces) takes 2ms, which is probably excessive.
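The idea above can be sketched in a few lines of C. This is a minimal illustration, not the actual Lustre code: the struct and function names are invented, and the real implementation deals with locking and per-namespace pools. The point is only that the pool thread walks a list containing non-empty namespaces, so its per-wakeup cost scales with the number of active namespaces rather than the total OST count.

```c
#include <stddef.h>

/* Illustrative namespace: only namespaces with locks are linked
 * onto the active list that the pool thread walks. */
struct ns {
    struct ns *next;        /* link in the active list */
    int nr_locks;           /* locks held in this namespace */
    int recalc_count;       /* times the pool thread visited us */
};

/* One poold pass: visit only namespaces on the active list and
 * return how many were touched.  An idle client with thousands of
 * configured OSTs would pass a short (or empty) list here. */
static int poold_recalc(struct ns *active_head)
{
    struct ns *n;
    int visited = 0;

    for (n = active_head; n != NULL; n = n->next) {
        n->recalc_count++;  /* stand-in for real pool bookkeeping */
        visited++;
    }
    return visited;
}
```

With 2000 configured namespaces but only a handful holding locks, each wakeup touches just those few instead of all 2000.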



 Comments   
Comment by Oleg Drokin [ 07/Mar/13 ]

I have an idea for a patch at http://review.whamcloud.com/5624

I added debug info to it and did some testing, and namespaces seem to be entering and exiting the active list as expected.

I suspect this should ease most of the pain on big systems, since they rarely (if ever) have outstanding locks on all present OSTs, so this should be a good first step in taming ldlm_poold.
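The "entering and exiting the active list" behavior mentioned above can be sketched as follows. This is a hedged illustration with invented names (the actual patch at review 5624 defines the real transitions): a namespace joins the active set when its first lock appears and leaves it when its last lock goes away.

```c
/* Illustrative namespace state; in real code the transitions
 * would also splice the namespace on/off a linked list under a
 * lock, which is omitted here for brevity. */
struct ns_state {
    int nr_locks;
    int on_active_list;
};

/* First lock granted: the namespace becomes visible to poold. */
static void ns_add_lock(struct ns_state *n)
{
    if (n->nr_locks++ == 0)
        n->on_active_list = 1;
}

/* Last lock released: poold stops visiting this namespace. */
static void ns_del_lock(struct ns_state *n)
{
    if (--n->nr_locks == 0)
        n->on_active_list = 0;
}
```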

Comment by Andreas Dilger [ 07/Mar/13 ]

I think there is already a patch to fix this in Gerrit. Search for message:poold or similar.

Comment by Andreas Dilger [ 07/Mar/13 ]

LU-1520 and patch http://review.whamcloud.com/3859 and LU-607/http://review.whamcloud.com/1334 are what I was thinking about.

Comment by Oleg Drokin [ 07/Mar/13 ]

That's a somewhat orthogonal, though related, issue.

Those patches are only about shrinking.
I am dealing with the ldlm_poold accounting thread: it is woken up once a second and iterates over the entire list of namespaces (on every node in the system), which adds avoidable OS jitter.

Comment by Oleg Drokin [ 21/Mar/13 ]

Another improvement idea: only wake up as frequently as makes sense, which means about once per 10 seconds on clients instead of once every second. This should further reduce the time ldlm_poold wastes, at the expense of some inaccuracy in grant rate calculations, but I don't think that's such an important metric on the client side.

patch in http://review.whamcloud.com/5793
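The arithmetic behind the comment above is simple; this sketch (constants and names are illustrative, taken only from the 1s/10s figures in the comment) shows the reduction in wakeups a longer client period buys.

```c
/* Illustrative recalc periods from the comment: clients move to a
 * ~10 second period, servers keep the 1 second period. */
enum ldlm_side { LDLM_CLIENT, LDLM_SERVER };

static int recalc_period(enum ldlm_side s)
{
    return s == LDLM_CLIENT ? 10 : 1;
}

/* A 10x longer period means 10x fewer wakeups: 360 vs 3600/hour. */
static int wakeups_per_hour(enum ldlm_side s)
{
    return 3600 / recalc_period(s);
}
```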

Comment by Oleg Drokin [ 21/Mar/13 ]

My final idea for this (for later) is to further spread apart wakeups on clients by looking at the SLV and CLV and calculating when they meet. We can update the CLV locally whenever a lock moves to the LRU, more or less accurately estimate the time left during recalc (with some margin), and then decide to sleep that long. If the server SLV changes drastically, we can wake poold ahead of its scheduled time if we want to (which sounds like a good idea anyway).
In the extreme case this could lead to no ldlm_poold at all, and we could just do on-the-fly adjustments from RPC callbacks, such as after-reply.
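The adaptive-sleep idea above might look like the sketch below. This is purely illustrative arithmetic, not the real SLV/CLV recalculation formulas: it assumes a linear closing rate between the two volumes and subtracts a safety margin, waking early (after 1 second) whenever the estimate is unusable.

```c
/* Estimate how long poold can sleep before the client lock volume
 * (clv) catches up to the server lock volume (slv), assuming clv
 * closes the gap at 'rate' units per second.  All inputs are
 * hypothetical stand-ins for the real pool state. */
static long sleep_until_meet(long slv, long clv, long rate, long margin)
{
    long gap = slv - clv;
    long t;

    /* Already met, or no usable rate estimate: recalc soon. */
    if (gap <= 0 || rate <= 0)
        return 1;

    t = gap / rate - margin;    /* wake a bit before they meet */
    return t > 1 ? t : 1;       /* never sleep less than 1 second */
}
```

A drastic SLV change pushed from the server would simply wake the thread early, ignoring this estimate.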

Comment by James A Simmons [ 10/Apr/13 ]

I tested both patches for this ticket on a system where I created 1800 virtual OSTs on four OSSes. So far I have managed to collect the data from when the system was idle, 6 hours after it was mounted. Three clients were used in the test: one client with no patch, a second client with only patch 5624, and a third client with both patches.

Comment by Niu Yawei (Inactive) [ 12/Apr/13 ]

This reminds me of LU-1128: it looks like the pool shrinker on the server side can't work as expected (under memory pressure, the SLV isn't decreased and lock cancellation from the client is never triggered).

This ticket, though, is mainly about addressing the client-side issue.

Comment by Jodi Levi (Inactive) [ 26/Jun/13 ]

Patches landed and test plan written.

Comment by Ryan Haasken [ 20/Sep/13 ]

Oleg, in http://review.whamcloud.com/5624, why did you change the type of ldlm_srv_namespace_nr and ldlm_cli_namespace_nr from cfs_atomic_t to int?

Now when we call ldlm_namespace_nr_*, are we expected to be holding the ldlm_namespace_lock()?
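The question above concerns a common pattern, sketched here with userspace primitives purely for illustration (the real code uses kernel locking, and whether patch 5624 actually requires ldlm_namespace_lock() is exactly what is being asked): once every update to a counter already happens under one lock, a plain int suffices and the atomic type is redundant.

```c
#include <pthread.h>

/* Illustrative stand-ins for the namespace-count lock and counter;
 * not the real ldlm_namespace_lock()/ldlm_cli_namespace_nr. */
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;
static int cli_namespace_nr;    /* plain int: protected by ns_lock */

/* Callers must not touch cli_namespace_nr outside the lock;
 * that convention is what makes dropping the atomic safe. */
static int ns_count_inc(void)
{
    int nr;

    pthread_mutex_lock(&ns_lock);
    nr = ++cli_namespace_nr;
    pthread_mutex_unlock(&ns_lock);
    return nr;
}
```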
