Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • None
    • 7025

    Description

      ldlm_poold currently wakes up every second and walks the list of all namespaces in the system doing some bookkeeping.
      On systems with a lot of servers this client list gets quite large, and it makes no sense to visit every namespace in the list if some of them don't actually hold any locks.

      As such I think it makes sense to have ldlm_poold iterate only over non-empty client namespaces.
      Estimates from Fujitsu indicate that on a system with 2000 OSTs just the list iteration (i.e. over empty namespaces) takes 2ms, which is probably excessive.
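
      As a rough illustration of where the time goes, here is a minimal userspace model of a poold-style thread (an editorial sketch with simplified names and structures, not the actual Lustre kernel code): a single thread wakes every second and walks every namespace, whether or not it holds any locks.

      #include <pthread.h>
      #include <unistd.h>

      #define NR_NAMESPACES 2000            /* e.g. one per OST on a large system */

      struct namespace {
              int nr_locks;                 /* 0 for most namespaces most of the time */
      };

      static struct namespace namespaces[NR_NAMESPACES];
      static pthread_mutex_t ns_list_lock = PTHREAD_MUTEX_INITIALIZER;

      static void pool_recalc(struct namespace *ns)
      {
              (void)ns;     /* the real code recalculates SLV/CLV, grant rates, etc. */
      }

      static void *poold_main(void *arg)
      {
              (void)arg;
              for (;;) {
                      pthread_mutex_lock(&ns_list_lock);
                      /* Visiting all entries costs time (roughly 2ms for 2000 OSTs
                       * per the estimate above) even when almost every namespace is
                       * empty and nothing needs recalculating. */
                      for (int i = 0; i < NR_NAMESPACES; i++)
                              pool_recalc(&namespaces[i]);
                      pthread_mutex_unlock(&ns_list_lock);
                      sleep(1);
              }
              return NULL;
      }

      int main(void)
      {
              pthread_t tid;

              pthread_create(&tid, NULL, poold_main, NULL);
              pthread_join(tid, NULL);      /* the toy model runs until killed */
              return 0;
      }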

      Attachments

        1. nopatch-idle-data
          0.4 kB
          James A Simmons
        2. patch-idle-5624-data
          0.3 kB
          James A Simmons
        3. patch-idle-5793-data
          0.3 kB
          James A Simmons

        Issue Links

          Activity

            [LU-2924] shrink ldlm_poold workload
            haasken Ryan Haasken added a comment -

            Oleg, in http://review.whamcloud.com/5624, why did you change the type of ldlm_srv_namespace_nr and ldlm_cli_namespace_nr from cfs_atomic_t to int?

            Now when we call ldlm_namespace_nr_*, are we expected to be holding the ldlm_namespace_lock()?
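
            (For illustration only, a sketch of the locking pattern the question is about, using simplified userspace names rather than the real Lustre definitions: if the counter is only ever modified while the namespace-list lock is held, it can be a plain int instead of an atomic, but readers then have to take the same lock to get a stable value.)

            #include <pthread.h>

            /* Hypothetical stand-ins for the Lustre lock and counter. */
            static pthread_mutex_t namespace_list_lock = PTHREAD_MUTEX_INITIALIZER;
            static int cli_namespace_nr;        /* plain int, guarded by the lock */

            void namespace_list_add(void)
            {
                    pthread_mutex_lock(&namespace_list_lock);
                    /* the namespace would also be linked onto the list here */
                    cli_namespace_nr++;
                    pthread_mutex_unlock(&namespace_list_lock);
            }

            int cli_namespace_count(void)
            {
                    int nr;

                    pthread_mutex_lock(&namespace_list_lock);
                    nr = cli_namespace_nr;
                    pthread_mutex_unlock(&namespace_list_lock);
                    return nr;
            }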


            jlevi Jodi Levi (Inactive) added a comment -

            Patches landed and test plan written.

            niu Niu Yawei (Inactive) added a comment -

            This reminds me of LU-1128; it looks like the pool shrinker on the server side can't work as expected (under memory pressure, the SLV isn't decreased and lock cancellation from the client is never triggered).

            Though this ticket is mainly for addressing the client-side issue.

            simmonsja James A Simmons added a comment -

            I tested both patches for this ticket on a system where I created 1800 virtual OSTs on four OSSes. So far I have managed to collect the data for when it was idle, 6 hours after it was mounted. 3 clients were used in the test: one client with no patch, a second client with only patch 5624, and a third client with both patches.
            green Oleg Drokin added a comment -

            My final idea for this (for later) is to further spread apart wakeups on clients by looking at the SLV and CLV and calculating when they will meet (we can update the CLV locally on every lock moving to the LRU, so we can more or less accurately guess the time left during recalc, with some margin, and then decide to sleep that long). If the server SLV drastically changes, we can wake poold ahead of the scheduled time if we want to (which sounds like a good idea anyway).
            In the extreme case this could lead to no ldlm_poold at all, and we could just do on-the-fly adjustments from RPC callbacks, like after-reply or similar.
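
            (A rough userspace sketch of the "sleep until SLV and CLV meet" idea above; the margin, clamping and names are editorial assumptions rather than the actual Lustre formulas.)

            #include <stdint.h>

            #define RECALC_MARGIN   2     /* wake a couple of seconds early */
            #define RECALC_MIN      1
            #define RECALC_MAX      60

            /* Guess how many seconds until the client lock volume meets the
             * server lock volume, given the most recently observed rate of
             * change, and clamp the result to a sane range. */
            long poold_next_wakeup(uint64_t slv, uint64_t clv,
                                   uint64_t change_per_sec)
            {
                    long secs;

                    if (clv >= slv || change_per_sec == 0)
                            return RECALC_MIN;    /* already met, or no estimate */

                    secs = (long)((slv - clv) / change_per_sec) - RECALC_MARGIN;
                    if (secs < RECALC_MIN)
                            secs = RECALC_MIN;
                    if (secs > RECALC_MAX)
                            secs = RECALC_MAX;
                    return secs;
            }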

            green Oleg Drokin added a comment -

            Another improvement idea: only wake up as frequently as makes sense, which means about once per 10 seconds on clients instead of once every second. This should further reduce the time ldlm_poold wastes, at the expense of some inaccuracy in grant rate calculations, but I don't really think that's such an important metric on the client side.

            patch in http://review.whamcloud.com/5793
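
            (A minimal sketch of a per-side recalc period along the lines described above; the interval values follow the comment, but the symbols are simplified and may not match the patch exactly.)

            enum ldlm_side {
                    LDLM_NAMESPACE_SERVER,
                    LDLM_NAMESPACE_CLIENT,
            };

            #define SRV_RECALC_PERIOD   1   /* seconds: grant stats matter on servers */
            #define CLI_RECALC_PERIOD  10   /* seconds: coarser is fine on clients    */

            static inline int pool_recalc_period(enum ldlm_side side)
            {
                    return side == LDLM_NAMESPACE_SERVER ?
                           SRV_RECALC_PERIOD : CLI_RECALC_PERIOD;
            }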

            green Oleg Drokin added a comment -

            That's a somewhat orthogonal, though related, issue.

            But those patches are just about shrinking.
            I am dealing with the ldlm_poold accounting thread: it is woken up once a second and iterates over the entire list of namespaces (on every node in the system), which adds (avoidable) OS jitter.

            adilger Andreas Dilger added a comment -

            LU-1520 and patch http://review.whamcloud.com/3859 and LU-607 / http://review.whamcloud.com/1334 are what I was thinking about.

            adilger Andreas Dilger added a comment -

            I think there is already a patch to fix this in Gerrit. Search for message:poold or similar.
            green Oleg Drokin added a comment -

            I have an idea for a patch at http://review.whamcloud.com/5624

            I tried adding debug info to it and doing some testing, and the namespaces seem to be entering and exiting the active list as expected.

            I suspect this should ease most of the pain on big systems, since they rarely (if ever) have outstanding locks to all OSTs present, so this should be a good first step in taming ldlm_poold.
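
            (A userspace sketch of the "only walk non-empty namespaces" idea, with simplified types and names rather than the actual change in the patch: a namespace goes onto the active list when it gets its first lock and comes off again when its last lock goes away, so the poold walk only ever sees namespaces that have work to do.)

            #include <pthread.h>
            #include <sys/queue.h>

            struct namespace {
                    int nr_locks;
                    LIST_ENTRY(namespace) ns_active;   /* linkage on the active list */
            };

            static LIST_HEAD(, namespace) active_namespaces =
                    LIST_HEAD_INITIALIZER(active_namespaces);
            static pthread_mutex_t ns_list_lock = PTHREAD_MUTEX_INITIALIZER;

            void namespace_lock_added(struct namespace *ns)
            {
                    pthread_mutex_lock(&ns_list_lock);
                    if (ns->nr_locks++ == 0)
                            LIST_INSERT_HEAD(&active_namespaces, ns, ns_active);
                    pthread_mutex_unlock(&ns_list_lock);
            }

            void namespace_lock_removed(struct namespace *ns)
            {
                    pthread_mutex_lock(&ns_list_lock);
                    if (--ns->nr_locks == 0)
                            LIST_REMOVE(ns, ns_active);
                    pthread_mutex_unlock(&ns_list_lock);
            }

            /* ldlm_poold then iterates over active_namespaces only, instead of
             * over every namespace in the system. */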


            People

              green Oleg Drokin
              green Oleg Drokin
              Votes: 0
              Watchers: 8
