Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Labels: None
- Affects Version/s: Lustre 2.3.0, Lustre 2.1.2
- Environment: https://github.com/chaos/lustre/commits/2.1.1-10chaos, 3000+ OSTs across 4 filesystems
- 3
- 4033
Description
Our users found that their application was scaling very poorly on our "zin" cluster. It is a Sandy Bridge cluster, 16 cores per node, roughly 3000 nodes. At relatively low node counts (512 nodes), they found that their performance on zin, now that it is on the secure network, is 1/4 of what it was when zin was on the open network.
One of the few differences is that zin now talks to 3000+ OSTs on the secure network, whereas it only talked to a few hundred OSTs while it was being shaken down on the open network. One of our engineers noted that the ldlm_poold was frequently using 0.3% of CPU time on zin.
The application in question is HIGHLY sensitive to system daemons and other CPU noise on the compute nodes because it is highly MPI-coordinated. I created the attached patch (ldlm_poold_period.patch) that allows me to change the sleep interval used by the ldlm_poold. Sure enough, if I change the sleep time to 300 seconds, the application's performance immediately improves by 4X.
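The shape of the change is roughly the following. This is only a minimal sketch of the idea, not the attached patch itself; the parameter name, the recalc_all_pools() stub, and the loop structure are illustrative stand-ins for the real thread in lustre/ldlm/ldlm_pools.c:

    #include <linux/module.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    /* illustrative tunable; the real change is ldlm_poold_period.patch */
    static unsigned int ldlm_poold_period = 1;      /* seconds */
    module_param(ldlm_poold_period, uint, 0644);
    MODULE_PARM_DESC(ldlm_poold_period, "ldlm_poold sleep interval (seconds)");

    /* stand-in for the real per-namespace recalculation walk */
    static void recalc_all_pools(void)
    {
        /* walk every namespace and call its pool recalc function */
    }

    /* started with kthread_run() at module init */
    static int ldlm_poold_main(void *arg)
    {
        while (!kthread_should_stop()) {
            recalc_all_pools();
            /* was a fixed 1-second sleep; now tunable, e.g. 300s */
            schedule_timeout_interruptible(ldlm_poold_period * HZ);
        }
        return 0;
    }

Setting a knob like this to 300 seconds is what produced the 4X improvement described above.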
The ldlm_poold walking a list of 3000+ namespaces every second and doing nothing most of the time (because client namespaces are only actually "recalculated" every 10s) is a very bad design. The patch was just to determine if that was really the cause.
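Schematically, each pool's recalculation already guards on its own period, so with a 10-second client period roughly nine out of every ten visits from the 1-second walk do nothing. The snippet below is a simplified illustration; the struct, field, and function names are stand-ins rather than the exact ones in ldlm_pools.c:

    #include <linux/ktime.h>

    struct pool {
        time64_t recalc_time;    /* last recalculation */
        time64_t recalc_period;  /* 10s for client pools */
    };

    /* called for every namespace on every 1-second wakeup of ldlm_poold */
    static int pool_recalc(struct pool *pl)
    {
        time64_t now = ktime_get_seconds();

        /* client pools recalculate every 10s, so most visits
         * return here having done no useful work */
        if (now - pl->recalc_time < pl->recalc_period)
            return 0;

        pl->recalc_time = now;
        /* ... actual lock volume / limit recalculation ... */
        return 0;
    }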
I will now work on a real fix.
I think instead of making the ldlm_poold's sleep time configurable, I will make both LDLM_POOL_SRV_DEF_RECALC_PERIOD and LDLM_POOL_CLI_DEF_RECALC_PERIOD tunable. Then I will make the ldlm_poold dynamically sleep based on the next recalculation time in the list of namespaces...although I probably don't want each namespace to have its own starting time.
I probably want to keep the recalculation periods in sync across namespaces of the same type (server vs. client), and then perhaps order the list as well to avoid walking the entire list unnecessarily.
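Something like the following is the wakeup computation I have in mind. It is only a sketch under the assumption that all server pools share one schedule and all client pools another; the variable names are hypothetical, and the 1s server default is assumed (the 10s client period matches the behavior described above):

    #include <linux/kernel.h>
    #include <linux/ktime.h>

    /* proposed tunables, loosely corresponding to
     * LDLM_POOL_SRV_DEF_RECALC_PERIOD / LDLM_POOL_CLI_DEF_RECALC_PERIOD */
    static unsigned int srv_recalc_period = 1;   /* seconds, assumed default */
    static unsigned int cli_recalc_period = 10;  /* seconds */

    /* one shared schedule per namespace type, so individual
     * per-namespace start times are not needed */
    static time64_t srv_last_recalc;
    static time64_t cli_last_recalc;

    /* seconds until the earliest pending recalculation of either type;
     * ldlm_poold would sleep this long instead of a fixed 1s tick */
    static time64_t poold_seconds_until_next_recalc(void)
    {
        time64_t now = ktime_get_seconds();
        time64_t srv_next = srv_last_recalc + srv_recalc_period;
        time64_t cli_next = cli_last_recalc + cli_recalc_period;
        time64_t next = min(srv_next, cli_next);

        return next > now ? next - now : 0;
    }

With 3000+ namespaces kept on two shared schedules, the daemon never has to walk the whole list just to discover that nothing is due yet; keeping the list sorted by next recalculation time would be the generalization if per-namespace periods ever diverge.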
No work needed by Whamcloud right now, except perhaps to comment on my approach if you think there is something that I should be doing differently (or if there is already work in this area that I haven't found).
Attachments
Issue Links
- is related to: LU-2924 shrink ldlm_poold workload (Resolved)