
Leaks on ldlm granted locks counter on MDS leading to canceling loop

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0
    • Severity: 3

    Description

      A performance problem at one of our customers led us to find that the granted ldlm locks counter (found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/granted) is missing some decrements under conditions yet to be determined.

      After some time, this causes the counter to largely exceed the value found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/limit.

      See here:

      [root@prolixmds1 pool]# pwd
      /proc/fs/lustre/ldlm/namespaces/mdt-scratch-MDT0000_UUID/pool
      [root@prolixmds1 pool]# cat limit
      3203616
      [root@prolixmds1 pool]# cat granted
      54882822
      

      However, summing up the granted locks as seen by all the clients, we get only 16k locks, which is also consistent with the slab consumption on the MDS.
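
      For reference, one way such a client-side sum could be gathered is sketched below. This is not from the original report; the node list, the use of pdsh and the "*-MDT0000-mdc-*" namespace pattern are assumptions to adapt to the actual site:

      # Hypothetical sketch: sum the MDT0000 locks held by each client.
      # Assumes pdsh access to the client nodes and client-side mdc namespaces
      # named like "<fsname>-MDT0000-mdc-*"; adjust both for your setup.
      pdsh -w client[001-512] \
          "lctl get_param -n ldlm.namespaces.*-MDT0000-mdc-*.lock_count" \
          2>/dev/null | awk '{sum += $2} END {print "client-side total:", sum}'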

      Once the counter is above the limit, the MDS constantly tries to cancel locks, even those that have not exceeded max_age. Clients then reacquire the locks, but lose time in the process (hence the performance problem).

      Note that since only the counter is wrong, there is no actual resource over-consumption tied to this problem.
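
      One way to observe this cancel pressure from the MDS side (a sketch, assuming the grant_rate and cancel_rate files are present in the same pool directory as granted and limit, which may depend on the Lustre version) is:

      # Sample the pool accounting every 10 seconds: "granted" stuck far above
      # "limit" together with a sustained cancel_rate points at the canceling
      # loop described here rather than genuine lock pressure.
      cd /proc/fs/lustre/ldlm/namespaces/mdt-scratch-MDT0000_UUID/pool
      while sleep 10; do
          echo "$(date +%T) granted=$(cat granted) limit=$(cat limit)" \
               "grant_rate=$(cat grant_rate) cancel_rate=$(cat cancel_rate)"
      done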

      We found that this problem is also seen on 2.8.
      Can you help find where the leak comes from?

      I also wonder if there is any relation with the last comment from Shuichi Ihara in LU-5727.
      I also think Christopher Morrone pointed this out here.


          Activity

            [LU-8246] Leaks on ldlm granted locks counter on MDS leading to canceling loop
            pjones Peter Jones added a comment -

            Landed for 2.9


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20839/
            Subject: LU-8246 ldlm: Do not grant a lock twice in a race
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7d4106cd9fd9ef48822d19c35a711570be25d5ee


            spiechurski Sebastien Piechurski added a comment -

            Great! Thanks Oleg for the analysis.
            green Oleg Drokin added a comment - - edited

            btw to gauge the number of locks the MDT has actually granted in total, instead of counting on all the clients, just check /proc/fs/lustre/ldlm/namespaces/mdt-lustre-MDT0000_UUID/lock_count
            This number includes the locks given to clients plus locks not given to anybody but held by the server internally (the number fluctuates of course, but if it's too different from the granted number you know there's a problem). In fact I wonder why we need two trackers for essentially the same value - perhaps we can just drop the one from pools? Though I guess in pools we track the number of granted locks, and in the namespace it's the total number of locks - granted and ungranted.
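
            For illustration only, comparing the two counters directly on the MDS could look like the sketch below (paths follow the ones quoted earlier in this ticket; adjust the fsname):

            # Namespace-wide lock count vs. the pool's granted counter; a large,
            # persistent gap is the symptom discussed in this ticket.
            NS=/proc/fs/lustre/ldlm/namespaces/mdt-scratch-MDT0000_UUID
            echo "lock_count:   $(cat $NS/lock_count)"
            echo "pool granted: $(cat $NS/pool/granted)"
            echo "pool limit:   $(cat $NS/pool/limit)"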


            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/20839
            Subject: LU-8246 ldlm: Do not grant a lock twice in a race
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7c8e41dfabb4e2bead4080a993903b683152bb5c

            green Oleg Drokin added a comment - - edited

            Ok, so the race plays like this:

            We do the first enqueue (in ldlm_process_inodebits_lock()), meet a conflicting lock, add ourselves to the waiting list and call ldlm_run_ast_work().
            At the same time a cancel arrives for the lock we are conflicting on, sees that our lock could be granted and grants it (the original trace).
            Now we get back here, receive ERESTART as the return of ldlm_run_ast_work(), jump to the again label, see there are no conflicts and grant the lock again.

            Every grant call increases the pl_granted counter, but since the lock is the same there is only one decrease.

            Hm, in fact there's even a check for that, but only for extent locks, fixed as part of bug 11300.

            I'll try to do a similar patch for plain and ibits locks. The flock code is more convoluted and would not play into this particular problem.

            green Oleg Drokin added a comment -

            so... a race between a lock request and lock cancel that leads to two granting threads, I guess?

            green Oleg Drokin added a comment -

            Hm, you are actually right.

            I checked some of my nodes and see the numbers are out of whack too, so I added a simple refcounter into every lock to see how many times we call ldlm_pool_add/del, and it seems to be a common occurrence that we call ldlm_pool_add (from ldlm_grant_lock) twice, but free it only once.

            The second addition comes via:

            [  353.612043] Increaing pl_granted but lock ffff880046a3cc40 refc is already 1
            [  353.612660] CPU: 7 PID: 3654 Comm: mdt01_002 Tainted: G           OE  ------------   3.10.0-debug #1
            [  353.613777] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
            [  353.614385]  ffff880046a3cc40 0000000095009a39 ffff880080ef7840 ffffffff816fe7c0
            [  353.616879]  ffff880080ef7890 ffffffffa0556e5d ffff880046a3ceb8 ffff880080ef78a8
            [  353.618213]  ffff880080ef7890 0000000095009a39 ffff880046a3cc40 ffff880042a36e80
            [  353.619201] Call Trace:
            [  353.619815]  [<ffffffff816fe7c0>] dump_stack+0x19/0x1b
            [  353.620385]  [<ffffffffa0556e5d>] ldlm_pool_add+0x16d/0x1f0 [ptlrpc]
            [  353.620931]  [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc]
            [  353.621564]  [<ffffffffa0554382>] ldlm_process_inodebits_lock+0x362/0x420 [ptlrpc]
            [  353.622676]  [<ffffffffa052323e>] ldlm_lock_enqueue+0x3fe/0x940 [ptlrpc]
            [  353.623327]  [<ffffffffa053f126>] ldlm_cli_enqueue_local+0x1c6/0x850 [ptlrpc]
            [  353.623984]  [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
            [  353.625077]  [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt]
            [  353.625710]  [<ffffffffa0ba947a>] mdt_object_local_lock+0x52a/0xb00 [mdt]
            [  353.626344]  [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt]
            [  353.627001]  [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
            [  353.628095]  [<ffffffffa0ba9aae>] mdt_object_lock_internal+0x5e/0x2f0 [mdt]
            [  353.628728]  [<ffffffffa0ba9ec0>] mdt_reint_object_lock+0x20/0x60 [mdt]
            [  353.629362]  [<ffffffffa0bc167b>] mdt_reint_setattr+0x67b/0x1130 [mdt]
            [  353.630001]  [<ffffffffa0bc21b0>] mdt_reint_rec+0x80/0x210 [mdt]
            [  353.630612]  [<ffffffffa0ba4f8c>] mdt_reint_internal+0x5dc/0x940 [mdt]
            [  353.631246]  [<ffffffffa0baee87>] mdt_reint+0x67/0x140 [mdt]
            [  353.631888]  [<ffffffffa05ce835>] tgt_request_handle+0x925/0x1330 [ptlrpc]
            [  353.632567]  [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
            [  353.633678]  [<ffffffffa057aae8>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
            [  353.634353]  [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc]
            [  353.635023]  [<ffffffff81707f27>] ? _raw_spin_unlock_irq+0x27/0x50
            [  353.635657]  [<ffffffffa057f610>] ? ptlrpc_register_service+0xe70/0xe70 [ptlrpc]
            [  353.636433]  [<ffffffff810a404a>] kthread+0xea/0xf0
            [  353.636860]  [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
            [  353.637301]  [<ffffffff81711758>] ret_from_fork+0x58/0x90
            [  353.637724]  [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
            [  353.638163] Original trace:
            [  353.638573]     [<ffffffffa0556da0>] ldlm_pool_add+0xb0/0x1f0 [ptlrpc]
            [  353.639218]     [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc]
            [  353.639950]     [<ffffffffa0554323>] ldlm_process_inodebits_lock+0x303/0x420 [ptlrpc]
            [  353.641051]     [<ffffffffa05238a3>] ldlm_reprocess_queue+0x123/0x230 [ptlrpc]
            [  353.642311]     [<ffffffffa0523e50>] ldlm_reprocess_all+0x120/0x310 [ptlrpc]
            [  353.643068]     [<ffffffffa05466a5>] ldlm_request_cancel+0x475/0x710 [ptlrpc]
            [  353.643810]     [<ffffffffa054c78a>] ldlm_handle_cancel+0xba/0x250 [ptlrpc]
            [  353.644555]     [<ffffffffa054ca61>] ldlm_cancel_handler+0x141/0x490 [ptlrpc]
            [  353.645330]     [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
            [  353.651045]     [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc]
            [  353.651478]     [<ffffffff810a404a>] kthread+0xea/0xf0
            [  353.651899]     [<ffffffff81711758>] ret_from_fork+0x58/0x90
            [  353.652327]     [<ffffffffffffffff>] 0xffffffffffffffff
            
            
            green Oleg Drokin added a comment - Hm, you are actually right. I checked some of my nodes and see the numbers are out of whack too, so I added a simple refcounter into every lock to see how many times did we call ldlm_pool_add/del and it seems to be a common occurence that we call ldlm_pool_add (from ldlm_lock_grant) twice, but free it only once. The second addition comes via: [ 353.612043] Increaing pl_granted but lock ffff880046a3cc40 refc is already 1 [ 353.612660] CPU: 7 PID: 3654 Comm: mdt01_002 Tainted: G OE ------------ 3.10.0-debug #1 [ 353.613777] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 353.614385] ffff880046a3cc40 0000000095009a39 ffff880080ef7840 ffffffff816fe7c0 [ 353.616879] ffff880080ef7890 ffffffffa0556e5d ffff880046a3ceb8 ffff880080ef78a8 [ 353.618213] ffff880080ef7890 0000000095009a39 ffff880046a3cc40 ffff880042a36e80 [ 353.619201] Call Trace: [ 353.619815] [<ffffffff816fe7c0>] dump_stack+0x19/0x1b [ 353.620385] [<ffffffffa0556e5d>] ldlm_pool_add+0x16d/0x1f0 [ptlrpc] [ 353.620931] [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc] [ 353.621564] [<ffffffffa0554382>] ldlm_process_inodebits_lock+0x362/0x420 [ptlrpc] [ 353.622676] [<ffffffffa052323e>] ldlm_lock_enqueue+0x3fe/0x940 [ptlrpc] [ 353.623327] [<ffffffffa053f126>] ldlm_cli_enqueue_local+0x1c6/0x850 [ptlrpc] [ 353.623984] [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc] [ 353.625077] [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt] [ 353.625710] [<ffffffffa0ba947a>] mdt_object_local_lock+0x52a/0xb00 [mdt] [ 353.626344] [<ffffffffa0b9a570>] ? mdt_register_seq_exp+0x2e0/0x2e0 [mdt] [ 353.627001] [<ffffffffa053c480>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc] [ 353.628095] [<ffffffffa0ba9aae>] mdt_object_lock_internal+0x5e/0x2f0 [mdt] [ 353.628728] [<ffffffffa0ba9ec0>] mdt_reint_object_lock+0x20/0x60 [mdt] [ 353.629362] [<ffffffffa0bc167b>] mdt_reint_setattr+0x67b/0x1130 [mdt] [ 353.630001] [<ffffffffa0bc21b0>] mdt_reint_rec+0x80/0x210 [mdt] [ 353.630612] [<ffffffffa0ba4f8c>] mdt_reint_internal+0x5dc/0x940 [mdt] [ 353.631246] [<ffffffffa0baee87>] mdt_reint+0x67/0x140 [mdt] [ 353.631888] [<ffffffffa05ce835>] tgt_request_handle+0x925/0x1330 [ptlrpc] [ 353.632567] [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc] [ 353.633678] [<ffffffffa057aae8>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc] [ 353.634353] [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc] [ 353.635023] [<ffffffff81707f27>] ? _raw_spin_unlock_irq+0x27/0x50 [ 353.635657] [<ffffffffa057f610>] ? ptlrpc_register_service+0xe70/0xe70 [ptlrpc] [ 353.636433] [<ffffffff810a404a>] kthread+0xea/0xf0 [ 353.636860] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140 [ 353.637301] [<ffffffff81711758>] ret_from_fork+0x58/0x90 [ 353.637724] [<ffffffff810a3f60>] ? 
kthread_create_on_node+0x140/0x140 [ 353.638163] Original trace: [ 353.638573] [<ffffffffa0556da0>] ldlm_pool_add+0xb0/0x1f0 [ptlrpc] [ 353.639218] [<ffffffffa052162e>] ldlm_grant_lock+0x11e/0x780 [ptlrpc] [ 353.639950] [<ffffffffa0554323>] ldlm_process_inodebits_lock+0x303/0x420 [ptlrpc] [ 353.641051] [<ffffffffa05238a3>] ldlm_reprocess_queue+0x123/0x230 [ptlrpc] [ 353.642311] [<ffffffffa0523e50>] ldlm_reprocess_all+0x120/0x310 [ptlrpc] [ 353.643068] [<ffffffffa05466a5>] ldlm_request_cancel+0x475/0x710 [ptlrpc] [ 353.643810] [<ffffffffa054c78a>] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [ 353.644555] [<ffffffffa054ca61>] ldlm_cancel_handler+0x141/0x490 [ptlrpc] [ 353.645330] [<ffffffffa057c261>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc] [ 353.651045] [<ffffffffa0580068>] ptlrpc_main+0xa58/0x1dc0 [ptlrpc] [ 353.651478] [<ffffffff810a404a>] kthread+0xea/0xf0 [ 353.651899] [<ffffffff81711758>] ret_from_fork+0x58/0x90 [ 353.652327] [<ffffffffffffffff>] 0xffffffffffffffff
            pjones Peter Jones added a comment -

            Oleg

            Could you please advise?

            Peter


            People

              Assignee: green Oleg Drokin
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 7
