[LU-6433] MDS deadlock in quota Created: 06/Apr/15  Updated: 09/Feb/17  Resolved: 10/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-7926 MDS sits idle with extreme slow respo... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With the new quota code, the MDS obtains quota credits via ptlrpc requests.
All quota operations take place after the ldlm lock has been obtained.
So, with a single MDS, we can run out of MDT threads to process the quota request: all the other MDT threads are waiting for a lock that conflicts with a lock held by the thread that is sending the quota request.
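The starvation described above can be modelled in a few lines. This is a minimal sketch, not Lustre code: a `ThreadPoolExecutor` stands in for the MDT service threads, a `threading.Lock` for the conflicting ldlm lock, and `dqacq_handler`, `lock_holder` and `enqueue_waiter` are hypothetical names. The lock holder sends its quota request back into the same pool while every other thread is blocked on its lock, so the request never gets a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def simulate_shared_pool(num_threads: int = 4, timeout: float = 0.5) -> bool:
    """Return True if the quota request was served, False if it starved."""
    pool = ThreadPoolExecutor(max_workers=num_threads)  # models the MDT service threads
    ldlm_lock = threading.Lock()                        # models the conflicting ldlm lock
    all_busy = threading.Barrier(num_threads)           # released once every thread is occupied

    def dqacq_handler():
        # The quota master's work is trivial; it only needs a free thread.
        return "granted"

    def lock_holder():
        with ldlm_lock:
            all_busy.wait()
            # The quota request is queued on the same, now saturated, pool.
            fut = pool.submit(dqacq_handler)
            try:
                return fut.result(timeout=timeout) == "granted"
            except FutureTimeout:
                # The timeout exists only so the sketch terminates.
                return False

    def enqueue_waiter():
        all_busy.wait()
        with ldlm_lock:  # blocks until lock_holder releases the lock
            pass

    holder = pool.submit(lock_holder)
    for _ in range(num_threads - 1):
        pool.submit(enqueue_waiter)
    served = holder.result()
    pool.shutdown(wait=True)  # lets the waiters and the queued quota request drain
    return served


print(simulate_shared_pool())  # False: the quota request starved
```

The barrier makes the scenario deterministic: the quota request is only submitted after all `num_threads` workers are inside a task, so the pool cannot start another thread for it.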

A possible solution is to serve quota requests in the HP queue. It looks like quota locks don't conflict, so quota intents can also be processed in the HP queue.



 Comments   
Comment by Gerrit Updater [ 06/Apr/15 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/14369
Subject: LU-6433 quota: handle QUOTA_DQACQ in hp queue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a4fde3781061b33d660dccb6499469cac6ca1d25

Comment by Andriy Skulysh [ 06/Apr/15 ]

patch: http://review.whamcloud.com/14369

Comment by Niu Yawei (Inactive) [ 16/Apr/15 ]

This looks like a livelock problem to me; I don't quite see why adding the quota request to the HP list is helpful. I think we usually use a different portal (and thread set) to handle such a problem. Perhaps we should introduce a new portal for quota requests?

Comment by Andriy Skulysh [ 16/Apr/15 ]

The patch helps because there is always a free HP thread to handle the quota request, even if all other MDS threads are waiting for a lock. A different portal is a more general and more complex solution. Is it needed in this case?

Comment by Niu Yawei (Inactive) [ 17/Apr/15 ]

Using a different portal looks simpler and cleaner to me. Actually, I think we can just reuse the READPAGE portal for dqacq requests (or only redirect the synchronous dqacq to the readpage portal).
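Why a separate thread set avoids the hang can be illustrated with a small model. This is a sketch under stated assumptions, not the landed patch: the `SERVICE_FOR_OPCODE` table, the `dispatch` helper and the pool sizes are hypothetical, and two `ThreadPoolExecutor` instances stand in for the MDT and READPAGE services. Even with every MDT thread blocked behind the ldlm lock, the DQACQ request is served by a READPAGE thread, so the lock holder can finish.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical opcode -> service mapping; illustrative only, not Lustre's handler tables.
SERVICE_FOR_OPCODE = {
    "LDLM_ENQUEUE": "mdt",
    "QUOTA_DQACQ": "readpage",  # quota acquire served off the main MDT thread set
}


def run_with_dedicated_quota_service(mdt_threads: int = 4,
                                     readpage_threads: int = 2,
                                     timeout: float = 2.0) -> bool:
    """Return True if the quota request was served while the MDT pool was saturated."""
    services = {
        "mdt": ThreadPoolExecutor(max_workers=mdt_threads),
        "readpage": ThreadPoolExecutor(max_workers=readpage_threads),
    }
    ldlm_lock = threading.Lock()
    all_busy = threading.Barrier(mdt_threads)  # every MDT thread is occupied

    def dispatch(opcode, fn):
        # Route the request to the service (thread pool) chosen for its opcode.
        return services[SERVICE_FOR_OPCODE[opcode]].submit(fn)

    def dqacq_handler():
        return "granted"

    def lock_holder():
        with ldlm_lock:
            all_busy.wait()
            fut = dispatch("QUOTA_DQACQ", dqacq_handler)
            try:
                return fut.result(timeout=timeout) == "granted"
            except FutureTimeout:
                return False

    def enqueue_waiter():
        all_busy.wait()
        with ldlm_lock:  # blocked until the lock holder gets its quota reply
            pass

    holder = dispatch("LDLM_ENQUEUE", lock_holder)
    for _ in range(mdt_threads - 1):
        dispatch("LDLM_ENQUEUE", enqueue_waiter)
    served = holder.result()
    for pool in services.values():
        pool.shutdown(wait=True)
    return served


print(run_with_dedicated_quota_service())  # True
```

The design point is the one raised in the comment: progress for quota requests no longer depends on how many MDT threads are blocked, only on the second thread set having at least one free thread.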

Comment by Cory Spitz [ 08/Jul/15 ]

Code review (and landing) of http://review.whamcloud.com/14369 was raised to the OpenSFS LWG on 7/1. What blocks this patch from landing?

Comment by Gerrit Updater [ 10/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14369/
Subject: LU-6433 quota: handle QUOTA_DQACQ in READPAGE portal
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 471ad1f679ad7c0193785f82abf6f249ffeb1e79

Comment by Peter Jones [ 10/Jul/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:00:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.