[LU-12079] lockless IO gets stuck if client already has another lock Created: 17/Mar/19  Updated: 09/Jul/21  Resolved: 09/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Patrick Farrell
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11435 add contention check and accounting f... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A lockless write IO may get stuck if the client already holds another lock covering the same page, e.g. a PR lock from a previous READ or GLIMPSE request. The WRITE IO causes the server to send a blocking AST for that PR lock; the AST handler tries to discard the pages on the client and gets stuck in cl_page_own() forever if the page is already involved in the lockless WRITE.
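The dependency chain described above can be pictured as a wait-for graph. The following is a minimal sketch, not Lustre code: the node names are illustrative labels chosen here, and the edge from the write to the server reply is an assumption about how the lockless write waits on the server. It only shows that the chain closes into a cycle, which is why cl_page_own() never returns.

```python
# Simplified wait-for graph for the scenario in the description.
# Node names are illustrative labels, not Lustre identifiers.
WAITS_FOR = {
    "lockless_write": "server_write_reply",  # client owns the page and waits for the server
    "server_write_reply": "pr_lock_cancel",  # server waits for the conflicting PR lock to be cancelled
    "pr_lock_cancel": "page_ownership",      # blocking AST handler must discard (own) the page first
    "page_ownership": "lockless_write",      # the page is owned by the in-flight lockless write
}


def find_cycle(graph, start):
    """Follow single-successor wait-for edges; return the cycle as a list, or None."""
    path = []
    seen = {}
    node = start
    while node is not None:
        if node in seen:
            # We came back to a node already on the path: that suffix is the cycle.
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = graph.get(node)
    return None


cycle = find_cycle(WAITS_FOR, "lockless_write")
print(" -> ".join(cycle + [cycle[0]]))
```

Removing any single edge, for example by cancelling the PR lock before the write takes ownership of the page, breaks the cycle in this model.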

At first glance it seems that any lockless write must find and cancel all client locks on the write range before the IO is started. However, that may not be enough, because another concurrent lock can be taken while the lockless IO is being issued.
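The weakness of the cancel-before-write idea is a check-then-act window. The sketch below is a toy, single-threaded model under stated assumptions: `ClientLockTable`, `lockless_write`, and the `concurrent_step` hook are hypothetical names invented for illustration, and the hook stands in for another thread enqueueing a lock between the cancel and the IO.

```python
class ClientLockTable:
    """Toy stand-in for the extent locks a client holds (hypothetical)."""

    def __init__(self):
        self.locks = []  # list of (mode, start, end), end inclusive

    def add(self, mode, start, end):
        self.locks.append((mode, start, end))

    def overlapping(self, start, end):
        # A lock overlaps unless it ends before the range or starts after it.
        return [l for l in self.locks if not (l[2] < start or l[1] > end)]

    def cancel_overlapping(self, start, end):
        self.locks = [l for l in self.locks if l[2] < start or l[1] > end]


def lockless_write(table, start, end, concurrent_step=None):
    """Return the locks that still conflict at the moment the IO is issued."""
    table.cancel_overlapping(start, end)  # step 1: cancel existing locks on the write range
    if concurrent_step is not None:
        concurrent_step(table)            # window: another thread may take a new lock here
    return table.overlapping(start, end)  # step 2: conflicts seen when the IO is issued


table = ClientLockTable()
table.add("PR", 0, 4095)

quiet = lockless_write(table, 0, 4095)
raced = lockless_write(table, 0, 4095, lambda t: t.add("PR", 0, 4095))
print(quiet, raced)
```

In the quiet case no conflicting lock remains, but in the raced case a PR lock is back on the range by the time the write is issued, which is the situation the description warns about.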



 Comments   
Comment by Mikhail Pershin [ 17/Mar/19 ]

I found this while testing the lock contention mechanism for DOM, which switches contended locks to lockless IO, but it is also true for normal files. It can be seen with aggressive lock contention settings, ns_contended_locks=0 and ns_max_nolock_size=some_big_value. Running fsx reveals the bug quite quickly.

Interestingly, the default value of ns_max_nolock_size is 0, which means that lockless IO is always turned off in Lustre by default.
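How the two tunables mentioned in this comment interact can be sketched as follows. This is a simplified model and not the Lustre implementation: the decision rule (more conflicting locks than ns_contended_locks, and a request no larger than ns_max_nolock_size) is an assumption made for illustration, and `NamespaceTunables` and `use_lockless_io` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NamespaceTunables:
    """Hypothetical container modelling the two namespace settings."""
    contended_locks: int  # models ns_contended_locks
    max_nolock_size: int  # models ns_max_nolock_size in bytes; 0 is the default per this comment


def use_lockless_io(ns, conflicting_locks, request_bytes):
    """Assumed rule: go lockless only when contended AND the request is small enough."""
    contended = conflicting_locks > ns.contended_locks
    small_enough = request_bytes <= ns.max_nolock_size
    return contended and small_enough


# contended_locks=8 is an arbitrary illustrative value, not a documented default.
default_like = NamespaceTunables(contended_locks=8, max_nolock_size=0)
# Settings like those used to reproduce the bug: any conflict counts, large size limit.
aggressive = NamespaceTunables(contended_locks=0, max_nolock_size=1 << 30)

print(use_lockless_io(default_like, conflicting_locks=100, request_bytes=4096))
print(use_lockless_io(aggressive, conflicting_locks=1, request_bytes=4096))
```

Under this model a size limit of 0 rejects every request of non-zero length no matter how contended the resource is, which matches the observation that lockless IO is effectively off by default.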

Comment by Patrick Farrell (Inactive) [ 17/Mar/19 ]

Ahh, I thought lockless i/o was off by default, because I never saw it turn on when working on shared file contention...  It's good to know why.

Comment by Mikhail Pershin [ 17/Mar/19 ]

Yes, I was thinking that lockless IO is a good benefit when there is lock contention on the server and the client temporarily switches to using server locks, because there is no sense in lock caching and a constant ping-pong of lock enqueue and lock cancel. But it does not work at all until ns_contended_locks is set to some value, and even when enabled it is not reliable because of possible deadlocks.

Comment by Patrick Farrell [ 09/Jul/21 ]

Duplicated/resolved by removal in LU-14838
