[LU-6398] LDLM lock creation race condition Created: 24/Mar/15  Updated: 31/Jan/22  Resolved: 31/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

As promised in LU-6397, this ticket describes another race condition between concurrent requests for new LDLM locks, made possible by the changes in LU-1669.

After the kms_valid problem has been fixed for new objects (see LU-6397), a closely related race condition remains.

Consider this sequence of events with two processes, P1 and P2:
P1 makes an IO request (for example, a write to the first page of a file)
P1 creates an LDLM lock request
P1 calls osc_enqueue_base and ldlm_lock_match to check for existing locks; none are found
P1 waits for a reply from the server
P2 makes an IO request (for example, a read from the second page of the file)
P2 creates an LDLM lock request
P2 calls osc_enqueue_base and ldlm_lock_match to check for existing locks; none are found
P2 waits for a reply from the server
P1 receives its reply and its lock is granted
(The lock is expanded beyond the requested extent, so it covers the area P2 wants to read)
P2 receives its reply, but its lock is blocked by the lock granted to P1
The lock granted to P1 is called back by the server, even though it would have satisfied the request from P2

The problem is this:
The lock for P1's IO request is still waiting for a reply from the server, so it is not on any queue and is not found by the lock request from P2.

Locks are currently added to the waiting or granted queue (at which point they can be matched by other lock requests) by the processing policy, which is called from ldlm_lock_enqueue, which in turn is called from ldlm_cli_enqueue_fini.

ldlm_cli_enqueue_fini is not called until a reply has been received from the server. So while P1 is waiting on a reply from the server, the P2 lock request (which would match the P1 lock if it were available for matching) proceeds with its own enqueue.
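To make the window concrete, here is a minimal user-space toy model of the sequence above (all names are invented for illustration; this is not Lustre code). The only point it demonstrates is that a lock which is not yet on the granted queue cannot be found by a concurrent match, so both threads enqueue:

/* Toy user-space model of the race (not Lustre code; all names are
 * invented).  A lock only becomes matchable once it is on the resource's
 * granted list, which - as on the real client - happens only after the
 * (simulated) server reply. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct toy_lock {
        unsigned long start, end;       /* extent covered by the lock */
        struct toy_lock *next;
};

static struct toy_lock *granted;        /* the "granted queue" */
static pthread_mutex_t res_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for ldlm_lock_match(): only granted locks can be found. */
static bool toy_match(unsigned long start, unsigned long end)
{
        struct toy_lock *l;
        bool found = false;

        pthread_mutex_lock(&res_lock);
        for (l = granted; l != NULL; l = l->next)
                if (l->start <= start && l->end >= end)
                        found = true;
        pthread_mutex_unlock(&res_lock);
        return found;
}

/* Stand-in for enqueue: the lock joins the granted queue only after the
 * simulated round trip to the server completes. */
static void toy_enqueue(struct toy_lock *l)
{
        sleep(1);                       /* waiting for the "server reply" */
        pthread_mutex_lock(&res_lock);
        l->next = granted;
        granted = l;
        pthread_mutex_unlock(&res_lock);
}

static void *io_thread(void *arg)
{
        unsigned long page = (unsigned long)arg;

        if (toy_match(page * 4096, (page + 1) * 4096 - 1)) {
                printf("page %lu: matched an existing lock\n", page);
                return NULL;
        }
        printf("page %lu: no match, enqueueing a new lock\n", page);

        /* The "server" grants a whole-file lock regardless of the extent
         * asked for, mirroring lock expansion on the real server. */
        struct toy_lock *l = malloc(sizeof(*l));
        l->start = 0;
        l->end = ~0UL;
        toy_enqueue(l);
        return NULL;
}

int main(void)
{
        pthread_t p1, p2;

        /* P1 writes page 0, P2 reads page 1; with both enqueues still in
         * flight, neither thread finds the other's lock, so both enqueue. */
        pthread_create(&p1, NULL, io_thread, (void *)0UL);
        pthread_create(&p2, NULL, io_thread, (void *)1UL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        return 0;
}

In the real client, the second enqueue then conflicts with the first, expanded lock, which is what triggers the callback in the last step of the sequence above.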

This race was previously prevented by the serialization that LU-1669 removed.

I do not currently see a simple way to resolve this problem. It is made particularly difficult by async locks, such as lockahead locks, which do not wait for a reply from the server. If we were concerned only with synchronous locks, we could either ignore this or conceivably hold a lock preventing a new lock request from calling into ldlm_lock_match until the other lock had been issued. The problem with that idea is that it would prevent other IOs from using other existing locks.

I have one partial idea:
Another lock queue, "lr_waiting_reply" or "lr_created", to which locks could be added when they are created in ldlm_cli_enqueue, before the request is sent to the server.

This would only require preventing lock matching requests for the time it takes to allocate the ptlrpc request and create the lock. I am not sure how we would lock this, though; holding the resource spinlock for that long does not seem advisable.
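To make the proposal concrete, here is a rough user-space sketch of what such a queue might look like. It is entirely hypothetical: the structure and field names are simplified stand-ins, not the real ldlm_resource or its locking, and having the match wait for the grant is my own extrapolation of the idea.

/* Hypothetical sketch of the proposed "lr_created" queue.  These are not
 * the real Lustre structures; the names are invented for illustration. */
#include <pthread.h>
#include <stdbool.h>

struct sketch_lock {
        unsigned long       start, end;     /* extent covered              */
        bool                granted;        /* set once the reply arrives  */
        struct sketch_lock *next;
};

struct sketch_resource {
        struct sketch_lock *granted_q;      /* existing: matchable today   */
        struct sketch_lock *created_q;      /* proposed "lr_created" queue */
        pthread_mutex_t     lock;
        pthread_cond_t      grant_cond;     /* signalled when a lock is granted */
};

/* A match that also considers locks still waiting for their enqueue reply:
 * if one covers the requested extent, wait for the grant rather than
 * issuing a second, conflicting enqueue. */
bool sketch_match(struct sketch_resource *res,
                  unsigned long start, unsigned long end)
{
        struct sketch_lock *l;
        bool found = false;

        pthread_mutex_lock(&res->lock);
        for (l = res->granted_q; l != NULL && !found; l = l->next)
                found = (l->start <= start && l->end >= end);

        for (l = res->created_q; l != NULL && !found; l = l->next) {
                if (l->start <= start && l->end >= end) {
                        while (!l->granted)
                                pthread_cond_wait(&res->grant_cond, &res->lock);
                        found = true;
                }
        }
        pthread_mutex_unlock(&res->lock);
        return found;
}

The sketch deliberately glosses over the open question above: in the real code, inserting into the created queue would have to be protected across request allocation without holding the resource spinlock the whole time.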

This bug should be fairly low priority: it does not currently cause any actual failures, since valid locks are issued for the various IO requests involved. It is just inefficient.

For asynchronous locks such as the proposed lockahead locks, this problem is perhaps a bit worse, since they can conceivably be cancelled by a normal IO request that was intended to fit into one of them. Still, if there are multiple lockahead locks on the server, the lock requested for the IO will not be expanded, and as a result the server will only cancel one of the lockahead locks.



 Comments   
Comment by Andreas Dilger [ 30/Jan/22 ]

Patrick, is this still an issue?

Comment by Patrick Farrell [ 31/Jan/22 ]

Yes, but it's pretty minor and I think not worth fixing. Basically, it describes a situation where a client has a pending lock request (pending but not yet in the waiting queue, so essentially no reply from the server yet), then another thread on the client generates a second lock request which could use that lock, but doesn't match it because it isn't granted yet. The second lock request is actually made (when it could have just waited for the first lock to become available), and the client ends up self-conflicting and cancelling the first lock.

But that all works just fine - no deadlock or other issue.

Anyway, this is A) rare (there's provision in the cl/OSC locking that mostly catches this), and B) harmless: just a very slight and probably transient performance cost (i.e., after the first two compatible requests have fought it out, an LDLM lock is held in the normal way without further issue, so there's no ongoing cost paid).

Also, this is technically possible for normal I/O, but really it's something I saw with lockahead, where lockahead would start an async request, then the thread would go on to do I/O right away and would send a new request before the first one had received a reply. This was solved in practice by creating a way for userspace to check whether a lock existed (via the lockahead/ladvise interface), so the lockahead library could spool off a bunch of async requests all at once, but then wait to start I/O until at least the first lock had been granted.
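In pattern form, that workaround looks roughly like the sketch below. The helpers lockahead_request_async() and lock_is_granted() are hypothetical stand-ins for the real lockahead/ladvise calls, whose exact arguments are not reproduced here.

/* Hypothetical sketch of the lockahead library pattern described above.
 * lockahead_request_async() and lock_is_granted() are invented names
 * standing in for the real lockahead/ladvise interface. */
#include <unistd.h>
#include <sys/types.h>

int lockahead_request_async(int fd, off_t offset, off_t len);  /* hypothetical */
int lock_is_granted(int fd, off_t offset, off_t len);          /* hypothetical */

void prepare_and_write(int fd, off_t chunk_size, int nchunks)
{
        int i;

        /* Spool off a batch of asynchronous lock requests up front. */
        for (i = 0; i < nchunks; i++)
                lockahead_request_async(fd, (off_t)i * chunk_size, chunk_size);

        /* Don't start I/O until at least the first lock has been granted,
         * so the write doesn't race against its own pending lock request. */
        while (!lock_is_granted(fd, 0, chunk_size))
                usleep(1000);

        /* ... now issue writes, which match the already-granted locks ... */
}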

I'm going to close this one out.
