Details
Type: Bug
Resolution: Won't Fix
Priority: Minor
Description
As promised in LU-6397, this ticket describes another race condition in the acquisition of new LDLM locks, made possible by LU-1669.
After the kms_valid problem has been fixed for new objects (see LU-6397), a closely related race condition remains.
Consider this sequence of events with two processes, P1 and P2:
P1 makes an IO request (e.g., a write to the first page of a file)
P1 creates an LDLM lock request
P1 calls osc_enqueue_base and ldlm_lock_match to check for existing locks; none are found
P1 waits for a reply from the server
P2 makes an IO request (e.g., a read from the second page of the same file)
P2 creates an LDLM lock request
P2 calls osc_enqueue_base and ldlm_lock_match to check for existing locks; none are found
P2 waits for a reply from the server
P1 receives a reply; its lock is granted
(The lock is expanded beyond the requested extent, so it covers the area P2 wants to read)
P2 receives a reply; its lock is blocked by the lock granted to P1
The lock granted to P1 is called back by the server, even though it matches the request from P2
The problem is this:
The lock for P1's IO request is still waiting for a reply from the server, so it is not on any queue and is not found by the lock request from P2.
Locks are currently added to the waiting or granted queue (at which point they can be matched by other lock requests) by the processing policy, which is called from ldlm_lock_enqueue, itself called from ldlm_cli_enqueue_fini.
ldlm_cli_enqueue_fini is not called until a reply has been received from the server, so while P1 is waiting on that reply, the P2 lock request (which would match the P1 lock if it were available for matching) proceeds with its own enqueue.
This race was not possible before the changes made in LU-1669.
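To make the timing concrete, here is a minimal standalone model of the race using plain pthreads. Nothing in it is actual Lustre code; all names are invented, and the extent handling is deliberately simplified.
{code}
/*
 * Minimal standalone model of the timing (plain pthreads; nothing here is
 * actual Lustre code and all names are invented).  Two threads each look
 * for a matching lock, find none, "wait for the server reply", and only
 * then publish their lock.  Because a lock is invisible while its enqueue
 * RPC is in flight, both threads end up sending an enqueue RPC.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct model_lock {
        unsigned long start, end;       /* extent covered, in pages */
        struct model_lock *next;
};

static pthread_mutex_t res_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_lock *granted;      /* stands in for the granted queue */

/* Analogue of ldlm_lock_match(): only locks already published on the
 * granted list can be found. */
static struct model_lock *match(unsigned long start, unsigned long end)
{
        struct model_lock *lck;

        pthread_mutex_lock(&res_lock);
        for (lck = granted; lck; lck = lck->next)
                if (lck->start <= start && end <= lck->end)
                        break;
        pthread_mutex_unlock(&res_lock);
        return lck;
}

/* Analogue of the client enqueue path: the lock becomes visible to match()
 * only after the (simulated) server reply has been processed. */
static void enqueue(const char *who, struct model_lock *lck)
{
        if (match(lck->start, lck->end)) {
                printf("%s: matched an existing lock, no RPC needed\n", who);
                return;
        }
        printf("%s: no match found, sending enqueue RPC\n", who);
        sleep(1);                       /* RPC round trip; lock invisible here */
        pthread_mutex_lock(&res_lock);
        lck->next = granted;            /* only now is the lock matchable */
        granted = lck;
        pthread_mutex_unlock(&res_lock);
        printf("%s: lock for pages [%lu, %lu] granted\n", who, lck->start, lck->end);
}

static void *p1(void *arg) { enqueue("P1 (write page 0)", arg); return NULL; }
static void *p2(void *arg) { enqueue("P2 (read page 1)", arg); return NULL; }

int main(void)
{
        struct model_lock l1 = { .start = 0, .end = 0 };
        struct model_lock l2 = { .start = 1, .end = 1 };
        pthread_t t1, t2;

        pthread_create(&t1, NULL, p1, &l1);
        pthread_create(&t2, NULL, p2, &l2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /*
         * Both threads report "no match" and send an RPC.  In the real
         * sequence above, the server expands P1's lock to cover P2's extent,
         * blocks P2's request on it, and then calls P1's lock back.
         */
        return 0;
}
{code}
Compiled with -pthread, both threads should report "no match found" and issue an enqueue RPC, mirroring the sequence above.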
I do not currently see a simple way to resolve this problem. It is made particularly difficult by async locks, such as lock ahead locks, which do not wait for a reply from the server. If we were concerned only with synchronous locks, we could either ignore this or conceivably hold a lock that prevents a new lock request from calling into ldlm_lock_match until the other lock has been issued. The problem with that idea is that it would also prevent other IOs from using other existing locks.
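To make that objection concrete in terms of the toy model above, the rejected idea amounts to something like the following (the serializing mutex is invented for illustration):
{code}
/* Rejected idea, sketched against the toy model above: take a mutex before
 * matching and hold it until the enqueue RPC completes. */
static pthread_mutex_t enqueue_serializer = PTHREAD_MUTEX_INITIALIZER;

static void enqueue_serialized(const char *who, struct model_lock *lck)
{
        pthread_mutex_lock(&enqueue_serializer);
        if (!match(lck->start, lck->end)) {
                printf("%s: sending enqueue RPC\n", who);
                sleep(1);               /* every other IO on this resource now
                                         * waits here, even IOs that could be
                                         * satisfied by an existing lock */
                /* ...publish the lock as in enqueue() above... */
        }
        pthread_mutex_unlock(&enqueue_serializer);
}
{code}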
I have one partial idea:
Another lock queue, "lr_waiting_reply" or "lr_created", to which locks could be added when they are created at the start of ldlm_cli_enqueue, before the request is sent to the server.
This would only require blocking lock matching requests for the time it takes to allocate the ptlrpc request and create the lock. I am not sure how we would lock this, though; holding the resource spinlock for that long does not seem advisable.
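In terms of the toy model above, the idea would look roughly like this. In the real code it would presumably mean a new list in struct ldlm_resource that ldlm_lock_match() also walks, but the list and function names below are still invented for illustration:
{code}
/* Continuation of the toy model: a second list for locks that have been
 * created but whose enqueue reply has not yet arrived. */
static struct model_lock *created;      /* stands in for "lr_created" */

static struct model_lock *scan(struct model_lock *head,
                               unsigned long start, unsigned long end)
{
        struct model_lock *lck;

        for (lck = head; lck; lck = lck->next)
                if (lck->start <= start && end <= lck->end)
                        break;
        return lck;
}

/* Matching now also sees locks that are still waiting for a reply. */
static struct model_lock *match_including_created(unsigned long start,
                                                  unsigned long end)
{
        struct model_lock *lck;

        pthread_mutex_lock(&res_lock);
        lck = scan(granted, start, end);
        if (!lck)
                lck = scan(created, start, end);
        pthread_mutex_unlock(&res_lock);
        return lck;
}

/* enqueue() would publish the lock on "created" before sending the RPC and
 * move it to "granted" when the reply arrives.  The open question remains:
 * what protects the window between a failed match and the lock landing on
 * "created" while the ptlrpc request is being allocated. */
{code}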
This bug should be fairly low priority: It does not currently cause any actual failures, as valid locks are issued for the various IO requests involved. It's just inefficient.
For asynchronous locks, such as the proposed lock ahead locks, this problem is perhaps a bit worse, since they can conceivably be cancelled by a normal IO request that was intended to fit inside one of them. Still, if there are multiple lock ahead requests on the server, the lock requested for the IO will not be expanded, and as a result the server will cancel only one of the lock ahead locks.