[LU-17493] restore LDLM cancel on blocking callback Created: 01/Feb/24  Updated: 01/Feb/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.16.0, Lustre 2.17.0
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Related
is related to LU-3308 large readdir chunk size slows unlink... Reopened
is related to LU-11509 LDLM: replace lock LRU with improved ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

In the old days of Catamount on ASCI Red, liblustre ran in the Catamount OS, which did not have any CPU interrupts. That meant any server-to-client requests (such as DLM lock cancellations) had to be handled asynchronously on the client, whenever the application yielded the processor to filesystem administrative tasks.

In that environment, the server would immediately assume that a DLM lock was cancelled as soon as the AST was sent on a lock with LDLM_FL_CANCEL_ON_BLOCK set, rather than waiting for the client to reply to the AST and actually cancel the lock. This avoided potentially significant delays for servers granting locks.
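As an illustration only, the behaviour described above can be sketched as a toy model in Python. This is not Lustre code: the Lock and Resource classes, the cancel_on_block field (standing in for LDLM_FL_CANCEL_ON_BLOCK) and the method names are all hypothetical, and the model ignores lock modes, RPC handling and timeouts.

```python
from dataclasses import dataclass, field


@dataclass
class Lock:
    client: str
    # Hypothetical stand-in for LDLM_FL_CANCEL_ON_BLOCK being set on the lock.
    cancel_on_block: bool = False


@dataclass
class Resource:
    granted: list = field(default_factory=list)
    sent_asts: list = field(default_factory=list)

    def send_blocking_asts(self):
        """Send an AST for every granted lock.

        Locks with cancel_on_block are treated as cancelled as soon as the
        AST is sent. The remaining locks stay granted until the client
        cancels them. Returns the locks the server must still wait on.
        """
        waiting = []
        for lock in self.granted:
            self.sent_asts.append(lock.client)
            if not lock.cancel_on_block:
                waiting.append(lock)
        self.granted = waiting
        return waiting

    def client_cancel(self, client):
        """Model a client replying to the AST by cancelling its lock."""
        self.granted = [l for l in self.granted if l.client != client]


res = Resource(granted=[Lock("c1", True), Lock("c2", True), Lock("c3", False)])
waiting = res.send_blocking_asts()
```

In this sketch every client is sent an AST, but only c3 (without the flag) still holds up the server, and only until its cancel arrives.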

In large clusters, some locks are invariably highly contended (e.g. ROOT/, /home/ or /project directories), either because many clients are holding a read lock and some client wants to modify the directory, or because of conflicting workloads (e.g. "ls -l" or "rm" in a directory (tree) that is actively in use by other clients). If any client holding a contended lock has a problem, for example LU-17453/LU-17476, then other nodes accessing that lock may block for tens or hundreds of seconds until it is cancelled or the client is evicted.

It would be useful if LDLM_FL_CANCEL_ON_BLOCK were used for such highly-contended resources when requested with LCK_PR mode, so that the server can send asynchronous ASTs to all clients and then cancel the DLM locks rapidly and perform the required operation without getting blocked by unresponsive clients. Any responsive client will receive the AST and not even need to send the cancel RPC, while unresponsive clients are already unlikely to know or care whether the server sent the AST, so they will have an inconsistent local state until they again contact the server (as they already do today).
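To make the expected benefit concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under stated assumptions, not a measurement: the function name, the reply delays and the 100-second eviction timeout are all hypothetical, and ASTs are assumed to be sent to all holders in parallel, so the server waits for the slowest holder it cannot skip.

```python
def server_wait_seconds(holders, eviction_timeout):
    """Estimate how long the server waits before it can proceed.

    holders: list of (reply_delay, cancel_on_block) pairs, where reply_delay
    is the time in seconds until the client cancels its lock, or None if the
    client is unresponsive. cancel_on_block is a hypothetical stand-in for
    LDLM_FL_CANCEL_ON_BLOCK being set on that client's lock.
    """
    wait = 0.0
    for reply_delay, cancel_on_block in holders:
        if cancel_on_block:
            # Lock is treated as cancelled once the AST is sent.
            continue
        if reply_delay is None or reply_delay > eviction_timeout:
            # Unresponsive client: the server waits until eviction.
            delay = eviction_timeout
        else:
            delay = reply_delay
        wait = max(wait, delay)
    return wait


# Two responsive readers and one unresponsive reader (illustrative values).
today = [(0.01, False), (0.02, False), (None, False)]
proposed = [(0.01, True), (0.02, True), (None, True)]

wait_today = server_wait_seconds(today, eviction_timeout=100.0)
wait_proposed = server_wait_seconds(proposed, eviction_timeout=100.0)
```

Under these assumptions a single unresponsive holder pins the wait at the eviction timeout without the flag, while with the flag set on all of the read locks the server does not wait at all.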

This could potentially also be tied into "ls" (readdir()) being able to run with "LDLM_FL_CANCEL_ON_BLOCK" locks, or no DLM locks at all on the directory or inodes. Per comments in LU-3308, POSIX does not require readdir() to be fully cache coherent even among processes on the same node, only that the readdir cache is reset with rewinddir() and close().

