There is discussion in LU-17493 about restoring the cancel-on-block functionality as a first-class citizen, in order to make large clusters more reliable in the face of unstable network connectivity to clients.
We've seen a few cases where clients have poor network connections (nearly inevitable as cluster size grows while using commodity Ethernet network hardware) and will sometimes lose their connection for a second or two. If an AST is sent during this time, the client may not get it, but it will eventually restore the network connection in time to get the server resend, and otherwise remain alive in the eyes of the server.
We've made the client and server connection "too robust" in some senses: this avoids client eviction, which causes application failures and is otherwise disruptive, but it also allows misbehaving clients to cause problems elsewhere in the cluster.
If the flakey client is holding an important DLM lock (e.g. on the root directory), the AST loss/delay can block the whole cluster for tens of seconds, and the lack of client eviction means this can happen over and over again.
Allowing misbehaving clients to degrade into a "synchronous write" mode without eviction, where the DLM locks can be cancelled arbitrarily by the server without waiting for reply, would allow such flakey clients to remain part of the filesystem (at a reduced performance level) without compromising the stability of well-behaved clients.
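To make the proposal concrete, here is a toy model of the server-side behaviour in Python. It is only a sketch of the idea, not Lustre code: the names `LockServer`, `ClientState`, `degrade_after` and the returned strings are all hypothetical, and the "degrade after N missed ASTs" policy is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClientState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"  # hypothetical "synchronous write" mode


@dataclass
class Client:
    name: str
    state: ClientState = ClientState.HEALTHY
    missed_asts: int = 0


@dataclass
class LockServer:
    # Hypothetical policy: number of missed ASTs before a client is degraded.
    degrade_after: int = 2
    holders: dict = field(default_factory=dict)  # resource -> Client

    def grant(self, resource, client):
        self.holders[resource] = client

    def ast_timed_out(self, client):
        """Record a lost or late AST reply; degrade the client instead of evicting it."""
        client.missed_asts += 1
        if client.missed_asts >= self.degrade_after:
            client.state = ClientState.DEGRADED

    def conflicting_request(self, resource, requester):
        """Return how a conflicting lock request on `resource` is resolved."""
        holder = self.holders.get(resource)
        if holder is None or holder is requester:
            self.holders[resource] = requester
            return "granted"
        if holder.state is ClientState.DEGRADED:
            # Cancel-on-block: drop the holder's lock without waiting for its reply.
            self.holders[resource] = requester
            return "granted-after-forced-cancel"
        # Healthy holder: a blocking AST would be sent and its reply awaited.
        return "waiting-for-ast"


server = LockServer()
flaky, good = Client("flaky"), Client("good")
server.grant("root", flaky)
first = server.conflicting_request("root", good)   # flaky is still healthy
server.ast_timed_out(flaky)
server.ast_timed_out(flaky)                        # reaches the assumed threshold
second = server.conflicting_request("root", good)  # flaky is now degraded
```

In this model the well-behaved client only waits on the flaky client until the latter is marked degraded; after that, conflicts are resolved immediately and the flaky client is never evicted.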
"Allowing misbehaving clients to degrade into a "synchronous write" mode without eviction, where the DLM locks can be cancelled arbitrarily by the server without waiting for reply, would allow such flakey clients to remain part of the filesystem (at a reduced performance level) without compromising the stability of well-behaved clients."
Makes sense. This idea has been sort of bouncing around in a few places. So this would be specifically for those degraded clients... But I think if a client is degraded we'd just have it NOT use DLM locks (on the client, anyway)? So the transition to degraded mode might, for example, cancel all existing locks? That's perhaps a bit forceful, but... simple?
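A rough client-side sketch of that transition, again as a toy Python model rather than anything resembling the real client code. `ToyClient` and its fields are hypothetical, and flushing dirty data before dropping the locks is an assumption about what "cancel all existing locks" would have to imply.

```python
class ToyClient:
    def __init__(self):
        self.degraded = False
        self.cached_locks = set()  # resources this client holds locks on
        self.dirty = {}            # resource -> data in the write-back cache
        self.server_data = {}      # stand-in for data stored on the server

    def write(self, resource, data):
        if self.degraded:
            # Degraded: no client-side lock, data goes straight to the server.
            self.server_data[resource] = data
            return "sync"
        # Normal: take a lock and keep the data in the write-back cache.
        self.cached_locks.add(resource)
        self.dirty[resource] = data
        return "cached"

    def enter_degraded_mode(self):
        # Assumed ordering: flush cached writes, then cancel every lock.
        self.server_data.update(self.dirty)
        self.dirty.clear()
        self.cached_locks.clear()
        self.degraded = True


client = ToyClient()
before = client.write("file_a", b"v1")  # cached under a client lock
client.enter_degraded_mode()            # flush, then drop all locks
after = client.write("file_a", b"v2")   # synchronous, lockless on the client
```

The appeal of the "cancel everything" approach is visible here: once the client holds no locks, the server never has to send it an AST, so a lost AST can no longer stall other clients.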