Details
Bug
Resolution: Fixed
Major
None
None
3
9223372036854775807
Description
The MDS runs out of memory (OOM). At that point it has 36M locks granted, SLV == 1, and it is receiving a flood of cancel RPCs, each carrying a single lock handle.
One possible cause of locks being cancelled one by one is:
LU-16285 ldlm: send the cancel RPC asap
Even after fixing that with:
LU-16285 ldlm: BL_AST lock cancel still can be batched
it is still possible for CANCELLING locks to be cancelled one by one. These are locks that another thread has already taken for cancelling but whose cancel RPC has not been formed or sent yet; in that case a separate cancel RPC carrying just 1 lock handle is sent.
How can there be so many blocking ASTs (BLASTs) for locks that are already in the process of being cancelled? The client activity is not yet fully clear, but in theory it is possible: under a low memory condition the server starts reclaiming locks massively, i.e. it sends out many BLAST RPCs. Combined with a small SLV, this may result in a cancel RPC for 1K locks being prepared (which may take some time due to data flush) while 1K BLAST RPCs arrive for the same set of locks, resulting in 1K separate cancel RPCs.
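To make the RPC count in this scenario concrete, here is a minimal C model. It is not Lustre code: `struct lock`, `LOCK_CANCELLING` and `simulate_current()` are hypothetical names. It assumes one thread has already marked 1K locks CANCELLING for a single batched cancel RPC that is not sent yet, and that a BLAST then arrives for each of those locks and, as described above, produces its own 1-handle cancel RPC.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not Lustre code: only the state that matters here. */
enum lock_state { LOCK_GRANTED, LOCK_CANCELLING };

struct lock {
    enum lock_state state;
};

/* Returns how many cancel RPCs the model sends when one thread has marked
 * n locks CANCELLING for a single batched cancel RPC and a blocking AST
 * (BLAST) then arrives for each of those same n locks. */
static unsigned long simulate_current(struct lock *locks, size_t n)
{
    unsigned long rpcs = 0;

    assert(n == 0 || locks != NULL);

    /* Thread A: pick n locks for one batch and mark them CANCELLING.
     * The batched RPC is not sent yet (e.g. still flushing data). */
    for (size_t i = 0; i < n; i++)
        locks[i].state = LOCK_CANCELLING;

    /* BLASTs for the same locks arrive before the batch goes out; each
     * finds its lock CANCELLING and sends a separate 1-handle cancel RPC. */
    for (size_t i = 0; i < n; i++)
        if (locks[i].state == LOCK_CANCELLING)
            rpcs++;

    /* Thread A finally sends its batched RPC carrying n handles. */
    return rpcs + 1;
}
```

With 1000 locks, `simulate_current(locks, 1000)` returns 1001: the one batched RPC plus 1000 single-handle RPCs for the very same locks.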
Let's optimise this well, so that even CANCELLING locks can still be batched.
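One possible direction, sketched under the same simplified model and not taken from the actual fix: instead of sending a 1-handle cancel RPC per BLAST on a CANCELLING lock, queue those handles and send them together, flushing whenever an RPC is full. `struct cancel_queue`, `queue_add()`, `queue_flush()`, `simulate_batched()` and the per-RPC handle limit `max_per_rpc` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not Lustre code. */
enum lock_state { LOCK_GRANTED, LOCK_CANCELLING };

struct lock {
    enum lock_state state;
};

/* Hypothetical queue: handles of CANCELLING locks hit by a BLAST are
 * collected here and sent together instead of one RPC per lock. */
struct cancel_queue {
    size_t pending;      /* handles queued but not yet sent */
    size_t max_per_rpc;  /* hypothetical per-RPC handle limit */
    unsigned long rpcs;  /* cancel RPCs sent so far */
};

static void queue_flush(struct cancel_queue *q)
{
    if (q->pending > 0) {
        q->rpcs++;       /* one RPC carrying q->pending handles */
        q->pending = 0;
    }
}

static void queue_add(struct cancel_queue *q)
{
    q->pending++;
    if (q->pending == q->max_per_rpc)
        queue_flush(q);  /* RPC is full: send it now */
}

/* Same scenario as before, but BLASTs on CANCELLING locks are batched. */
static unsigned long simulate_batched(struct lock *locks, size_t n,
                                      size_t max_per_rpc)
{
    struct cancel_queue q = { 0, max_per_rpc, 0 };

    assert(max_per_rpc > 0);
    assert(n == 0 || locks != NULL);

    /* Thread A marks n locks CANCELLING for one batched RPC (not sent yet). */
    for (size_t i = 0; i < n; i++)
        locks[i].state = LOCK_CANCELLING;

    /* BLASTs for the same locks: queue the handle instead of sending a
     * separate 1-handle cancel RPC. */
    for (size_t i = 0; i < n; i++)
        if (locks[i].state == LOCK_CANCELLING)
            queue_add(&q);
    queue_flush(&q);     /* send whatever is left in a partial batch */

    /* Plus thread A's original batched RPC. */
    return q.rpcs + 1;
}
```

With 1000 locks and a hypothetical limit of 256 handles per RPC, `simulate_batched(locks, 1000, 256)` returns 5 (the original batch plus ceil(1000 / 256) = 4 queued RPCs); with `max_per_rpc` set to 1 it degenerates to the 1001 RPCs of the current behaviour. A different design could skip the extra cancel altogether, since the handle is already in the first thread's pending batch; this sketch does not model which of the two the real code needs.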
Issue Links
- is related to: LU-18881 MDT overwhelmed by lock cancel requests (Open)