Details
Bug
Resolution: Unresolved
Major
None
Lustre 2.16.0
None
3
9223372036854775807
Description
Currently we drop all unused or cancelled locks when doing lock replay on the client, which makes sense: why let the server know about locks we want to get rid of anyway?
But whatever cancel RPCs we wanted to send still stay in the outgoing queue because, unlike in the early days, they no longer have the no_resend flag.
This leads to a paradoxical situation where post-replay we have no locks, but potentially a long list of cancel RPCs for non-existent locks. Normally that would not pose a big problem, but sometimes, when the count is exceptionally big, the cancel thread on the server might be overloaded (see LU-18072), so it makes sense to ensure we kill those RPCs.
Currently this is not entirely convenient to do: the import generation does not change, and the connection counter changes even on simple reconnects without replay, so we need to add some sort of a "no_resend_replay" flag and then discard all requests that have it set when we enter some suitable connection phase that indicates we had a replay (say, replay locks?).
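To make the flag idea concrete, here is a rough sketch of what it might look like. This is a toy model, not Lustre code: `struct toy_req`, `enum toy_phase` and `toy_phase_enter()` are all made-up names, and a real implementation would have to walk the import's actual request lists under the proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request; NOT the real struct ptlrpc_request. */
struct toy_req {
    int  id;
    bool no_resend_replay;  /* proposed flag: drop this request if a replay happened */
    bool discarded;         /* set once the request has been thrown away */
};

/* Hypothetical connection phases; only TOY_REPLAY_LOCKS matters here. */
enum toy_phase { TOY_CONNECTING, TOY_REPLAY_LOCKS, TOY_FULL };

/*
 * Called on every phase change. Entering the lock replay phase is taken
 * as proof that a replay happened, so every queued request carrying the
 * flag is discarded. Returns how many requests were dropped.
 * Note the cost: this walks the whole queue.
 */
static size_t toy_phase_enter(enum toy_phase phase,
                              struct toy_req *queue, size_t n)
{
    size_t dropped = 0;

    assert(queue != NULL || n == 0);

    if (phase != TOY_REPLAY_LOCKS)
        return 0;  /* a plain reconnect must not drop anything */

    for (size_t i = 0; i < n; i++) {
        if (queue[i].no_resend_replay && !queue[i].discarded) {
            queue[i].discarded = true;
            dropped++;
        }
    }
    return dropped;
}
```

With a queue of three requests where the first and third carry the flag, `toy_phase_enter(TOY_CONNECTING, ...)` drops nothing, while `toy_phase_enter(TOY_REPLAY_LOCKS, ...)` drops exactly those two.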
On the other hand, we don't want to iterate over a potentially big list of RPCs like this, so having a separate counter similar to import generation might be a better implementation here, so that all the cancel RPCs for locks that cross this replay boundary would die on their own?