Lustre / LU-18077

Do not resend cancel requests over replay boundary

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      Currently the client drops all unused or cancelled locks when doing lock replay, which makes sense: why let the server know about locks we want to get rid of anyway?

      But whatever cancel RPCs we wanted to send still stay in the outgoing queue because, unlike in the early days, they no longer have the no_resend flag.

      This leads to a paradoxical situation where, post replay, we have no locks but potentially a long list of cancel RPCs for non-existent locks. Normally that would not pose a big problem, but when the count is exceptionally big the cancel thread on the server might be overloaded (see LU-18072), so it makes sense to ensure we kill those RPCs.

      Currently this is not entirely convenient to do: the import generation does not change, and the connection counter changes even on simple reconnects without replay. So we need to add some sort of "no_resend_replay" flag and then discard all requests that have it set when we enter a suitable connection phase that indicates we had a replay (say, replay locks?).
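
      To make the flag idea concrete, here is a minimal sketch in plain C. Everything in it is hypothetical: the sketch_* types and functions are simplified stand-ins, not the real ptlrpc structures, and only the "no_resend_replay" name comes from the description above. It shows the discard step as a single walk over the outgoing queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
struct sketch_request {
	struct sketch_request *next;             /* singly linked outgoing queue */
	bool                   no_resend_replay; /* proposed flag from this ticket */
	int                    id;               /* illustrative identifier only */
};

struct sketch_import {
	struct sketch_request *sending_head;     /* head of the outgoing queue */
};

/* Queue a request at the head of the outgoing list. */
static void sketch_queue(struct sketch_import *imp, struct sketch_request *req)
{
	assert(imp != NULL && req != NULL);
	req->next = imp->sending_head;
	imp->sending_head = req;
}

/*
 * Meant to be called when the import enters the (assumed) lock replay
 * phase: unlink every request that carries the flag and return how many
 * were dropped. The cost is linear in the queue length.
 */
static size_t sketch_discard_on_replay(struct sketch_import *imp)
{
	struct sketch_request **link;
	size_t dropped = 0;

	assert(imp != NULL);
	link = &imp->sending_head;
	while (*link != NULL) {
		struct sketch_request *req = *link;

		if (req->no_resend_replay) {
			*link = req->next;   /* unlink, keep link in place */
			req->next = NULL;
			dropped++;
		} else {
			link = &req->next;   /* keep this request, advance */
		}
	}
	return dropped;
}
```

      In this sketch only flagged requests are removed; everything else stays queued in its original order. A real implementation would also have to deal with locking and with requests that are already in flight, which the sketch ignores.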

      On the other hand, we don't want to iterate over a potentially big list of RPCs like this. Perhaps a separate counter, similar to the import generation, would be a better implementation here, so that all the cancel requests that cross this replay boundary die on their own?
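
      The counter alternative could look roughly like the sketch below. Again, all names (gen_import, gen_request, replay_generation and the helper functions) are assumptions made for illustration and do not exist in Lustre. Each request snapshots a per-import replay counter when it is created, entering replay bumps the counter once, and the send path refuses flagged requests whose snapshot is stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
struct gen_import {
	/* Assumed counter: bumped once per replay, not on plain reconnects. */
	unsigned int replay_generation;
};

struct gen_request {
	const struct gen_import *imp;
	unsigned int             replay_generation; /* snapshot at creation */
	bool                     no_resend_replay;  /* opt-in, e.g. cancels */
};

/* Record the import's current replay generation in the new request. */
static void gen_request_init(struct gen_request *req,
			     const struct gen_import *imp,
			     bool no_resend_replay)
{
	assert(req != NULL && imp != NULL);
	req->imp = imp;
	req->replay_generation = imp->replay_generation;
	req->no_resend_replay = no_resend_replay;
}

/* O(1): starting a replay invalidates stale requests without a list walk. */
static void gen_import_start_replay(struct gen_import *imp)
{
	assert(imp != NULL);
	imp->replay_generation++;
}

/*
 * Meant to be checked on the send/resend path. Unflagged requests are
 * always allowed; flagged ones only if no replay happened since creation.
 */
static bool gen_request_may_send(const struct gen_request *req)
{
	assert(req != NULL);
	if (!req->no_resend_replay)
		return true;
	return req->replay_generation == req->imp->replay_generation;
}
```

      The trade-off in this sketch is that stale requests are not freed eagerly; each one is dropped lazily the next time the send path looks at it, so no single step has to walk the whole queue.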

            People

              wc-triage WC Triage
              green Oleg Drokin
