Lustre / LU-19976

osc_init_grant ignores cl_lost_grant on reconnect, causing grant inflation

Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • None
    • None
    • 3

    Description

      osc_init_grant() does not zero cl_lost_grant when it recomputes the available grant on reconnect. As a result, the client's total grant can exceed the amount the server authorised (ocd_grant).

      *How it happens:*

      During an eviction and reconnect cycle, dirty pages that fail to flush are accounted for via osc_free_grant(): their grant moves from cl_dirty_grant into cl_lost_grant. When osc_reconnect() runs, it zeroes cl_lost_grant and reports the current dirty and reserved totals to the server in the CONNECT RPC. However, if more RPCs fail between osc_reconnect() and the subsequent IMP_EVENT_OCD event (which calls osc_init_grant()), cl_lost_grant accumulates again.

      osc_init_grant() then sets:
      cl_avail_grant = ocd_grant - cl_dirty_grant - cl_reserved_grant

      But it does not zero cl_lost_grant, so the grant lost after osc_reconnect() is double-counted: it has already been subtracted from cl_dirty_grant (so cl_avail_grant gains that space), yet it also remains in cl_lost_grant. The client's view of its total grant becomes:
      avail + dirty + reserved + lost > ocd_grant
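
      The sequence above can be replayed with small numbers. The following is a toy model only, not Lustre code: struct toy_grant, the toy_* helpers and all byte counts are hypothetical stand-ins for the cl_* fields and the functions named in this description, and it assumes ocd_grant is at least dirty + reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the grant fields of the client; not Lustre code. */
struct toy_grant {
	uint64_t avail;     /* models cl_avail_grant */
	uint64_t dirty;     /* models cl_dirty_grant */
	uint64_t reserved;  /* models cl_reserved_grant */
	uint64_t lost;      /* models cl_lost_grant */
};

/* A dirty page failed to flush: its grant moves from dirty to lost. */
static void toy_free_grant(struct toy_grant *g, uint64_t bytes)
{
	assert(bytes <= g->dirty);
	g->dirty -= bytes;
	g->lost += bytes;
}

/* Reconnect step as described above: lost grant is zeroed. */
static void toy_reconnect(struct toy_grant *g)
{
	g->lost = 0;
}

/* Models the unpatched behaviour: avail is recomputed, lost is left alone. */
static void toy_init_grant_unpatched(struct toy_grant *g, uint64_t ocd_grant)
{
	/* Assumption: ocd_grant >= dirty + reserved, so this cannot underflow. */
	assert(ocd_grant >= g->dirty + g->reserved);
	g->avail = ocd_grant - g->dirty - g->reserved;
}

/* The client's view of its total grant. */
static uint64_t toy_total(const struct toy_grant *g)
{
	return g->avail + g->dirty + g->reserved + g->lost;
}

/* Replays the inflation path with ocd_grant = 1000 and returns the total. */
static uint64_t toy_inflation_demo(void)
{
	struct toy_grant g = { .avail = 0, .dirty = 300, .reserved = 100, .lost = 0 };

	toy_free_grant(&g, 100);            /* eviction: dirty 200, lost 100 */
	toy_reconnect(&g);                  /* lost reset to 0 */
	toy_free_grant(&g, 50);             /* late failure: dirty 150, lost 50 */
	toy_init_grant_unpatched(&g, 1000); /* avail = 1000 - 150 - 100 = 750 */
	return toy_total(&g);               /* 750 + 150 + 100 + 50 */
}
```

      With these numbers toy_inflation_demo() returns 1050 against a server grant of 1000: the 50 bytes lost after the reconnect appear once inside avail and once in lost.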

      *Fix:*

      In osc_init_grant(), zero cl_lost_grant after computing cl_avail_grant. The lost grants from the old connection were either reported to the server in osc_reconnect() or discarded; they must not carry over into the new connection's accounting.
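
      A minimal sketch of the proposed change, again as a toy model rather than a patch against lustre/osc/osc_request.c. struct grant_model and the model_* functions are hypothetical names; only the order of operations (recompute avail, then zero lost) is taken from the fix described above. The sketch assumes ocd_grant is at least dirty + reserved and does not model the opposite case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the client's grant counters; not Lustre code. */
struct grant_model {
	uint64_t avail;     /* models cl_avail_grant */
	uint64_t dirty;     /* models cl_dirty_grant */
	uint64_t reserved;  /* models cl_reserved_grant */
	uint64_t lost;      /* models cl_lost_grant */
};

/* Proposed behaviour: recompute avail, then discard stale lost grant. */
static void model_init_grant_patched(struct grant_model *g, uint64_t ocd_grant)
{
	/* Assumption: ocd_grant >= dirty + reserved, so this cannot underflow. */
	assert(ocd_grant >= g->dirty + g->reserved);
	g->avail = ocd_grant - g->dirty - g->reserved;
	g->lost = 0; /* lost grant from the old connection must not carry over */
}

/* The client's view of its total grant. */
static uint64_t model_total(const struct grant_model *g)
{
	return g->avail + g->dirty + g->reserved + g->lost;
}

/*
 * Starts from a client that lost 50 bytes between reconnect and the OCD
 * event, applies the patched initialisation with ocd_grant = 1000, and
 * returns the resulting total.
 */
static uint64_t model_patched_demo(void)
{
	struct grant_model g = { .avail = 0, .dirty = 150, .reserved = 100, .lost = 50 };

	model_init_grant_patched(&g, 1000); /* avail = 750, lost = 0 */
	return model_total(&g);             /* 750 + 150 + 100 + 0 */
}
```

      Here model_patched_demo() returns 1000, so avail + dirty + reserved + lost equals the server grant again once the new connection's accounting starts.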

      Affected function: osc_init_grant() in lustre/osc/osc_request.c

      *Discovery:*

      Found via a TLA+ formal model of the OSC grant eviction and reconnect protocol. The model checker (TLC) produced a 23-state counterexample demonstrating the inflation path.

            People

              wc-triage WC Triage
              paf0186 Patrick Farrell