Details
- Bug
- Resolution: Unresolved
- Medium
- None
- None
- 3
Description
osc_init_grant() does not zero cl_lost_grant when it computes the available grant on reconnect. As a result, the grant the client accounts for can total more than the amount the server authorised.
*How it happens:*
During an eviction and reconnect cycle, dirty pages that fail to flush are accounted via osc_free_grant(): the grant moves from cl_dirty_grant into cl_lost_grant. When osc_reconnect() fires, it zeroes cl_lost_grant and reports the current dirty+reserved totals to the server in the CONNECT RPC. However, if more RPCs fail between osc_reconnect() and the subsequent IMP_EVENT_OCD (which calls osc_init_grant()), cl_lost_grant accumulates again.
osc_init_grant() then sets:
cl_avail_grant = ocd_grant - cl_dirty_grant - cl_reserved_grant
But it does not zero cl_lost_grant. The grant freed by those late failures is therefore counted twice: it was subtracted from cl_dirty_grant (so cl_avail_grant gains that space), and it also remains in cl_lost_grant. The client's view of its total grant becomes:
avail + dirty + reserved + lost > ocd_grant
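The inequality can be worked through explicitly. The symbols are introduced here for illustration only: G is ocd_grant, and D and R are cl_dirty_grant and cl_reserved_grant when osc_reconnect() runs (cl_lost_grant is 0 right after it). Assume the only activity before osc_init_grant() is failed flushes carrying f bytes of grant.

```latex
\begin{aligned}
\text{dirty} &= D - f, \qquad \text{reserved} = R, \qquad \text{lost} = f,\\
\text{avail} &= G - (D - f) - R,\\
\text{avail} + \text{dirty} + \text{reserved} + \text{lost}
  &= (G - D + f - R) + (D - f) + R + f\\
  &= G + f \;>\; G \quad \text{whenever } f > 0.
\end{aligned}
```

If cl_lost_grant is zeroed in osc_init_grant(), the trailing f term drops out and the total is exactly G.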
*Fix:*
In osc_init_grant(), zero cl_lost_grant after computing cl_avail_grant. The lost grants from the old connection were either reported to the server in osc_reconnect() or discarded; they must not carry over into the new connection's accounting.
Affected function: osc_init_grant() in lustre/osc/osc_request.c
*Discovery:*
Found via a TLA+ formal model of the OSC grant eviction and reconnect protocol. The model checker (TLC) produced a 23-state counterexample demonstrating the inflation path.
Issue Links
- is duplicated by: LU-19977 osc: osc_init_grant ignores cl_lost_grant on reconnect, causing grant inflation (Closed)