Lustre / LU-19956

osc: fix page lifecycle race in osc_completion

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

    Description

      The assertion ASSERTION(page->cp_state == CPS_FREEING) in
      cl_batch_put() fails in the write completion path (brw_interpret ->
      osc_extent_finish -> osc_completion -> cl_page_put -> cl_batch_put).

      Root cause: osc_completion() cleared ops_transfer_pinned directly
      (bypassing osc_page_transfer_put()) before calling cl_page_complete(),
      but deferred the actual cl_page_put() until after cl_page_complete()
      returned. This split the flag-clear from the ref-drop, creating a race
      window:

      1. On CPU A, osc_completion() clears ops_transfer_pinned directly.
      2. cl_page_complete() sets CPS_CACHED and calls end_page_writeback().
      3. The page is now reclaimable. Another CPU (CPU B) enters
         do_release_page() -> cl_page_delete() -> osc_page_delete() ->
         osc_page_transfer_put(). This path stores CPS_FREEING.
      4. osc_page_transfer_put() sees the flag already 0 and skips
         cl_page_put().
      5. vvp_page_delete() drops the cache ref.
      6. Back on CPU A, cl_page_put() drops the transfer pin, which is now
         the last ref.
      7. cl_batch_put() fires. On weakly-ordered architectures (aarch64),
         the CPS_FREEING store from step 3 may not yet be visible on CPU A
         -> LBUG.
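
      The bookkeeping in steps 1-7 can be replayed in a small, single-threaded
      userspace model. This is only a sketch: model_page, model_put(),
      model_transfer_put() and replay_buggy_interleaving() are hypothetical
      stand-ins, not Lustre code, and the starting refcount of 2 (one cache
      ref plus one transfer pin) is an assumption. It does not reproduce the
      memory-ordering failure itself. It only shows that the buggy ordering
      lets the last ref be dropped on a different CPU from the one that
      stored the FREEING state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the Lustre structures. */
enum model_state { MODEL_PAGEOUT, MODEL_CACHED, MODEL_FREEING };

struct model_page {
	int ref;                /* stands in for cp_ref */
	bool transfer_pinned;   /* stands in for ops_transfer_pinned */
	enum model_state state; /* stands in for cp_state */
	int freeing_cpu;        /* CPU that stored MODEL_FREEING, -1 if none */
	int last_put_cpu;       /* CPU that dropped the last ref, -1 if none */
};

/* Drop one reference and remember which CPU dropped the last one. */
static void model_put(struct model_page *p, int cpu)
{
	assert(p->ref > 0);
	if (--p->ref == 0)
		p->last_put_cpu = cpu;
}

/*
 * Accessor model: clears the flag and drops the paired ref together.
 * Returns false (and does nothing) if the flag is already clear.
 */
static bool model_transfer_put(struct model_page *p, int cpu)
{
	if (!p->transfer_pinned)
		return false;
	p->transfer_pinned = false;
	model_put(p, cpu);
	return true;
}

/* Replay the buggy interleaving, one statement per step of the report. */
static struct model_page replay_buggy_interleaving(void)
{
	/* Assumption: one cache ref plus one transfer pin are held. */
	struct model_page p = {
		.ref = 2, .transfer_pinned = true, .state = MODEL_PAGEOUT,
		.freeing_cpu = -1, .last_put_cpu = -1,
	};
	const int cpu_a = 0, cpu_b = 1;

	p.transfer_pinned = false;           /* 1: CPU A clears flag, keeps ref */
	p.state = MODEL_CACHED;              /* 2: completion marks page cached */
	p.state = MODEL_FREEING;             /* 3: CPU B starts deleting the page */
	p.freeing_cpu = cpu_b;
	(void)model_transfer_put(&p, cpu_b); /* 4: no-op, flag is already clear */
	model_put(&p, cpu_b);                /* 5: CPU B drops the cache ref */
	model_put(&p, cpu_a);                /* 6: CPU A drops the last ref */
	return p;                            /* 7: the free path would run on CPU A */
}
```

      In this model the final put runs on CPU A while the FREEING store was
      issued by CPU B, which is the situation where cross-CPU visibility of
      that store matters.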

      In short, the bug came from manipulating ops_transfer_pinned directly
      instead of going through the osc_page_transfer_put() accessor, which
      keeps the flag and the cl_page reference in sync.

      Fix: Do not clear ops_transfer_pinned directly. Call
      osc_page_transfer_put() after cl_page_complete(), which clears the
      flag and drops the ref together. The transfer pin reference also
      keeps cl_page_in_use() returning true, which prevents concurrent
      reclaim until the ref is dropped.
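
      A minimal sketch of the fixed ordering, again as a hypothetical
      userspace model rather than the actual patch. model_in_use() is an
      assumed simplification of the cl_page_in_use() check (anything beyond
      the single cache ref counts as "in use"), and model_try_reclaim()
      stands in for the reclaim path. The reclaim attempt is injected at the
      point where the old code was exposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the Lustre structures. */
enum model_state { MODEL_PAGEOUT, MODEL_CACHED, MODEL_FREEING };

struct model_page {
	int ref;                /* stands in for cp_ref */
	bool transfer_pinned;   /* stands in for ops_transfer_pinned */
	enum model_state state; /* stands in for cp_state */
};

/* Assumption: the page is "in use" while more than the cache ref is held. */
static bool model_in_use(const struct model_page *p)
{
	return p->ref > 1;
}

/* Accessor model: the flag and its paired ref are released together. */
static void model_transfer_put(struct model_page *p)
{
	if (!p->transfer_pinned)
		return;
	p->transfer_pinned = false;
	assert(p->ref > 0);
	p->ref--;
}

/* Reclaim attempt: refuses while the page is in use. */
static bool model_try_reclaim(struct model_page *p)
{
	if (model_in_use(p))
		return false;
	p->state = MODEL_FREEING;
	model_transfer_put(p);  /* no-op when the pin was already released */
	assert(p->ref > 0);
	p->ref--;               /* drop the cache ref */
	return true;
}

/*
 * Fixed completion order: complete first, leave the flag alone, then
 * release the pin through the accessor. Returns true if the reclaim
 * attempt injected between the two steps was blocked.
 */
static bool model_completion_fixed(struct model_page *p)
{
	bool blocked;

	p->state = MODEL_CACHED;         /* completion stand-in */
	blocked = !model_try_reclaim(p); /* concurrent reclaim at the worst point */
	model_transfer_put(p);           /* flag-clear and ref-drop together */
	return blocked;
}
```

      With a page that starts at ref = 2 and transfer_pinned = true, the
      injected reclaim is refused, and a later model_try_reclaim() call
      succeeds only after the accessor has released the pin.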

      Also document ops_transfer_pinned with a warning that it must be
      managed only through the osc_page_transfer_get() and
      osc_page_transfer_put() accessors, since the flag is paired with a
      cl_page reference.
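
      One possible shape for such a comment is sketched below. The wording
      is illustrative only and is not taken from the patch. The enclosing
      struct model_osc_page and the 1-bit field type are assumptions made so
      the fragment compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-in for the OSC page layer; only the flag is shown. */
struct model_osc_page {
	/*
	 * Set while a transfer holds an extra cl_page reference.
	 *
	 * WARNING: this flag is paired with a cl_page reference. Change it
	 * only through osc_page_transfer_get()/osc_page_transfer_put().
	 * Clearing it by hand leaves the reference behind and turns a later
	 * osc_page_transfer_put() into a no-op.
	 */
	unsigned int ops_transfer_pinned:1;
};
```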

      Confirmed via vmcore analysis from an aarch64 system (256 CPUs):
      cl_page had cp_state=CPS_FREEING at dump time (set by a concurrent
      thread AFTER the assertion fired), cp_ref=0, and vmpage PG_private
      already clear (vvp_page_delete completed on another CPU).

      Component: osc

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Patrick Farrell (paf0186)