Lustre / LU-4509

clio can be stuck in osc_extent_wait



      Consider the following scenario:
      1) thread-1 holds an active extent.
      2) thread-2 calls flush cache and marks this extent as "urgent" and "sync_wait".
      3) thread-3 wants to write to the same extent; osc_extent_find() reports a "conflict" because the extent is in "sync_wait", so thread-3 starts to wait.
      4) cl_writeback_work has been scheduled by thread-4 to write some other extents; it has sent RPCs but has not returned yet.
      5) thread-1 finishes its work and calls osc_extent_release() -> osc_io_unplug_async() -> ptlrpcd_queue_work(), but cl_writeback_work is still running, so the request is ignored (-EBUSY).
      6) thread-3 is stuck because nobody will wake it up.

      This is an issue I hit while testing DAOS, but it should be addressed in common code.
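
      The sequence above can be modelled as a small single-threaded state machine. This is only a sketch of the lost wakeup, not Lustre code: struct work_model, struct extent_model, queue_work(), writeback_pass() and waiter_stuck() are hypothetical names, and the "remember" variant is just one conceivable remedy, not necessarily the fix adopted for this ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real Lustre structures. */
struct work_model {
    bool running;   /* the writeback work is executing (RPCs in flight) */
    bool rerun;     /* a queue request arrived while it was running */
};

struct extent_model {
    bool active;         /* held by a writer (thread-1) */
    bool sync_wait;      /* flush requested (thread-2) */
    bool waiter_blocked; /* thread-3 is sleeping on the conflict */
};

/*
 * Queue the work. With remember == false a request made while the work
 * is running is dropped with -EBUSY, as in step 5. With remember == true
 * the request is recorded so that the work runs once more afterwards.
 */
static int queue_work(struct work_model *w, bool remember)
{
    if (w->running) {
        if (!remember)
            return -EBUSY;
        w->rerun = true;
        return 0;
    }
    w->running = true;
    return 0;
}

/* One pass of the work: flush a released sync_wait extent, wake waiters. */
static void writeback_pass(struct extent_model *e)
{
    if (e->sync_wait && !e->active) {
        e->sync_wait = false;
        e->waiter_blocked = false;
    }
}

/* Replays steps 1-6 and reports whether thread-3 is still blocked. */
static bool waiter_stuck(bool remember)
{
    struct extent_model e = { false, false, false };
    struct work_model w = { false, false };
    int rc;

    e.active = true;             /* 1) thread-1 holds the extent */
    e.sync_wait = true;          /* 2) thread-2 requests a flush */
    e.waiter_blocked = true;     /* 3) thread-3 hits the conflict and waits */

    rc = queue_work(&w, remember); /* 4) thread-4 schedules the work */
    assert(rc == 0);
    writeback_pass(&e);          /*    extent is still active, so it is skipped */

    e.active = false;            /* 5) thread-1 releases the extent ... */
    rc = queue_work(&w, remember); /*  ... and tries to queue the work again */
    (void)rc;                    /*    -EBUSY is silently ignored when lossy */

    w.running = false;           /* the in-flight RPCs return, the work ends */
    if (w.rerun) {               /* only the "remember" variant runs again */
        w.rerun = false;
        w.running = true;
        writeback_pass(&e);
        w.running = false;
    }
    return e.waiter_blocked;     /* 6) is thread-3 still asleep? */
}
```

      In this model, waiter_stuck(false) leaves thread-3 blocked because the only request that could have flushed the released extent was dropped, while waiter_stuck(true) wakes it because the request is replayed once the running work finishes.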

            bobijam Zhenyu Xu
            liang Liang Zhen (Inactive)