Lustre / LU-4509

clio can be stuck in osc_extent_wait

Details

    Description

      Consider the following scenario:
      1) thread-1 holds an active extent.
      2) thread-2 flushes the cache and marks this extent as "urgent" and "sync_wait".
      3) thread-3 wants to write to the same extent. osc_extent_find() reports a "conflict" because the extent is in "sync_wait", so thread-3 starts to wait.
      4) thread-4 has scheduled cl_writeback_work to write some other extents; the work has sent RPCs but has not returned yet.
      5) thread-1 finishes its work and calls osc_extent_release() -> osc_io_unplug_async() -> ptlrpcd_queue_work(), but cl_writeback_work is still running, so the request is ignored (-EBUSY).
      6) thread-3 is stuck because nobody will wake it up.
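
The six steps above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical and is not the real Lustre structure or API, and the "remember the request" variant is just one possible way to avoid the lost wakeup, not necessarily the fix adopted for this ticket.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared writeback work item. */
struct model_work {
    bool running;  /* handler is currently executing */
    bool requeue;  /* a request arrived while running (remedy variant only) */
    int  runs;     /* number of times the handler has run */
};

/* Hypothetical stand-in for an extent. */
struct model_extent {
    bool active;    /* still held by a writer (thread-1) */
    bool sync_wait; /* a flusher (thread-2) is waiting on it */
    bool written;   /* written out, so waiters (thread-3) can be woken */
};

/*
 * Queue the work. If it is already running, the request is dropped
 * with -EBUSY unless remember_request is set, in which case the
 * request is recorded so the handler runs once more afterwards.
 */
static int model_queue_work(struct model_work *w, bool remember_request)
{
    if (w->running) {
        if (remember_request)
            w->requeue = true;
        return -EBUSY;
    }
    w->running = true;
    return 0;
}

/* Handler body: an extent can only be written once it is not active. */
static void model_handler(struct model_work *w, struct model_extent *e)
{
    w->runs++;
    if (!e->active)
        e->written = true;
}

/* Mark the handler finished; returns true if it must run again. */
static bool model_work_done(struct model_work *w)
{
    w->running = false;
    if (w->requeue) {
        w->requeue = false;
        w->running = true;
        return true;
    }
    return false;
}

/* Replays steps 1-6; returns true if thread-3 would be woken. */
bool replay_scenario(bool remember_request)
{
    struct model_work w = { 0 };
    struct model_extent e = { .active = true };      /* step 1 */
    int rc;

    e.sync_wait = true;                              /* step 2 */
    /* step 3: thread-3 starts waiting for e.written */
    rc = model_queue_work(&w, remember_request);     /* step 4 */
    model_handler(&w, &e);       /* extent is still active: skipped */

    e.active = false;                                /* step 5 */
    rc = model_queue_work(&w, remember_request);     /* -EBUSY */
    (void)rc;

    while (model_work_done(&w))  /* runs again only if remembered */
        model_handler(&w, &e);

    return e.written;                                /* step 6 */
}
```

In this model, replay_scenario(false) returns false, matching the hang in step 6: the only queue request that could have written the extent was dropped. replay_scenario(true) returns true because the dropped request is recorded and the handler runs once more after the extent has been released.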

      This is an issue I hit while testing DAOS, but it should be addressed in common code.

            People

              bobijam Zhenyu Xu
              liang Liang Zhen (Inactive)
              Votes: 0
              Watchers: 12

              Dates

                Created:
                Updated:
                Resolved: