
LU-90: Simplify cl_lock


Details

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.0.0
    • 10405

    Description

      This task is based on a discussion in Beijing about simplifying cl_lock.

      The proposal to simplify cl_lock
      ================================

      We have discussed schemes to simplify cl_lock many times, but we have never
      written any of them down. To fix that, this document records my idea for
      simplifying cl_lock.

      1. Problems
      First of all, we all agree that the current implementation of cl_lock is far
      too complex. This is because:

      • cl_lock has two-level caches,
      • a top lock may have several sublocks, and a sublock may be shared by
        multiple top locks,
      • an IO lock is actually composed of one top lock and several sublocks,
      • we have to hold the mutexes of both the top lock and its sublocks to
        finish some operations, and finally,
      • both llite and osc can initiate an operation to update the lock.

      These difficulties make cl_lock hard to understand, deadlock prone, and hard
      to maintain. They also hurt performance, because we have to grab an unknown
      number of mutexes to finish an operation. Worse, we had to invent
      cl_lock_closure to address the deadlock issue.

      Life would be a bit easier if we revised the lock model to mitigate those
      issues.

      2. Scheme
      Here is my proposal to fix these problems:

      • remove the top-level cache, that is to say, make the top lock a
        pass-through cache;
      • revise the bottom-to-top lock operations so that only top-to-bottom
        operations remain.

      If we can reach the above targets, we can simplify the lock model a lot,
      because:

      • there are no deadlock concerns any more, so we can remove cl_lock_closure;
      • the number of mutexes held to finish an operation is bounded: at most
        N + 1, where N is the stripe count (for example, a file striped over 4
        OSTs takes at most 5 mutexes, as sketched below);
      • we can remove hundreds of lines of code related to cl_lock;
      • the code will become easier to understand.
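
      To make the N + 1 bound concrete, here is a minimal sketch of the
      deterministic top-to-bottom locking order this scheme permits. The helper
      is hypothetical and only for illustration; it assumes cll_guard is the
      per-lock mutex in cl_lock.

      /*
       * Hypothetical helper: take the top lock's mutex first, then each
       * sublock's mutex.  An operation thus holds at most N + 1 mutexes
       * (N = stripe count), and the single global ordering rules out
       * deadlock, so cl_lock_closure is unnecessary.
       */
      static void cl_lock_mutex_get_all(struct cl_lock *top,
                                        struct cl_lock **subs, int nr_subs)
      {
              int i;

              mutex_lock(&top->cll_guard);            /* 1: the top lock   */
              for (i = 0; i < nr_subs; i++)           /* + N: the sublocks */
                      mutex_lock(&subs[i]->cll_guard);
      }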

      2.1 remove top-level cache
      After removing the top-level cache, each IO has to request new locks
      unconditionally. This can be done by adding a new enqueue bit to
      cl_lock_descr, say CEF_NOCACHE. After an IO is done, we cancel and delete
      these top locks voluntarily. Because an IO holds the user count
      (->cll_users) to prevent a sublock from being cancelled, this has the
      benefit that if a sublock can be cancelled, it must have no top lock
      stacked upon it.
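
      As an illustration, here is a minimal sketch of what the uncached enqueue
      path might look like. Only CEF_NOCACHE and ->cll_users come from this
      proposal; the entry points below (cl_lock_enqueue, cl_io_perform, and the
      cancel/delete pair) are placeholders, not a definitive cl_lock API.

      /*
       * Hypothetical IO path under the pass-through top-level cache: each IO
       * enqueues a fresh top lock, and cancels and deletes it when done,
       * instead of leaving it cached.
       */
      static int cl_io_one(const struct lu_env *env, struct cl_io *io)
      {
              struct cl_lock *top;
              int rc;

              /* Nothing is looked up in a top-level cache. */
              top = cl_lock_enqueue(env, io, CEF_NOCACHE);
              if (IS_ERR(top))
                      return PTR_ERR(top);

              rc = cl_io_perform(env, io);    /* ->cll_users held during IO */

              /* The IO is done: drop the top lock voluntarily. */
              cl_lock_cancel(env, top);
              cl_lock_delete(env, top);
              return rc;
      }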

      2.2 remove bottom-to-top lock callback methods
      Currently, we have the following operations initiated from osc:

      • ->clo_weigh: this operation is used to determine which locks can be early
        cancelled. ccc_lock_weigh checks whether the object has mmapped regions;
        if it does, we are not keen on cancelling the lock. We can invent a new
        mechanism to address this. For example, we can add a ->cll_points field
        to cl_lock, and assign a lock a higher point value if it covers an
        mmapped region. The early-cancel logic then scans the namespaces and
        decreases each lock's points by one; once the points reach zero, the
        lock can be cancelled (see the first sketch after this list). We can
        also build a good LRU algorithm on top of this scheme.
      • ->clo_closure: now that there are no deadlock concerns, we do not need
        this method.
      • ->clo_modify: since the top lock cannot be cached, there is no point in
        modifying the descr of a top lock.
      • ->clo_delete: we can be absolutely sure that when a lock is being
        deleted, there is no top lock stacked upon it.
      • ->clo_state: we still need this method, because we need a way to notify
        top locks that a sublock is ready to be operated on. However, we can
        implement it in a new way: we already maintain a parent lock list in
        lovsub_lock. The list is private to the sublock, so we can access it
        under the protection of the sublock's mutex alone. We can then visit
        each parent lock in the list and simply wake up the waiting processes
        by calling wakeup(parent->cll_wq), without holding the parent's mutex
        at all (see the second sketch after this list).
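
      A minimal sketch of the proposed ->cll_points aging follows. Only
      cll_points itself comes from this proposal; the scan helper and its name
      are assumptions.

      /*
       * Called by the early-cancel scan for each cached lock in a namespace.
       * A lock covering an mmapped region starts with a higher point value,
       * so it survives more scan passes.  Returns nonzero once the lock is
       * eligible for cancellation; the same counter can back an LRU policy.
       */
      static int cl_lock_age_one(struct cl_lock *lock)
      {
              if (lock->cll_points > 0)
                      lock->cll_points--;     /* age by one per scan pass */

              return lock->cll_points == 0;   /* fully aged: may be cancelled */
      }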
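
      And a sketch of the new ->clo_state notification path, which walks the
      sublock's private parent list while holding only the sublock's mutex. The
      field names follow the lov/lovsub structures described above, but should
      be read as assumptions, not exact code.

      /*
       * Wake the waiters on every parent (top) lock of a sublock.  The
       * parent list is private to the sublock, so walking it under the
       * sublock's own mutex is safe; no parent mutex is taken.
       */
      static void lovsub_lock_notify_parents(struct lovsub_lock *sub)
      {
              struct lov_lock_link *link;

              list_for_each_entry(link, &sub->lss_parents, lll_list) {
                      struct cl_lock *parent = link->lll_super->lls_cl.cls_lock;

                      wake_up_all(&parent->cll_wq);   /* just wake waiters */
              }
      }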
      2.3 other updates
      We need to revise some code in lov_lock.c to cope with the lack of
      bottom-to-top notifications. This does not look difficult.

      2.4 Pros and cons
      This scheme keeps most of the ideas in the current implementation and only
      removes the hard-to-understand code, which keeps the modification under
      control.
      However, the code may still be difficult: a new engineer will still need a
      lot of time to grok cl_lock. This is unavoidable.

      3. Discussions
      3.1 What if we made the sublock a pass-through cache as well?
      In a word: keep the change minimal. The most difficult parts are the
      two-level cache and the two-direction path for updating the lock; we only
      need to attack that essence. Making sublocks pass-through as well would
      require reworking the DLM callbacks and the page-finding routines, which
      is not worthwhile because they have been working well so far.

People

    Assignee: Jinshan Xiong (Inactive)
    Reporter: Jinshan Xiong (Inactive)