[LU-90] Simplify cl_lock Created: 21/Feb/11  Updated: 28/May/17  Resolved: 28/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Improvement Priority: Minor
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Duplicate Votes: 0
Labels: clio

Issue Links:
Related
is related to LU-3259 cl_lock refactoring Resolved
Rank (Obsolete): 10405

 Description   

This task is based on a discussion in Beijing about simplifying cl_lock.

The proposal to simplify cl_lock
================================

We have discussed schemes to simplify cl_lock many times, but we have never
written any of them down. To fix that, I am recording my idea for simplifying
cl_lock here.

1. Problems
First of all, we all agree that the current implementation of cl_lock is far
too complex. This is because:

  • cl_lock has two-level caches,
  • a top lock may have several sublocks, and a sublock may be shared by
    multiple top locks,
  • an IO lock is actually composed of one top lock and several sublocks,
  • we have to hold the mutexes of both the top lock and the sublocks to
    finish some operations, and finally,
  • both llite and osc can initiate an operation that updates the lock.

The above difficulties make cl_lock hard to understand, prone to deadlock and
hard to maintain. They also hurt performance, because we have to grab an
unknown number of mutexes to finish an operation. On top of that, we had to
invent cl_lock_closure to address the deadlock issue.

Life would be a bit easier if we revised the lock model to mitigate those
issues.

2. Scheme
Here is my proposal to fix this problem:

  • remove the top-level cache, that is, make it a pass-through cache;
  • revise the bottom-to-top lock operations so that only top-to-bottom
    operations remain.

If we reach the above targets, we can simplify the lock model a lot, because:

  • there are no deadlock concerns any more, so we can remove cl_lock_closure;
  • the number of mutexes held to finish an operation is bounded: it is at
    most (N + 1), where N is the stripe count;
  • we can remove hundreds of lines of code related to cl_lock;
  • the code becomes easier to understand.

2.1 remove top level cache
After removing the top-level cache, each IO has to request new top locks
unconditionally. This can be done with a new enqueue bit in cl_lock_descr,
say CEF_NOCACHE. After an IO is done, we cancel and delete these top locks
voluntarily. Since an IO in progress holds the user count (->cll_users) to
prevent its sublocks from being cancelled, this has the benefit that a sublock
which can be cancelled cannot have any top lock stacked upon it.
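
As a rough illustration of this lifecycle, the sketch below shows how an IO
could take and later drop such an uncached top lock. CEF_NOCACHE and the
enqueue bit in cl_lock_descr come from the proposal; cl_io_take_top_lock(),
cl_io_drop_top_lock() and the io->ci_lock field are hypothetical, and the
cl_lock_request()/cl_lock_cancel()/cl_lock_delete() calls are simplified
stand-ins rather than exact prototypes.

  /* Sketch only: take an uncached top lock for the duration of one IO. */
  static int cl_io_take_top_lock(const struct lu_env *env, struct cl_io *io,
                                 struct cl_lock_descr *descr)
  {
          struct cl_lock *lock;

          descr->cld_enq_flags |= CEF_NOCACHE;   /* proposed "never cache" bit */
          lock = cl_lock_request(env, io, descr, "nocache", io); /* simplified */
          if (IS_ERR(lock))
                  return PTR_ERR(lock);

          /* ->cll_users stays elevated while the IO runs, so the sublocks
           * under this top lock cannot be cancelled in the meantime. */
          io->ci_lock = lock;                    /* hypothetical field */
          return 0;
  }

  /* Sketch only: the IO is finished, cancel and delete the top lock. */
  static void cl_io_drop_top_lock(const struct lu_env *env, struct cl_io *io)
  {
          cl_lock_cancel(env, io->ci_lock);
          cl_lock_delete(env, io->ci_lock);
  }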

2.2 remove bottom-to-top lock callback methods
Currently, we have the following operations initiated from osc:

  • ->clo_weigh: this operation is used to decide which locks can be early
    canceled. In ccc_lock_weigh, it checks whether the object has mmap
    regions; if it does, we are not keen on canceling this lock. We can
    invent a new mechanism to address this. For example, we can add a
    ->cll_points field to cl_lock, and if the lock covers an mmap region
    we assign it a higher point count. The early-cancel logic then scans
    the namespaces and decreases each lock's points by one; when the
    points reach zero, the lock can be canceled (see the sketch after
    this list). We can also build a good LRU algorithm on top of this
    scheme.
  • ->clo_closure: now that there are no deadlock concerns, we do not need
    this method.
  • ->clo_modify: since the top lock cannot be cached, there is no point in
    modifying the descr of the top lock.
  • ->clo_delete: we can be absolutely sure that when a sublock is being
    deleted, there is no top lock stacked upon it.
  • ->clo_state: we still need this method, because we need a way to notify
    top locks that a sublock is ready to be operated on. However, we can
    implement it in a new way: we already maintain a parent-lock list in
    lovsub_lock; the list is private data of the sublock, so we can access
    it under the protection of the sublock's mutex. We can then walk the
    parent locks in the list and simply wake up the waiting processes by
    calling wakeup(parent->cll_wq). We do not need to hold the parent's
    mutex at all.
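
The ->cll_points aging mentioned under ->clo_weigh could look roughly like the
sketch below. cll_points comes from the proposal itself; cl_lock_age_one() is
a hypothetical helper, the surrounding namespace scan loop is not shown, and
the cl_lock mutex handling is omitted.

  /* Sketch only: one aging step of the proposed point-based early cancel. */
  static void cl_lock_age_one(const struct lu_env *env, struct cl_lock *lock)
  {
          /* locks covering mmap regions start with more points, so they
           * survive more scan passes before becoming cancellable */
          if (lock->cll_points > 0)
                  lock->cll_points--;

          /* not held by any IO and aged out: eligible for early cancel */
          if (lock->cll_points == 0 && lock->cll_holds == 0)
                  cl_lock_cancel(env, lock);
  }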
2.3 other updates
We need to revise some code in lov_lock.c to cope with the lack of
notification from bottom to top. This does not seem to be difficult.

2.4 Pros and cons
In this scheme we keep most of the ideas of the current implementation and
only remove some hard-to-understand code, which keeps the modification under
control.
However, the code may still be difficult; a new engineer will still need a lot
of time to grok cl_lock. This is unavoidable.

3. Discussions
3.1 What if we made the sublock a pass-through cache as well?
In a word: keep the update minimal. The most difficult parts are the two-level
cache and the two directions from which a lock can be updated; we just need to
address that essence. To implement this further change we would have to rework
the dlm callbacks and the page-finding routines. That is not worth it, because
they have been working well so far.



 Comments   
Comment by Jinshan Xiong (Inactive) [ 21/Feb/11 ]

SUMMARY OF CHANGES
==================
0. Summary
For sub-objects, the bit cl_object_header::coh_lock_cacheable is set. When a
cl_lock is created, this bit is checked: if it is set, we try to match a lock
from the cache; otherwise, a new lock is created.
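
A minimal sketch of that check follows, assuming lookup/alloc helpers named
cl_lock_lookup() and cl_lock_alloc(); those helpers, the cl_lock_obtain()
wrapper and its exact arguments are illustrative, and only the
coh_lock_cacheable branch is taken from the summary above.

  /* Sketch only: honour coh_lock_cacheable when a cl_lock is requested. */
  static struct cl_lock *cl_lock_obtain(const struct lu_env *env,
                                        struct cl_io *io, struct cl_object *obj,
                                        const struct cl_lock_descr *need)
  {
          struct cl_object_header *head = cl_object_header(obj);
          struct cl_lock *lock = NULL;

          if (head->coh_lock_cacheable)                  /* set for sub-objects */
                  lock = cl_lock_lookup(env, obj, io, need); /* match from cache */
          if (lock == NULL)
                  lock = cl_lock_alloc(env, obj, io, need);  /* create a new one */
          return lock;
  }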

1. Retained methods of cl_lock_operations

  • ->clo_enqueue: Yes
  • ->clo_wait: Yes
  • ->clo_unuse: No
  • ->clo_use: No
  • ->clo_fits_into: Yes
  • ->clo_cancel: Yes, but the semantics will be changed to be the opposite
    operation of enqueue; historically, enqueue and cancel have been mutually
    opposite operations.
  • ->clo_closure: No
  • ->clo_modify: No
  • ->clo_delete: Yes
  • ->clo_fini: Yes
  • ->clo_state: Yes
  • ->clo_weigh: No

2. enqueue and cancel
The semantics of cl_lock_cancel is changed to unref the ldlm lock; the ldlm
lock may be cancelled at any time after cl_lock_cancel is called.
Besides enqueuing new ldlm locks, clo_enqueue will also handle cached ldlm
locks. In osc_lock_enqueue, for a cl_lock in the CLS_CACHE state, it will call
ldlm_lock_addref_try to add a reference; a sketch of this cached path follows
the unhold policy below.
While a lock is held, i.e. cll_holds is not zero, it cannot be cancelled. When
unholding a lock with cl_lock_unhold, the following policy applies:

        if (--lock->cll_holds == 0) {
                /* unref the underlying ldlm lock; it may be cancelled
                 * at any time from this point on */
                cl_lock_cancel(env, lock);

                if (!cl_object_header(lock->cll_descr.cld_obj)->coh_lock_cacheable)
                        cl_lock_delete(env, lock); /* uncacheable (top lock): destroy */
                else if (lock->cll_state == CLS_HELD)
                        cl_lock_state_set(env, lock, CLS_CACHE); /* keep it cached */
                else
                        cl_lock_delete(env, lock); /* not reusable: destroy */
        }
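
The cached path on the enqueue side, described above, might look roughly like
the sketch below; the osc_enqueue_cached() helper is hypothetical, the
ols_handle/ols_einfo accesses are simplified, and error handling is omitted,
so treat this as an illustration of the idea rather than the real
osc_lock_enqueue code.

  /* Sketch only: reuse a cached ldlm lock instead of enqueuing a new one. */
  static int osc_enqueue_cached(struct osc_lock *olck, struct cl_lock *lock)
  {
          if (lock->cll_state == CLS_CACHE &&
              ldlm_lock_addref_try(&olck->ols_handle,
                                   olck->ols_einfo.ei_mode) == 0)
                  return 0;      /* got a reference on the cached ldlm lock */

          return -EAGAIN;        /* caller falls back to a real enqueue */
  }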

3. No cl_use/unuse, cl_lock_release, etc. any more
4. Other changes
To support cl_lock_peek, a special enqueue flag, CEF_PEEK, will be defined. If
cl_lock_request is called with this flag, no new sublocks will be created;
only cached sublocks will be used (sketched below).
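
A possible shape for honouring CEF_PEEK in the lov layer is sketched below;
lov_sublock_obtain() is a hypothetical name and the cl_lock_peek()/
cl_lock_request() calls are shown with approximate, simplified signatures.

  /* Sketch only: with CEF_PEEK never create a sublock, only reuse cached ones. */
  static struct cl_lock *lov_sublock_obtain(const struct lu_env *env,
                                            struct cl_io *io,
                                            const struct cl_lock_descr *subneed,
                                            __u32 enqflags)
  {
          if (enqflags & CEF_PEEK)
                  return cl_lock_peek(env, io, subneed, "peek", io);

          return cl_lock_request(env, io, subneed, "lov", io);
  }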

Comment by James A Simmons [ 12/Dec/14 ]

Same as LU-3259. Needs to be closed.
