Details
- Improvement
- Resolution: Duplicate
- Minor
- Lustre 2.0.0
- 10405
Description
This task is based on a discussion in Beijing about simplifying cl_lock.
The proposal to simplify cl_lock
================================
We have discussed schemes to simplify cl_lock many times, but we have never
written one down. To fix that, I'm recording my idea for simplifying cl_lock
here.
1. Problems
First of all, we all agree that the current implementation of cl_lock
is far too complex. This is because:
- cl_lock has two levels of cache,
- a top lock may have several sublocks, and a sublock may be shared by
multiple top locks,
- an IO lock is actually composed of one top lock and several sublocks,
- we have to hold the mutexes of both the top lock and the sublocks to finish
some operations, and finally,
- both llite and osc can initiate an operation to update the lock.
These difficulties make cl_lock hard to understand, deadlock-prone and
hard to maintain. They also hurt performance, because we have to grab an
unknown number of mutexes to finish an operation. On top of that, we had to
invent cl_lock_closure to address the deadlock issue.
Life would be a bit easier if we could revise the lock model to mitigate these
issues.
2. Scheme
Here is my proposal to fix this problem:
- remove the top-level cache, that is, make it a pass-through cache;
- revise the bottom-to-top lock operations, so that only top-to-bottom
operations remain.
If we can reach these targets, the lock model becomes much simpler, because:
- there are no deadlock concerns any more, so we can remove cl_lock_closure;
- the number of mutexes to be held to finish an operation is bounded:
it is at most (N + 1), where N is the stripe count;
- we can remove hundreds of lines of code related to cl_lock;
- the code will become easier to understand.
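The (N + 1) bound can be illustrated with a small user-space sketch. It is not Lustre code: the demo_* types, the DEMO_MAX_STRIPES limit and the use of pthread mutexes are all assumptions made for illustration. It further assumes that an operation takes the top lock's mutex first and then visits each sublock's mutex one at a time, and that sublock code never takes a top lock's mutex, so lock acquisition only ever goes top to bottom.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_STRIPES 8 /* hypothetical limit, only for this sketch */

/* Stand-in for a sublock; not the real lovsub_lock. */
struct demo_sublock {
    pthread_mutex_t mutex;
    int             touched; /* counts operations applied to this sublock */
};

/* Stand-in for a top lock stacked on N sublocks. */
struct demo_toplock {
    pthread_mutex_t      mutex;
    size_t               nr_subs; /* N, the stripe count */
    struct demo_sublock *subs[DEMO_MAX_STRIPES];
};

/*
 * Apply one operation top to bottom: take the top mutex, then each
 * sublock mutex in turn. Returns how many mutexes were acquired,
 * which is always nr_subs + 1.
 */
static size_t demo_top_to_bottom(struct demo_toplock *top)
{
    size_t acquired = 0;

    assert(top->nr_subs <= DEMO_MAX_STRIPES);

    pthread_mutex_lock(&top->mutex);
    acquired++;
    for (size_t i = 0; i < top->nr_subs; i++) {
        struct demo_sublock *sub = top->subs[i];

        pthread_mutex_lock(&sub->mutex);
        acquired++;
        sub->touched++; /* the per-sublock part of the operation */
        pthread_mutex_unlock(&sub->mutex);
    }
    pthread_mutex_unlock(&top->mutex);

    return acquired;
}
```

Because no path in this sketch acquires a top mutex while holding a sublock mutex, there is no lock-order inversion to guard against, which is the property that would make cl_lock_closure unnecessary.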
2.1 remove top level cache
After removing the top-level cache, each IO has to request new locks
unconditionally. This can be done by adding a new enqueue bit to cl_lock_descr,
say CEF_NOCACHE. After an IO is done, we cancel and delete these top
locks voluntarily. While doing IO we have to hold the user count
(->cll_users) to prevent the sublock from being canceled, so this has a
benefit: if a sublock can be canceled, it cannot have any top lock stacked
upon it.
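The lifecycle described above can be sketched as follows. This is a simplified user-space model, not the actual cl_lock code: the demo_* types and functions are hypothetical, DEMO_CEF_NOCACHE stands in for the proposed CEF_NOCACHE bit (its value here is arbitrary), and each top lock is stacked on a single sublock to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed CEF_NOCACHE enqueue bit. */
enum demo_enq_flags {
    DEMO_CEF_NOCACHE = 1 << 0, /* bit value chosen only for this sketch */
};

struct demo_sublock {
    int users;      /* models ->cll_users */
    int nr_parents; /* top locks currently stacked on this sublock */
};

struct demo_toplock {
    unsigned int         enq_flags;
    struct demo_sublock *sub;
};

/* Every IO gets a fresh top lock; nothing is looked up in a cache. */
static struct demo_toplock *demo_top_enqueue(struct demo_sublock *sub,
                                             unsigned int flags)
{
    struct demo_toplock *top = malloc(sizeof(*top));

    if (top == NULL)
        return NULL;
    top->enq_flags = flags | DEMO_CEF_NOCACHE;
    top->sub = sub;
    sub->users++;      /* pins the sublock for the duration of the IO */
    sub->nr_parents++;
    return top;
}

/* After the IO is done, the top lock is dropped and deleted voluntarily. */
static void demo_top_release(struct demo_toplock *top)
{
    struct demo_sublock *sub = top->sub;

    assert(sub->users > 0 && sub->nr_parents > 0);
    sub->nr_parents--;
    sub->users--;
    free(top);
}

/* A sublock may be canceled only when no IO is using it. */
static bool demo_sub_cancelable(const struct demo_sublock *sub)
{
    return sub->users == 0;
}
```

In this model the user count and the number of stacked top locks always move together, so a cancelable sublock necessarily has no parent, which is the invariant the paragraph relies on.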
2.2 remove bottom-to-top lock callback methods
Currently, the following operations are initiated from osc:
- ->clo_weigh: this operation is used to determine which locks can be canceled
early. ccc_lock_weigh checks whether the object has mmap regions; if it does,
we aren't keen on canceling this lock. We can invent a new mechanism to
address this issue. For example, we can add a ->cll_points field to cl_lock,
and if the lock comes from an mmap region, we assign it a higher point value.
The early-cancel logic then scans the namespaces and decreases the points of
each lock by one; when the points reach zero, the lock can be canceled. We
could also build a good LRU algorithm on top of this scheme.
- ->clo_closure: now that there are no deadlock concerns, we don't need this
method.
- ->clo_modify: since the top lock can't be cached, there is no point in
modifying the descr of the top lock.
- ->clo_delete: we can be absolutely sure that when the lock is being
deleted, there is no top lock stacked upon it.
- ->clo_state: we still need this method because we need a way to notify top
locks that a sublock is ready to be operated on. However, we can implement it
in a new way: we already maintain a parent lock list in lovsub_lock, and the
list is private data of the sublock, so we can access it under the protection
of the sublock's mutex. We can then walk each parent lock in the list and just
wake up the waiting processes by calling wakeup(parent->cll_wq). We don't need
to hold the parent's mutex at all.
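The point-based replacement for ->clo_weigh could look roughly like the sketch below. It is only an illustration under stated assumptions: the demo_* names are hypothetical, the initial point values (1 for ordinary locks, 4 for locks from mmap regions) are arbitrary, and a flat array stands in for the namespace scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical initial weights; the proposal does not fix any values. */
enum {
    DEMO_POINTS_DEFAULT = 1,
    DEMO_POINTS_MMAP    = 4,
};

struct demo_lock {
    unsigned int points;     /* models the proposed ->cll_points */
    bool         cancelable; /* set once the points reach zero */
};

static void demo_lock_init(struct demo_lock *lock, bool from_mmap)
{
    lock->points = from_mmap ? DEMO_POINTS_MMAP : DEMO_POINTS_DEFAULT;
    lock->cancelable = false;
}

/*
 * One early-cancel scan: decrease the points of every lock by one and
 * mark a lock cancelable when its points reach zero. Returns how many
 * locks became cancelable during this pass.
 */
static size_t demo_early_cancel_scan(struct demo_lock *locks, size_t nr)
{
    size_t newly_cancelable = 0;

    for (size_t i = 0; i < nr; i++) {
        struct demo_lock *lock = &locks[i];

        if (lock->cancelable)
            continue;
        assert(lock->points > 0);
        if (--lock->points == 0) {
            lock->cancelable = true;
            newly_cancelable++;
        }
    }
    return newly_cancelable;
}
```

With these example weights, an ordinary lock becomes cancelable on the first scan and an mmap lock only on the fourth. Resetting the points whenever a lock is used would turn the same counter into the LRU-like policy mentioned above.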
2.3 other updates
We need to revise some code in lov_lock.c to cope with the lack of
notification from bottom to top. This does not seem difficult.
2.4 Pros and cons
This scheme keeps most of the ideas of the current implementation and only
removes some hard-to-understand code, which keeps the modification under
control.
However, the code might still be difficult; a new engineer will still need a
lot of time to grok cl_lock. This is unavoidable.
3. Discussions
3.1 What if we made the sublock a pass-through cache as well?
In one phrase: minimal update. The most difficult parts are the two-level
cache and the two-direction path to update the lock; we just need to grab the
essence. To implement this update, we would have to rework the dlm callbacks
and the page-finding routines. This is not worth it, because they have been
working well so far.
Issue Links
- is related to LU-3259 cl_lock refactoring (Resolved)