<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:03:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-90] Simplify cl_lock</title>
                <link>https://jira.whamcloud.com/browse/LU-90</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This task is based on a discussion about simplifying cl_lock in Beijing.&lt;/p&gt;

&lt;p&gt;The proposal to simplify cl_lock&lt;br/&gt;
================================&lt;/p&gt;

&lt;p&gt;We have discussed schemes to simplify cl_lock many times, but we have never written them down. This note records my proposal for simplifying cl_lock.&lt;/p&gt;

&lt;p&gt;1. Problems&lt;br/&gt;
We all agree that the current implementation of cl_lock is far too complex, because:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;cl_lock has a two-level cache (top locks and sublocks),&lt;/li&gt;
	&lt;li&gt;a top lock may have several sublocks, and a sublock may be shared by multiple top locks,&lt;/li&gt;
	&lt;li&gt;an IO lock is actually composed of one top lock and several sublocks,&lt;/li&gt;
	&lt;li&gt;some operations require holding the mutexes of both the top lock and its sublocks, and finally,&lt;/li&gt;
	&lt;li&gt;both llite and osc can initiate an operation to update the lock.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;These difficulties make cl_lock hard to understand, prone to deadlock, and hard to maintain. They also hurt performance, because an operation has to grab a number of mutexes that is not known in advance. Moreover, we had to invent cl_lock_closure just to address the deadlock issue.&lt;/p&gt;

&lt;p&gt;Life would be easier if we revised the lock model to mitigate these issues.&lt;/p&gt;

&lt;p&gt;2. Scheme&lt;br/&gt;
Here is my proposal to fix these problems:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;remove the top-level cache, that is, make it a pass-through cache;&lt;/li&gt;
	&lt;li&gt;rework the bottom-to-top lock operations so that only top-to-bottom operations remain.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;If we reach these targets, we can simplify the lock model a lot, because:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;there are no more deadlock concerns, since mutexes are only ever taken from top to bottom, so cl_lock_closure can be removed;&lt;/li&gt;
	&lt;li&gt;the number of mutexes held to complete an operation is bounded: at most N + 1 (the top lock plus one per sublock), where N is the stripe count;&lt;/li&gt;
	&lt;li&gt;hundreds of lines of cl_lock code can be removed;&lt;/li&gt;
	&lt;li&gt;the code becomes easier to understand.&lt;/li&gt;
&lt;/ul&gt;
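&lt;p&gt;The ordering argument above can be illustrated with a small Python model. It is a sketch under stated assumptions, not Lustre code. TopLock, SubLock and lock_for_operation are hypothetical names, threading.Lock stands in for the kernel mutexes, and sublocks are assumed to be ordered by stripe index. Every operation takes the top lock mutex first and then the sublock mutexes in that fixed order, so a cycle of threads waiting on each other cannot form. At most N + 1 mutexes are held for a stripe count of N.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import threading
from contextlib import ExitStack


class SubLock:
    """Hypothetical per-stripe sublock; may be shared by several top locks."""

    def __init__(self, stripe: int) -> None:
        self.stripe = stripe
        self.mutex = threading.Lock()


class TopLock:
    """Hypothetical top lock built from one sublock per stripe."""

    def __init__(self, sublocks: list) -> None:
        self.mutex = threading.Lock()
        # Fixed global order for sublocks: ascending stripe index.
        self.sublocks = sorted(sublocks, key=lambda s: s.stripe)


def lock_for_operation(top: TopLock) -> int:
    """Acquire mutexes strictly top to bottom; return how many were held."""
    held = 0
    with ExitStack() as stack:
        stack.enter_context(top.mutex)      # 1. the top lock
        held += 1
        for sub in top.sublocks:            # 2. then each sublock, in order
            stack.enter_context(sub.mutex)
            held += 1
        # ... the operation would run here with held == N + 1 ...
    # Leaving the with-block releases everything in reverse order.
    return held
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, a top lock over three stripes holds at most four mutexes: lock_for_operation(TopLock([SubLock(2), SubLock(0), SubLock(1)])) returns 4.&lt;/p&gt;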


&lt;p&gt;2.1 Remove the top-level cache&lt;br/&gt;
Without the top-level cache, each IO has to request new top locks unconditionally. This can be done with a new enqueue flag in cl_lock_descr, say CEF_NOCACHE. When the IO is done, we voluntarily cancel and delete its top locks. An IO must hold the sublock&apos;s user count (-&amp;gt;cll_users) to prevent the sublock from being cancelled. This has a useful consequence: if a sublock can be cancelled, no top lock can be stacked on it.&lt;/p&gt;
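&lt;p&gt;The invariant in 2.1 can be modelled in a few lines of Python. This is a minimal sketch, not Lustre code. SubLock, TopLock and do_io are hypothetical names, the users counter stands in for -&amp;gt;cll_users, and the nocache attribute stands in for the proposed CEF_NOCACHE flag. Every IO gets a fresh top lock that pins its sublocks and is deleted when the IO ends, so a sublock that can be cancelled never has a parent attached.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
class SubLock:
    """Hypothetical cached sublock; 'users' models ->cll_users."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.users = 0
        self.parents = []  # top locks currently stacked on this sublock

    def can_cancel(self) -> bool:
        # A sublock pinned by an ongoing IO must not be cancelled.
        return self.users == 0

    def cancel(self) -> None:
        if not self.can_cancel():
            raise RuntimeError(f"{self.name} is in use by an IO")
        # Invariant from 2.1: a cancellable sublock has no top lock stacked on it.
        assert not self.parents


class TopLock:
    """Hypothetical top lock enqueued with a CEF_NOCACHE-like flag."""

    def __init__(self, sublocks: list) -> None:
        self.nocache = True      # never kept in a top-level cache
        self.sublocks = sublocks
        self.deleted = False

    def enqueue(self) -> None:
        for sub in self.sublocks:
            sub.users += 1       # pin the sublock for the whole IO
            sub.parents.append(self)

    def cancel_and_delete(self) -> None:
        for sub in self.sublocks:
            sub.parents.remove(self)
            sub.users -= 1
        self.deleted = True


def do_io(sublocks: list, work) -> TopLock:
    """Each IO requests a fresh top lock and drops it when the IO is done."""
    top = TopLock(sublocks)
    top.enqueue()
    try:
        work(top)                # the actual read or write
    finally:
        top.cancel_and_delete()
    return top
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;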

&lt;p&gt;2.2 Remove bottom-to-top lock callback methods&lt;br/&gt;
Currently, the following operations are initiated from osc:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;-&amp;gt;clo_weigh: used to decide which locks can be cancelled early. ccc_lock_weigh checks whether the object has mmap regions; if it does, we would rather not cancel the lock. A new mechanism can address this. For example, add a -&amp;gt;cll_points field to cl_lock and give a lock that covers an mmap region more points. The early-cancel logic then scans the namespaces and decrements each lock&apos;s points by one; a lock whose points reach zero can be cancelled. A good LRU algorithm could also be built on this scheme.&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_closure: with no more deadlock concerns, this method is no longer needed.&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_modify: since top locks are no longer cached, there is no point in modifying a top lock&apos;s descr.&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_delete: we can be certain that when a sublock is being deleted, no top lock is stacked on it, so there is no parent to notify.&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_state: this method is still needed, because top locks must be notified when a sublock becomes ready to be operated on. However, it can be implemented differently. lovsub_lock already maintains a list of parent locks. Since this list is private to the sublock, it can be accessed under the protection of the sublock&apos;s mutex. We can then walk the list and wake up the processes waiting on each parent by calling wakeup(parent-&amp;gt;cll_wq). The parent&apos;s mutex is never taken.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;2.3 Other updates&lt;br/&gt;
We need to revise some code in lov_lock.c to cope with the lack of bottom-to-top notification. This does not seem difficult.&lt;/p&gt;
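&lt;p&gt;The -&amp;gt;cll_points idea proposed in 2.2 as a replacement for -&amp;gt;clo_weigh can be sketched as a simple aging scan. This is a hedged Python model, not Lustre code. The Lock class, new_lock and early_cancel_scan are hypothetical. So are the point values (1 for ordinary locks, 3 for locks covering an mmap region); the proposal only says that mmap locks get more points.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass

# Hypothetical starting points; the proposal only says mmap locks get more.
DEFAULT_POINTS = 1
MMAP_POINTS = 3


@dataclass(eq=False)
class Lock:
    """Hypothetical cached lock carrying the proposed cll_points counter."""
    name: str
    points: int


def new_lock(name: str, covers_mmap: bool) -> Lock:
    return Lock(name, MMAP_POINTS if covers_mmap else DEFAULT_POINTS)


def early_cancel_scan(namespace: list) -> list:
    """One pass of the early-cancel logic over a namespace.

    Every lock loses one point; locks that reach zero are cancelled, i.e.
    removed from the namespace, and their names are returned.
    """
    cancelled = []
    for lock in list(namespace):   # iterate over a copy while removing
        lock.points -= 1
        if lock.points == 0:
            namespace.remove(lock)
            cancelled.append(lock.name)
    return cancelled
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Resetting a lock&apos;s points whenever it is used again would turn this aging scan into a clock-style approximation of LRU. That is one possible reading of the remark about building an LRU algorithm on this scheme.&lt;/p&gt;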
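&lt;p&gt;The new -&amp;gt;clo_state scheme from 2.2 can be modelled with Python threading primitives. This is a sketch under stated assumptions, not Lustre code. TopLock, SubLock, set_ready and wait_until_ready are hypothetical names. A threading.Condition stands in for the parent&apos;s wait queue, and a separate threading.Lock stands in for the parent&apos;s mutex. The notifier walks the sublock&apos;s private parent list under the sublock mutex only and never takes a parent mutex.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import threading


class TopLock:
    """Hypothetical parent lock.

    'guard' models the parent's mutex; 'wq' models parent->cll_wq. A Condition
    carries its own internal lock, much as a kernel wait queue has its own
    spinlock, so waking waiters does not require the parent's mutex.
    """

    def __init__(self) -> None:
        self.guard = threading.Lock()
        self.wq = threading.Condition()


class SubLock:
    """Hypothetical lovsub-level lock with a private list of parent locks."""

    def __init__(self) -> None:
        self.mutex = threading.Lock()
        self.parents = []
        self.ready = False

    def set_ready(self) -> None:
        """Models ->clo_state: notify every parent that the sublock changed."""
        with self.mutex:                 # protects self.parents and self.ready
            self.ready = True
            for parent in self.parents:
                with parent.wq:          # only the wait-queue lock, never parent.guard
                    parent.wq.notify_all()


def wait_until_ready(parent: TopLock, sub: SubLock, timeout: float) -> bool:
    """Block a top-lock user until the sublock reports it is ready."""
    with parent.wq:
        return parent.wq.wait_for(lambda: sub.ready, timeout=timeout)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;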


&lt;p&gt;2.4 Pros and cons&lt;br/&gt;
This scheme keeps most ideas of the current implementation and only removes the hardest-to-understand code, which keeps the change under control.&lt;br/&gt;
However, the code may still be difficult: a new engineer will still need a lot of time to grok cl_lock. This is unavoidable.&lt;/p&gt;

&lt;p&gt;3. Discussions&lt;br/&gt;
3.1 What if we made the sublock level a pass-through cache as well?&lt;br/&gt;
In a word: minimal change. The most difficult parts are the two-level cache and the two-directional lock update path, and we only need to fix those essentials. Making sublocks pass-through would require reworking the dlm callbacks and the page-lookup routines. That is not worth it, because they work well so far.&lt;/p&gt;</description>
                <environment></environment>
        <key id="10387">LU-90</key>
            <summary>Simplify cl_lock</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="jay">Jinshan Xiong</reporter>
                        <labels>
                            <label>clio</label>
                    </labels>
                <created>Mon, 21 Feb 2011 21:59:03 +0000</created>
                <updated>Sun, 28 May 2017 06:36:34 +0000</updated>
                            <resolved>Sun, 28 May 2017 06:36:34 +0000</resolved>
                                    <version>Lustre 2.0.0</version>
                                    <fixVersion>Lustre 2.1.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="10706" author="jay" created="Mon, 21 Feb 2011 21:59:44 +0000"  >&lt;h3&gt;&lt;a name=&quot;SUMMARYOFCHANGES&quot;&gt;&lt;/a&gt;&lt;b&gt;SUMMARY OF CHANGES&lt;/b&gt;&lt;/h3&gt;
&lt;p&gt;==================&lt;br/&gt;
&lt;b&gt;0. Summary&lt;/b&gt;&lt;br/&gt;
        For sub-objects, the bit cl_object_header::coh_lock_cacheable is set. When a cl_lock is created, this bit is checked: if it is set, we first try to match an existing lock from the cache; otherwise, a new lock is created.&lt;/p&gt;
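&lt;p&gt;The lookup rule in the summary can be sketched as a small Python function, together with the CEF_PEEK flag described in item 4 of this comment. Everything here is a hypothetical model rather than CLIO code. ObjectHeader, lock_request and the (start, end) tuple extents are illustrative names. CEF_PEEK is modelled as a boolean peek argument instead of an enqueue-flag bit.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Extent = Tuple[int, int]  # hypothetical (start, end) lock extent, inclusive


@dataclass
class ObjectHeader:
    """Hypothetical model of cl_object_header with coh_lock_cacheable."""
    coh_lock_cacheable: bool
    cache: list = field(default_factory=list)  # cached lock extents


def lock_request(obj: ObjectHeader, start: int, end: int,
                 peek: bool = False) -> Optional[Extent]:
    """Find or create a lock covering [start, end] on one object.

    peek=True models the CEF_PEEK enqueue flag: only cached locks are used.
    """
    if obj.coh_lock_cacheable:
        for lock in obj.cache:
            # A cached lock matches if its extent covers the requested one.
            if start >= lock[0] and lock[1] >= end:
                return lock
    if peek:
        return None              # CEF_PEEK never creates a new lock
    return (start, end)          # otherwise a new lock is created
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;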

&lt;p&gt;&lt;b&gt;1.Retained methods of cl_lock_operations&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;-&amp;gt;clo_enqueue:  Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_wait:     Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_unuse:    No&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_use:      No&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_fits_into: Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_cancel:   Yes, but its semantics change: it becomes the exact inverse of enqueue, as enqueue and cancel were historically meant to be mutually opposite operations.&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_closure:  No&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_modify:   No&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_delete:   Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_fini:     Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_state:    Yes&lt;/li&gt;
	&lt;li&gt;-&amp;gt;clo_weigh:    No&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;b&gt;2. enqueue and cancel&lt;/b&gt;&lt;br/&gt;
        The semantics of cl_lock_cancel change to dropping the reference on the ldlm lock; the ldlm lock may be cancelled at any time after cl_lock_cancel is called.&lt;br/&gt;
        Besides enqueuing new ldlm locks, clo_enqueue also handles cached ldlm locks: for a cl_lock in the CLS_CACHE state, osc_lock_enqueue calls ldlm_lock_addref_try to take a reference.&lt;br/&gt;
        While a lock is held, i.e. cll_holds is non-zero, it cannot be cancelled. When a lock is unheld with cl_lock_unhold, the following policy applies:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
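&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (--lock-&amp;gt;cll_holds == 0) {
        &lt;span class=&quot;code-comment&quot;&gt;/* last holder gone: drop the ldlm reference (the inverse of enqueue) */&lt;/span&gt;
        cl_lock_cancel();
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lock is uncacheable)
                destroy it;
        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lock is in CLS_HELD)
                change the state to CLS_CACHE;  &lt;span class=&quot;code-comment&quot;&gt;/* keep it for later reuse */&lt;/span&gt;
        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                destroy it;                     &lt;span class=&quot;code-comment&quot;&gt;/* any state other than CLS_HELD */&lt;/span&gt;
}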
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
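&lt;p&gt;The unhold policy above can be exercised with a small executable model. This is a hedged Python sketch, not CLIO code. ClLock, State and the ldlm_refs counter are hypothetical, ldlm references are just an integer, and ERROR stands in for any state other than CLS_HELD. The model shows the intended pairing: the first hold takes a single ldlm reference through enqueue. The last unhold drops it through cl_lock_cancel and then decides whether to cache or destroy the lock.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from enum import Enum


class State(Enum):
    NEW = "CLS_NEW"
    HELD = "CLS_HELD"
    CACHE = "CLS_CACHE"
    ERROR = "error"          # hypothetical: any state other than CLS_HELD
    DESTROYED = "destroyed"


class ClLock:
    """Hypothetical cl_lock model: cll_holds, a cacheable bit and a state."""

    def __init__(self, cacheable: bool) -> None:
        self.cacheable = cacheable
        self.holds = 0           # models cll_holds
        self.ldlm_refs = 0       # references this lock holds on its ldlm lock
        self.state = State.NEW

    def hold(self) -> None:
        if self.state == State.DESTROYED:
            raise RuntimeError("lock already destroyed")
        if self.holds == 0:
            # First holder: clo_enqueue takes the ldlm reference, either by a
            # new enqueue or, for a CLS_CACHE lock, by ldlm_lock_addref_try.
            self.ldlm_refs += 1
            self.state = State.HELD
        self.holds += 1

    def cancel(self) -> None:
        """cl_lock_cancel with the new semantics: drop the ldlm reference."""
        self.ldlm_refs -= 1

    def unhold(self) -> None:
        """cl_lock_unhold, applying the policy shown above."""
        if self.holds == 0:
            raise RuntimeError("unbalanced unhold")
        self.holds -= 1
        if self.holds == 0:
            self.cancel()
            if not self.cacheable:
                self.state = State.DESTROYED
            elif self.state == State.HELD:
                self.state = State.CACHE
            else:
                self.state = State.DESTROYED
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;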

&lt;p&gt;&lt;b&gt;3. No more cl_use/cl_unuse, cl_lock_release, etc.&lt;/b&gt;&lt;br/&gt;
&lt;b&gt;4. Other changes&lt;/b&gt;&lt;br/&gt;
  To support cl_lock_peek, a special enqueue flag, CEF_PEEK, will be defined. If cl_lock_request is called with this flag, no new sublocks are created; only cached sublocks are used.&lt;/p&gt;</comment>
                            <comment id="101475" author="simmonsja" created="Fri, 12 Dec 2014 17:22:09 +0000"  >&lt;p&gt;Same as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3259&quot; title=&quot;cl_lock refactoring&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3259&quot;&gt;&lt;del&gt;LU-3259&lt;/del&gt;&lt;/a&gt;. Needs to be closed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="18666">LU-3259</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw1zz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10405</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>