<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:07:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7266] Fix LDLM pool to make LRUR working properly</title>
                <link>https://jira.whamcloud.com/browse/LU-7266</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The current LDLM pool doesn&apos;t work as expected: the server can sometimes be overburdened by too many cached LDLM locks (observed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6529&quot; title=&quot;Server side lock limits to avoid unnecessary memory exhaustion&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6529&quot;&gt;&lt;del&gt;LU-6529&lt;/del&gt;&lt;/a&gt;). To fix the LDLM pool and make LRUR work properly, I think the following issues need to be addressed:&lt;/p&gt;

&lt;p&gt;1. No hard limit for the server lock count;&lt;/p&gt;

&lt;p&gt;There are always exceptional cases (all locks are actively used on the client, client failure, network lag, etc.) that can make the client unable to cancel locks in time, so a hard limit on the server lock count is crucial to ensure the server is never overburdened by LDLM locks. When the server lock count exceeds the hard limit, the server should reject any incoming lock enqueue request (letting the client retry on -EINPROGRESS) until the lock count shrinks back to a safe zone.&lt;/p&gt;

&lt;p&gt;This issue has been addressed by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6529&quot; title=&quot;Server side lock limits to avoid unnecessary memory exhaustion&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6529&quot;&gt;&lt;del&gt;LU-6529&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;
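
&lt;p&gt;A minimal sketch of the hard-limit behaviour described in item 1, in Python for illustration only. It is not the LU-6529 implementation: the class name, the limit values, and the hysteresis between a hard limit and a lower safe limit are assumptions made for this example.&lt;/p&gt;

```python
import errno


class LockTable:
    """Toy model of a server-wide lock counter with a hard limit (hypothetical)."""

    def __init__(self, hard_limit, safe_limit):
        # safe_limit models the "safe zone" the count must shrink back to.
        if safe_limit > hard_limit:
            raise ValueError("safe_limit must not exceed hard_limit")
        self.hard_limit = hard_limit
        self.safe_limit = safe_limit
        self.count = 0
        self.rejecting = False

    def enqueue(self):
        """Return 0 on grant, or -EINPROGRESS when the client should retry."""
        if self.count >= self.hard_limit:
            # Limit reached: start refusing new locks.
            self.rejecting = True
        elif self.safe_limit >= self.count:
            # Back in the safe zone: resume granting.
            self.rejecting = False
        if self.rejecting:
            return -errno.EINPROGRESS
        self.count += 1
        return 0

    def cancel(self, n=1):
        """Model the client cancelling n locks."""
        self.count = max(0, self.count - n)
```

&lt;p&gt;With hard_limit=3 and safe_limit=1, the fourth enqueue is refused, and requests keep being refused until cancellations bring the count down to 1.&lt;/p&gt;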

&lt;p&gt;2. Lock load can&apos;t be balanced between the namespaces on the same server;&lt;/p&gt;

&lt;p&gt;The current LDLM pool divides the memory quota equally among the namespaces (MDTs, OSTs) on the same host, which can leave a lot of memory reserved for idle namespaces but never used.&lt;/p&gt;

&lt;p&gt;I think we should leverage the global lock counter introduced in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6529&quot; title=&quot;Server side lock limits to avoid unnecessary memory exhaustion&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6529&quot;&gt;&lt;del&gt;LU-6529&lt;/del&gt;&lt;/a&gt; to address this problem.&lt;/p&gt;
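
&lt;p&gt;To illustrate the difference described in item 2, here is a small Python sketch comparing an equal per-namespace split with a single host-wide counter. The function names, the namespace names, and the use of lock counts in place of memory quota are assumptions for this example, not the actual LDLM code.&lt;/p&gt;

```python
def equal_share(total_limit, counts):
    """Per-namespace limit when the quota is divided equally."""
    return total_limit // len(counts)


def can_grant_equal_split(total_limit, counts, name):
    """Grant only while this namespace is under its own fixed share."""
    return equal_share(total_limit, counts) > counts[name]


def can_grant_global(total_limit, counts):
    """Grant while the host-wide total is under the limit (one shared counter)."""
    return total_limit > sum(counts.values())


# Hypothetical host: one busy MDT namespace and one idle OST namespace.
counts = {"mdt0": 500, "ost0": 0}
limit = 1000
print(can_grant_equal_split(limit, counts, "mdt0"))  # False
print(can_grant_global(limit, counts))               # True
```

&lt;p&gt;With an equal split, the busy namespace is refused at 500 locks even though the idle namespace holds none; with a shared counter, the unused half of the budget remains available to it.&lt;/p&gt;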


&lt;p&gt;3. Client needs to cancel locks more aggressively;&lt;/p&gt;

&lt;p&gt;According to the SLV/CLV formula, the server lock count won&apos;t decrease even after it has exceeded the pool_limit (which is 25% of total memory by default). A simulation program shows that the server lock count only decreases after the locks have consumed more than 32% of total memory; that number is calculated on the assumption that all locks on the client are unused and the client can always cancel locks instantly. I think the number will be larger in the real world.&lt;/p&gt;

&lt;p&gt;As a workaround, I think this problem could be addressed by tweaking the LDLM pool parameters (such as pool_limit, lock_volume_factor, etc.).&lt;/p&gt;

&lt;p&gt;As a longer-term solution, I think we should get rid of the complexity of SLV recalculation and instead just notify the client directly with an estimated CLV when the server is aware of memory pressure (or when the server decides to reclaim some memory).&lt;/p&gt;

&lt;p&gt;4. Current server pool shrinker is barely functional;&lt;/p&gt;

&lt;p&gt;The current LDLM server pool shrinker decreases the SLV by only a small amount, which is not enough to trigger lock cancellation on the client. To make it worse, the decreased SLV can be overwritten by the SLV recalculation thread before it is carried back to the client by some random RPC.&lt;/p&gt;

&lt;p&gt;As mentioned in the long-term solution for item 3, I think the server pool shrinker should just proactively notify the client with an estimated CLV; that&apos;s simpler and more reliable.&lt;/p&gt;

&lt;p&gt;5. Improve LRU algorithm&lt;/p&gt;

&lt;p&gt;Using strict LRU to replace cached locks is sub-optimal due to cache thrashing and removal of valuable locks.  A better algorithm like LFRU or ARC would improve lock cache reuse and value.  (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11509&quot; title=&quot;LDLM: replace lock LRU with improved cache algorithm&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11509&quot;&gt;LU-11509&lt;/a&gt;)&lt;/p&gt;</description>
                <environment></environment>
        <key id="32543">LU-7266</key>
            <summary>Fix LDLM pool to make LRUR working properly</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="niu">Niu Yawei</reporter>
                        <labels>
                    </labels>
                <created>Thu, 8 Oct 2015 08:05:02 +0000</created>
                <updated>Sun, 21 Jan 2024 19:15:00 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="13265">LU-1128</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="25048">LU-5152</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="29728">LU-6529</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53583">LU-11509</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="62004">LU-14221</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="63324">LU-14517</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="37942" name="lru-resize-dld.lyx" size="65250" author="adilger" created="Fri, 12 Mar 2021 21:09:06 +0000"/>
                            <attachment id="37941" name="lru-resize-hld.lyx" size="23645" author="adilger" created="Fri, 12 Mar 2021 21:08:46 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxpvb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>