<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:35:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17493] restore LDLM cancel on blocking callback</title>
                <link>https://jira.whamcloud.com/browse/LU-17493</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In the old days of &lt;a href=&quot;https://en.wikipedia.org/wiki/Red_Storm_(computing)&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Catamount on ASCI Red&lt;/a&gt;, liblustre ran in the Catamount OS, which did not have any CPU interrupts.  That meant any server-to-client requests (such as DLM lock cancellations) had to be handled asynchronously on the client, when the application yielded the processor to filesystem administrative tasks.&lt;/p&gt;

&lt;p&gt;In that environment, the server would immediately assume that a DLM lock was cancelled as soon as the AST was sent on a lock with &lt;tt&gt;LDLM_FL_CANCEL_ON_BLOCK&lt;/tt&gt; set, rather than waiting for the client to reply to the AST and actually cancel the lock.  This avoided potentially significant delays for servers granting locks.&lt;/p&gt;

&lt;p&gt;In large clusters, there are some locks that are invariably highly contended (e.g. &lt;tt&gt;ROOT/&lt;/tt&gt;, &lt;tt&gt;/home/&lt;/tt&gt; or &lt;tt&gt;/project&lt;/tt&gt; directories), either because many clients are holding a read lock and &lt;em&gt;some&lt;/em&gt; client wants to modify the directory, or because of conflicting workloads (e.g. &quot;&lt;tt&gt;ls -l&lt;/tt&gt;&quot; or &quot;&lt;tt&gt;rm&lt;/tt&gt;&quot; in a directory (tree) that is actively in use by other clients).  If &lt;b&gt;any&lt;/b&gt; client holding a contended lock has a problem, for example &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17453&quot; title=&quot;Use dget_parent/dput during d_revalidate&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17453&quot;&gt;LU-17453&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17476&quot; title=&quot;lnet: only report mismatched nid in ME if bits match&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17476&quot;&gt;LU-17476&lt;/a&gt;, then other nodes accessing that lock may block for tens or hundreds of seconds until the lock is cancelled or the client is evicted.&lt;/p&gt;

&lt;p&gt;It would be useful if &lt;tt&gt;LDLM_FL_CANCEL_ON_BLOCK&lt;/tt&gt; were used for such highly-contended resources when requested with &lt;tt&gt;LCK_PR&lt;/tt&gt; mode, so that the server can send asynchronous ASTs to all clients and then cancel the DLM locks rapidly and perform the required operation without getting blocked by unresponsive clients.  Any responsive client will receive the AST and will not even need to send the cancel RPC, while unresponsive clients are already unlikely to know or care whether the server sent the AST, so they will have an inconsistent local state until they contact the server again (as they already do today).&lt;/p&gt;

&lt;p&gt;This could potentially also be tied into &quot;&lt;tt&gt;ls&lt;/tt&gt;&quot; (&lt;tt&gt;readdir()&lt;/tt&gt;) being able to run with &quot;&lt;tt&gt;LDLM_FL_CANCEL_ON_BLOCK&lt;/tt&gt;&quot; locks, or no DLM locks at all on the directory or inodes.  &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3308#comment-58611&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;Per comments in LU-3308&lt;/a&gt;, POSIX does not require &lt;tt&gt;readdir()&lt;/tt&gt; to be fully cache coherent even among processes on the same node, only that the readdir cache is reset with &lt;tt&gt;rewinddir()&lt;/tt&gt; and &lt;tt&gt;close()&lt;/tt&gt;.&lt;/p&gt;</description>
                <environment></environment>
        <key id="80489">LU-17493</key>
            <summary>restore LDLM cancel on blocking callback</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Thu, 1 Feb 2024 00:28:27 +0000</created>
                <updated>Thu, 1 Feb 2024 00:30:44 +0000</updated>
                                            <version>Lustre 2.14.0</version>
                    <version>Lustre 2.16.0</version>
                    <version>Lustre 2.17.0</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>4</watches>
                                                                                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="18787">LU-3308</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53583">LU-11509</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i049w7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>