<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:12:07 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14711] Canceling lock with a lot of cached data can take a lot of time</title>
                <link>https://jira.whamcloud.com/browse/LU-14711</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On clients with large amounts of RAM, large thinly-striped files can end up with a single object that has a lot of pages cached.&lt;/p&gt;

&lt;p&gt;When such a lock is then canceled, iterating over all of those pages takes a long time, during which there are no RPCs to be sent (e.g. because we are truncating the lock or because the lock is PR).&lt;/p&gt;

&lt;p&gt;Here&apos;s a simple test case:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
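&lt;pre&gt;# stripe new files in /mnt/lustre across 2 OSTs
lfs setstripe -c 2 /mnt/lustre
dd if=/dev/zero of=/mnt/lustre/testfile1 bs=4096k count=1
# ~3.2G file whose pages stay cached on the client
dd if=/dev/zero of=/mnt/lustre/testfile2 bs=4096k count=800
# renaming testfile1 over testfile2 destroys testfile2, cancelling its locks and discarding its cached pages
mv /mnt/lustre/testfile1 /mnt/lustre/testfile2&lt;/pre&gt;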
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now the destroy of the 3.2G file causes both stripes to be destroyed. According to the logs, even at the default log level, the process takes 4.7s. So if the file were 30x bigger (100G), we&apos;d already be spending 141 seconds just iterating over pages on this particular machine.&lt;/p&gt;
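&lt;p&gt;The 141-second estimate assumes the page walk costs the same for every cached page, so its duration scales linearly with the amount of cached data. Below is a minimal sketch of that arithmetic. The helper name &lt;tt&gt;extrapolate&lt;/tt&gt; is hypothetical, and awk is used only for floating-point math.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```sh
#!/bin/sh
# Hypothetical helper: scale a measured page-walk time by how many times
# more data is cached, assuming a constant cost per cached page.
extrapolate() {
  local measured_s=$1 factor=$2
  awk -v t="$measured_s" -v f="$factor" 'BEGIN { printf "%.1f\n", t * f }'
}

extrapolate 4.7 30   # 4.7s measured for 3.2G, file 30x bigger (~100G): prints 141.0
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the model is linear, a constant-factor speedup of the page walk only moves the file size at which a given timeout is exceeded. It does not remove the problem.&lt;/p&gt;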
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00010000:00010000:0.0:1622008589.369887:0:5816:0:(ldlm_request.c:1150:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 3/0,0 mode: PW/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;1048575) flags: 0x428400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1
00000080:00200000:0.0:1622008589.369896:0:5816:0:(vvp_io.c:1717:vvp_io_init()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
00000080:00200000:0.0:1622008594.161234:0:5816:0:(vvp_io.c:313:vvp_io_fini()) [0x200000401:0x18:0x0] ignore/verify layout 1/0, layout version 0 need write layout 0, restore needed 0
00010000:00010000:0.0:1622008594.161266:0:5816:0:(ldlm_request.c:1209:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff880316ae0800 lock: ffff88039a18cd80/0xfe254c0b2e6873ba lrc: 2/0,0 mode: --/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;1048575) flags: 0x4c69400010000 nid: local remote: 0xfe254c0b2e6873c1 expref: -99 pid: 11550 timeout: 0 lvb_type: 1&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
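&lt;p&gt;The duration can be read straight from the debug log. The fourth colon-separated field of each line is the timestamp, so the gap between &lt;tt&gt;ldlm_cli_cancel_local()&lt;/tt&gt; and &lt;tt&gt;ldlm_cancel_pack()&lt;/tt&gt; for the lock covers the cache discard in between. Below is a minimal awk-based sketch. It assumes the log has already been dumped to a file (for example with &lt;tt&gt;lctl dk&lt;/tt&gt;) and contains a single cancel of interest. The function name is hypothetical.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```sh
#!/bin/sh
# Hypothetical helper: print the seconds between the first client-side
# cancel line and the first cancel-packing line of a dumped debug log.
# Field 4 of each colon-separated log line is the seconds.microseconds timestamp.
cancel_duration() {
  awk -F: '
    /ldlm_cli_cancel_local/ { if (start == "") start = $4 }
    /ldlm_cancel_pack/      { if (stop == "") stop = $4 }
    END { if (start != "") if (stop != "") printf "%.2f\n", stop - start }
  ' "$1"
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the four lines quoted above, saved to a file, this reports 4.79, consistent with the ~4.7s figure. Almost all of it falls between &lt;tt&gt;vvp_io_init()&lt;/tt&gt; and &lt;tt&gt;vvp_io_fini()&lt;/tt&gt;.&lt;/p&gt;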
&lt;p&gt;If a cancel is taking a long time, we need to send something to the server, just to prolong the lock and indicate that we are still there. This is not ideal. An immediate cancel RPC sounds better on the surface, but it is trickier to implement in every case except DESTROY, where we are sure no more data can be added to the mapping.&lt;/p&gt;</description>
                <environment></environment>
        <key id="64412">LU-14711</key>
            <summary>Canceling lock with a lot of cached data can take a lot of time</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 26 May 2021 19:24:38 +0000</created>
                <updated>Fri, 20 Jan 2023 21:39:39 +0000</updated>
                            <resolved>Mon, 4 Oct 2021 17:14:52 +0000</resolved>
                                    <version>Lustre 2.15.0</version>
                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="302724" author="green" created="Wed, 26 May 2021 19:52:11 +0000"  >&lt;p&gt;Tangentially related, and aimed at speeding up this processing, is &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11290&quot; title=&quot;Batch callbacks in osc_page_gang_lookup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11290&quot;&gt;&lt;del&gt;LU-11290&lt;/del&gt;&lt;/a&gt;, with two patches there (Vitaly quotes a 30% processing-time improvement with both). However, I feel it does not fully fix the problem. Above a certain size the processing would still take longer than the timeout, so at the very least we still need a way to calm an impatient server.&lt;/p&gt;</comment>
                            <comment id="302729" author="adilger" created="Wed, 26 May 2021 21:35:43 +0000"  >&lt;p&gt;Per earlier discussion, sending a zero-byte read or write to the OST with the handle of the DLM lock being cancelled may be enough to prolong the lock timeout on the OSS and avoid eviction.&lt;/p&gt;
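&lt;p&gt;This relies on the OSS pushing back the lock callback deadline whenever it sees IO that references the lock. Below is a toy model of that behaviour, not Lustre code. The timeout, the ping interval and the function name are hypothetical, and time advances in simulated whole seconds.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```sh
#!/bin/sh
# Toy model (hypothetical, not Lustre code) of a server-side lock callback
# deadline that is pushed back whenever IO referencing the lock arrives.
CB_TIMEOUT=100   # hypothetical callback timeout, in simulated seconds

# Simulate a cancel that takes $1 seconds while the client sends IO that
# references the lock every $2 seconds (0 = never).
# Prints "ok" or "evicted at T".
simulate_cancel() {
  local cancel_time=$1 ping_every=$2
  local deadline=$CB_TIMEOUT now=0   # blocking callback sent at t=0
  while [ "$cancel_time" -gt "$now" ]; do
    now=$((now + 1))
    # An IO carrying the lock handle prolongs the lock on the server.
    if [ "$ping_every" -gt 0 ]; then
      if [ $((now % ping_every)) -eq 0 ]; then
        deadline=$((now + CB_TIMEOUT))
      fi
    fi
    # Neither a cancel nor any IO before the deadline: evict the client.
    if [ "$now" -gt "$deadline" ]; then
      echo "evicted at $now"
      return 1
    fi
  done
  echo "ok"
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With these numbers, &lt;tt&gt;simulate_cancel 250 0&lt;/tt&gt; prints &lt;tt&gt;evicted at 101&lt;/tt&gt;, while &lt;tt&gt;simulate_cancel 250 50&lt;/tt&gt; prints &lt;tt&gt;ok&lt;/tt&gt;. The client survives as long as it sends IO more often than the callback timeout, however long the cancel itself takes.&lt;/p&gt;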

&lt;p&gt;However, reducing the time that page eviction takes would also be desirable, such as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11290&quot; title=&quot;Batch callbacks in osc_page_gang_lookup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11290&quot;&gt;&lt;del&gt;LU-11290&lt;/del&gt;&lt;/a&gt;, and any other optimizations to reduce the per-page overhead, like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13134&quot; title=&quot;try to use slab allocation for cl_page&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13134&quot;&gt;&lt;del&gt;LU-13134&lt;/del&gt;&lt;/a&gt; which reduces the size/count of allocations per page.&lt;/p&gt;</comment>
                            <comment id="302909" author="green" created="Fri, 28 May 2021 01:44:27 +0000"  >&lt;p&gt;Zero-sized IO sadly does not work, so I&apos;ll do a 1-byte IO with a &quot;discard me&quot; flag. Old servers that are not aware of the flag will do the IO, while new servers will discard it altogether.&lt;/p&gt;
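&lt;p&gt;Below is a toy model of the scheme described above, not the actual patch. While walking its cached pages, the client checks the elapsed time and, past a threshold, sends a 1-byte IO tagged with a &quot;discard me&quot; flag. The point of that IO is to keep the lock alive. A server that understands the flag drops the IO, while an older server simply performs it. The threshold, the discard speed and the names &lt;tt&gt;server_handle_io&lt;/tt&gt; and &lt;tt&gt;discard_pages&lt;/tt&gt; are hypothetical.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```sh
#!/bin/sh
# Toy model (hypothetical names and numbers, not Lustre code) of a client
# that notifies the server while a long cache discard is in progress.
PAGES_PER_SECOND=100   # simulated discard speed
NOTIFY_AFTER=3         # send a keepalive IO after this many seconds

# Server side: what a server does with the 1-byte keepalive IO.
# $1 = "discard_me" if the flag is set, $2 = 1 if the server knows the flag.
server_handle_io() {
  if [ "$1" = "discard_me" ]; then
    if [ "$2" = 1 ]; then
      echo "discarded"   # new server: drop the IO altogether
      return
    fi
  fi
  echo "written"         # old server: performs the 1-byte IO
}

# Client side: discard $1 cached pages against a server whose flag
# support is $2, printing the server result of each keepalive IO.
discard_pages() {
  local pages=$1 new_server=$2 done_pages=0 elapsed=0 last=0
  while [ "$pages" -gt "$done_pages" ]; do
    done_pages=$((done_pages + 1))
    # Advance the simulated clock once every PAGES_PER_SECOND pages.
    if [ $((done_pages % PAGES_PER_SECOND)) -eq 0 ]; then
      elapsed=$((elapsed + 1))
    fi
    # Too long since the server last heard from us: send a keepalive IO.
    if [ $((elapsed - last)) -ge "$NOTIFY_AFTER" ]; then
      server_handle_io discard_me "$new_server"
      last=$elapsed
    fi
  done
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here &lt;tt&gt;discard_pages 1000 1&lt;/tt&gt; prints &lt;tt&gt;discarded&lt;/tt&gt; three times (at simulated seconds 3, 6 and 9), and &lt;tt&gt;discard_pages 1000 0&lt;/tt&gt; prints &lt;tt&gt;written&lt;/tt&gt; three times. A real client would measure wall-clock time rather than derive it from a page count. The page-count clock only keeps the model deterministic.&lt;/p&gt;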

&lt;p&gt;As I am adding a patch here, I just realized that prolonging the lock from the client side is still only a half measure. The client that sent a lock cancel is still going to time out in 600 seconds (at_max). It will resend, so at least there are no evictions, but the chatter in the logs will be substantial. Something to keep in mind.&lt;/p&gt;</comment>
                            <comment id="302921" author="gerrit" created="Fri, 28 May 2021 03:04:35 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/43857&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43857&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; osc: Notify server if cache discard takes a long time&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 9cd980cb07838ebb9a543870a3a8e998d567ac8c&lt;/p&gt;</comment>
                            <comment id="303063" author="gerrit" created="Sat, 29 May 2021 02:47:00 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/43869&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43869&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; tests: Test demonstrating eviction during long cache processing&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e868e6deb017cadf1e75a355fee0639140faf6f8&lt;/p&gt;</comment>
                            <comment id="304441" author="gerrit" created="Mon, 14 Jun 2021 16:43:14 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/43857/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43857/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; osc: Notify server if cache discard takes a long time&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 564070343ac4ccf4f97843009e1c36f5130ac19c&lt;/p&gt;</comment>
                            <comment id="310121" author="gerrit" created="Fri, 13 Aug 2021 00:52:49 +0000"  >&lt;p&gt;&quot;Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/44654&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44654&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; osc: Do not attempt sending empty pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d3e9202944c6760b1269dd78d4043699200cbf38&lt;/p&gt;</comment>
                            <comment id="313168" author="gerrit" created="Fri, 17 Sep 2021 14:06:22 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/43869/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43869/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; tests: Ensure there&apos;s no eviction with long cache discard&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c0a7f78529e21c9cafa986abea255925b4b41244&lt;/p&gt;</comment>
                            <comment id="314594" author="gerrit" created="Mon, 4 Oct 2021 16:55:46 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/44654/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44654/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14711&quot; title=&quot;Canceling lock with a lot of cached data can take a lot of time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14711&quot;&gt;&lt;del&gt;LU-14711&lt;/del&gt;&lt;/a&gt; osc: Do not attempt sending empty pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1a409a3e6a74685970ee779ebe32917bf51eaf3a&lt;/p&gt;</comment>
                            <comment id="314598" author="pjones" created="Mon, 4 Oct 2021 17:14:52 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                                        </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="65381">LU-14885</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="53139">LU-11290</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="57796">LU-13134</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01vfr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>