<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:22:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15915] /bin/rm: fts_read failed: Cannot send after transport endpoint shutdown</title>
                <link>https://jira.whamcloud.com/browse/LU-15915</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I am running a large number of deletes on clients, and after a while they get evicted. The error on the client is:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the MDS, the error is:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Jun  6 19:28:59 fmds1 kernel: LustreError: 9744:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.21.22.31@tcp  ns: mdt-foxtrot-MDT0000_UUID lock: ffff94f72a408480/0xb4442ee3e798319c lrc: 3/0,0 mode: PR/PR res: [0x20009b3c6:0x29eb:0x0].0x0 bits 0x20/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.21.22.31@tcp remote: 0x40ff70b2e6a5419f expref: 147862 pid: 61992 timeout: 6578337 lvb_type: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;m running maybe 10-15 recursive rm tasks per client on 3 clients, so 30-45 in total at once.&lt;/p&gt;

&lt;p&gt;I&apos;ve set debugging params as follows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl set_param debug_mb=1024
lctl set_param debug=&quot;+dlmtrace +info +rpctrace&quot;
lctl set_param dump_on_eviction=1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;on clients and the MDS.&lt;/p&gt;
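&lt;p&gt;(For reference, with dump_on_eviction=1 a binary debug dump is written on each eviction, by default under /tmp/lustre-log.&lt;timestamp&gt;; a minimal sketch of decoding one to text, with an illustrative file name:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# convert a binary Lustre debug dump to readable text
lctl debug_file /tmp/lustre-log.1654595833.3716 /tmp/lustre-log.txt
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;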

&lt;p&gt;Lustre version is 2.12.8_6_g5457c37&lt;/p&gt;</description>
                <environment></environment>
        <key id="70643">LU-15915</key>
            <summary>/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="dneg">Dneg</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 Jun 2022 19:23:05 +0000</created>
                <updated>Fri, 25 Nov 2022 13:48:50 +0000</updated>
                            <resolved>Sat, 19 Nov 2022 16:22:53 +0000</resolved>
                                    <version>Lustre 2.12.8</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="336854" author="dneg" created="Mon, 6 Jun 2022 19:29:14 +0000"  >&lt;p&gt;should one of the clients get evicted again, I&apos;ll upload the logs&lt;/p&gt;</comment>
                            <comment id="336893" author="dneg" created="Tue, 7 Jun 2022 11:02:49 +0000"  >&lt;p&gt;Several evictions last night. I&apos;ve uploaded the logs to ftp.whamcloud.com uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15915&quot; title=&quot;/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15915&quot;&gt;&lt;del&gt;LU-15915&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="336894" author="dneg" created="Tue, 7 Jun 2022 11:03:28 +0000"  >&lt;p&gt;Could we please raise the priority on this? Thanks&lt;/p&gt;</comment>
                            <comment id="348405" author="JIRAUSER18019" created="Fri, 30 Sep 2022 17:42:42 +0000"  >&lt;p&gt;Are you still experiencing this issue? &lt;del&gt;There isn&apos;t enough technical information to diagnose the problem.&lt;/del&gt; (We just noticed your follow-on comment about the Lustre logs uploaded to ftp.whamcloud.com and found your logs still there).&lt;/p&gt;

&lt;p&gt;Can you please attach an hour or two of console logs preceding the client eviction from the MDS and from the affected clients? Also, the Lustre kernel debug logs if they are available.&lt;/p&gt;</comment>
                            <comment id="348410" author="adilger" created="Fri, 30 Sep 2022 18:59:21 +0000"  >&lt;p&gt;Based on the description of the problematic workload (concurrent &quot;&lt;tt&gt;rm -r&lt;/tt&gt;&quot; from many clients) it is possible that this relates to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15821&quot; title=&quot;Server driven blocking callbacks can wait behind general lru_size management&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15821&quot;&gt;&lt;del&gt;LU-15821&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The patch &lt;a href=&quot;https://review.whamcloud.com/47215&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47215&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15821&quot; title=&quot;Server driven blocking callbacks can wait behind general lru_size management&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15821&quot;&gt;&lt;del&gt;LU-15821&lt;/del&gt;&lt;/a&gt; ldlm: Prioritize blocking callbacks&lt;/tt&gt;&quot; prioritizes the processing of lock callbacks from the servers, which can be problematic if there is a workload that is continually creating new locks and preventing the lock callbacks from the server from being processed in a timely manner.&lt;/p&gt;
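&lt;p&gt;(For reference, a sketch of pulling that change from Gerrit into a lustre-release checkout; the patchset number in the ref is illustrative:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# fetch change 47215 and apply it on the current branch
git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/15/47215/1
git cherry-pick FETCH_HEAD
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;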

&lt;p&gt;Would you be interested in testing this simple patch on the clients that are running the &quot;&lt;tt&gt;rm -r&lt;/tt&gt;&quot; tasks? It is very low risk and unlikely to cause any problems, but I am not totally sure whether it will solve the problem.&lt;/p&gt;</comment>
                            <comment id="348599" author="dneg" created="Tue, 4 Oct 2022 08:37:20 +0000"  >&lt;p&gt;Our current workaround is to set the lru_size to 128 (ldlm.namespaces.*.lru_size=128), on the principle that having less locks allow the client and MDS to communicate all the locks in a timely manner, which seems to have reduced the evictions (though it means a lot more getattr calls to the MDS), and things are slower. &lt;/p&gt;

&lt;p&gt;How would we apply this patch? Should I download srpms, apply the patch and build? If so, which srpms need rebuilding? Can I patch the same release (2.12.8_6_g5457c37) as we&apos;re currently using? And is it client only? Or for the servers as well? &lt;/p&gt;</comment>
                            <comment id="348606" author="adilger" created="Tue, 4 Oct 2022 10:20:31 +0000"  >&lt;p&gt;The patch is only on the client, and should be able to apply to the source you are currently using and then rebuild the client RPMs.  My assumption, based on the build version in use on your clients, is that you already build your own client RPMs, so this should not be any different.&lt;/p&gt;</comment>
                            <comment id="348652" author="dneg" created="Tue, 4 Oct 2022 15:30:56 +0000"  >&lt;p&gt;Thanks Andreas. Actually, we use Whamcloud-supplied rpms, so I&apos;ll download the corresponding srpm(s)&lt;/p&gt;</comment>
                            <comment id="348733" author="adilger" created="Wed, 5 Oct 2022 03:16:57 +0000"  >&lt;p&gt;I&apos;ve cherry-picked patch &lt;a href=&quot;https://review.whamcloud.com/48764&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48764&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15821&quot; title=&quot;Server driven blocking callbacks can wait behind general lru_size management&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15821&quot;&gt;&lt;del&gt;LU-15821&lt;/del&gt;&lt;/a&gt; ldlm: Prioritize blocking callbacks&lt;/tt&gt;&quot; to b2_12 to generate a build and run some testing.&lt;/p&gt;</comment>
                            <comment id="348756" author="pjones" created="Wed, 5 Oct 2022 11:45:50 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=dneg&quot; class=&quot;user-hover&quot; rel=&quot;dneg&quot;&gt;dneg&lt;/a&gt;&#160;I&apos;m not sure how familiar you are in navigating the Whamcloud development infrastructure but you should now be able to grab SRMs from Jenkins - &lt;a href=&quot;https://build.whamcloud.com/job/lustre-reviews/89806/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.whamcloud.com/job/lustre-reviews/89806/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="348872" author="dneg" created="Thu, 6 Oct 2022 10:55:43 +0000"  >&lt;p&gt;Hi Peter, &lt;/p&gt;

&lt;p&gt;I found the rpms from that link (e.g., &lt;a href=&quot;https://build.whamcloud.com/job/lustre-reviews/89806/arch=x86_64,build_type=client,distro=el7.9,ib_stack=inkernel/artifact/artifacts/RPMS/x86_64/kmod-lustre-client-2.12.9_11_gf855161-1.el7.x86_64.rpm&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.whamcloud.com/job/lustre-reviews/89806/arch=x86_64,build_type=client,distro=el7.9,ib_stack=inkernel/artifact/artifacts/RPMS/x86_64/kmod-lustre-client-2.12.9_11_gf855161-1.el7.x86_64.rpm&lt;/a&gt; etc), but unless you recommend using that later version (2.12.9), I was going to use the same version as we&apos;re running: I&apos;ve downloaded the source rpms for it, applied the patch Andreas linked, and built the client rpms. Let me know if that is ok.&lt;/p&gt;
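&lt;p&gt;(For reference, a minimal sketch of an equivalent build flow using the lustre-release git tree rather than srpms; the checkout commit matches our running version string, and the Gerrit patchset number is illustrative:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# build patched client RPMs from source
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout 5457c37            # commit matching 2.12.8_6_g5457c37
# pull the LU-15821 patch (change 47215; patchset 1 is illustrative)
git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/15/47215/1
git cherry-pick FETCH_HEAD
sh autogen.sh
./configure --disable-server    # client-only build
make rpms
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;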

&lt;p&gt;Thanks,&lt;br/&gt;
Campbell&lt;/p&gt;</comment>
                            <comment id="348884" author="pjones" created="Thu, 6 Oct 2022 12:35:00 +0000"  >&lt;p&gt;Yes that is fine - I was just trying to make things easier for you.&lt;/p&gt;</comment>
                            <comment id="349753" author="pjones" created="Fri, 14 Oct 2022 23:59:14 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=dneg&quot; class=&quot;user-hover&quot; rel=&quot;dneg&quot;&gt;dneg&lt;/a&gt;&#160;just checking in to see whether you have tried the suggested patch yet...&lt;/p&gt;</comment>
                            <comment id="350109" author="dneg" created="Wed, 19 Oct 2022 09:22:18 +0000"  >&lt;p&gt;Hi Peter, have just managed to apply the patched rpm to the clients now, and have increased the lru_size back up to 10000 from the 128 we had it set to, across the cluster. It took a long time as I had to wait for all the long running backups on these clients to finish, which I had to do one by one. So we should get some results through in the next couple of days, and I&apos;ll let you know either way.&lt;br/&gt;
-Campbell&lt;/p&gt;</comment>
                            <comment id="350175" author="pjones" created="Wed, 19 Oct 2022 15:45:16 +0000"  >&lt;p&gt;Great - thanks!&lt;/p&gt;</comment>
                            <comment id="350265" author="dneg" created="Thu, 20 Oct 2022 08:55:58 +0000"  >&lt;p&gt;Got three evictions last night on one of the three patched clients:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Oct 20 01:54:22 foxtrot2 kernel: LustreError: 167-0: foxtrot-MDT0000-mdc-ffff99c032e95800: This client was evicted by foxtrot-MDT0000; in progress operations using this service will fail.
Oct 20 03:02:17 foxtrot2 kernel: LustreError: 167-0: foxtrot-MDT0000-mdc-ffff99c032e95800: This client was evicted by foxtrot-MDT0000; in progress operations using this service will fail.
Oct 20 03:38:09 foxtrot2 kernel: LustreError: 167-0: foxtrot-MDT0000-mdc-ffff99c032e95800: This client was evicted by foxtrot-MDT0000; in progress operations using this service will fail.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="350483" author="dneg" created="Fri, 21 Oct 2022 16:44:16 +0000"  >&lt;p&gt;3 more evictions last night, two on one client, one on another. &lt;/p&gt;</comment>
                            <comment id="350665" author="green" created="Tue, 25 Oct 2022 06:29:37 +0000"  >&lt;p&gt;I just realized you did not include the server side logs so we don&apos;t know which logs the client was supposed to release and did not in time.&lt;/p&gt;

&lt;p&gt;Can you please provide the MDS logs from the timeframe when those lustre-log files you uploaded were created?&lt;/p&gt;</comment>
                            <comment id="350856" author="dneg" created="Wed, 26 Oct 2022 16:17:20 +0000"  >&lt;p&gt;Hi Oleg,&lt;/p&gt;

&lt;p&gt;I&apos;ve uploaded the logs from the MDS from the earlier June evictions, as well as the latest batch, to ftp.whamcloud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15915&quot; title=&quot;/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15915&quot;&gt;&lt;del&gt;LU-15915&lt;/del&gt;&lt;/a&gt;. I&apos;ve counted altogether 39 evictions in the 6 days since the patches were applied and lru_size was increased back to 10000. I&apos;ve dropped it back down to 2048 to see if that makes much of a difference, though it might increase the load on the MDS due to the increased number of getattr calls; I guess we&apos;ll see.&lt;/p&gt;
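&lt;p&gt;(For reference, one way to watch the getattr load on the MDS, assuming the usual per-MDT stats interface:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# on the MDS: per-operation counters, including getattr
lctl get_param mdt.*.md_stats
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;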

&lt;p&gt;Campbell&lt;/p&gt;</comment>
                            <comment id="351080" author="dneg" created="Fri, 28 Oct 2022 09:04:52 +0000"  >&lt;p&gt;Dropped the lru_size to 128 as we were still getting evictions and backups were failing.&lt;/p&gt;</comment>
                            <comment id="351325" author="green" created="Tue, 1 Nov 2022 01:39:32 +0000"  >&lt;p&gt;so these (June) logs look very familiar, I think there was a similar report&#160; some time ago where a lock cancel was triggering other lock cancels ? Can&apos;t quite find the report readily though to refresh my memory. ...&#160; &lt;/p&gt;

&lt;p&gt;On the surface it sounds like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15821&quot; title=&quot;Server driven blocking callbacks can wait behind general lru_size management&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15821&quot;&gt;&lt;del&gt;LU-15821&lt;/del&gt;&lt;/a&gt;, but I think there was another path where, as we do a cancel, we also pack up more locks that are ready to be cancelled. (I cannot verify what your current client version is to see whether it really got the fix, and I don&apos;t have more recent debug logs to confirm whether the behavior is the same or different now.)&lt;/p&gt;

&lt;p&gt;In the old logs this is how it unfolds:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00010000:00010000:9.0:1654595833.299427:0:3716:0:(ldlm_lockd.c:1775:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 2/0,0 mode: PR/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x0 rrc: 3 type: IBT flags: 0x400000000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0
00010000:00010000:9.0:1654595833.299485:0:3716:0:(ldlm_request.c:1150:ldlm_cli_cancel_local()) ### client-side cancel ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 3/0,0 mode: PR/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x20 rrc: 3 type: IBT flags: 0x408400000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0
00010000:00000040:9.0:1654595833.299489:0:3716:0:(ldlm_resource.c:1601:ldlm_resource_putref()) putref res: ffff8b0921172e40 count: 2
00010000:00000040:24.0:1654595833.299492:0:35567:0:(ldlm_resource.c:1601:ldlm_resource_putref()) putref res: ffff8afaf44dac00 count: 1
00000020:00000040:9.0:1654595833.299493:0:3716:0:(lustre_handles.c:113:class_handle_unhash_nolock()) removing object ffff8af441314d80 with handle 0x7960025dca1d30a8 from hash 
...
lots and lots of other locks are being collected to be sent in the cancel RPC
...
00010000:00010000:4.0:1654595933.168282:0:3716:0:(ldlm_lockd.c:1800:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: foxtrot-MDT0000-mdc-ffff8acff2c13800 lock: ffff8af441314d80/0x7960025dca1d30a8 lrc: 1/0,0 mode: --/PR res: [0x20009b4d8:0x14677:0x0].0x0 bits 0x20/0x20 rrc: 3 type: IBT flags: 0x4c09400000000 nid: local remote: 0xb4442ee9c835c045 expref: -99 pid: 35329 timeout: 0 lvb_type: 0&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So from this we see we already traversed ldlm_cli_cancel_local-&amp;gt;ldlm_lock_cancel-&amp;gt;ldlm_lock_destroy_nolock-&amp;gt;class_handle_unhash_nolock, i.e. the lock was totally destroyed, and it&apos;s not a matter of the cancel thread not even getting to it as in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15821&quot; title=&quot;Server driven blocking callbacks can wait behind general lru_size management&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15821&quot;&gt;&lt;del&gt;LU-15821&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Anyway, looking closer at this thread (which is named ldlm_bl_126), we actually see this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00100000:11.0:1654595833.301719:0:3716:0:(genops.c:2379:obd_get_mod_rpc_slot()) foxtrot-MDT0000-mdc-ffff8acff2c13800: sleeping for a modify RPC slot opc 35, max 7
00000100:00100000:4.0:1654595933.167133:0:3716:0:(client.c:2096:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ldlm_bl_126:22bdcfdc-5f27-ca71-fcf9-840efc5add00:3716:1733599145210368:10.21.22.10@tcp:35 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So somewhere along the way of canceling multiple locks, it suddenly met one that required a modify RPC slot that we did not have. opc 35 is MDS_CLOSE, so we were canceling an open lock and could not send the close, stopping this whole canceling thread in its tracks.&lt;/p&gt;
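&lt;p&gt;(The &quot;max 7&quot; in that log line is the client&apos;s modify-RPC-slot limit; for reference it is visible as a per-MDC tunable on the client:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# per-MDC cap on concurrent modifying RPCs (opc 35 = MDS_CLOSE needs a slot)
lctl get_param mdc.*.max_mod_rpcs_in_flight
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;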

&lt;p&gt;So this is actually &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14741&quot; title=&quot;Close RPC might get stuck behind normal RPCs waiting for slot&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14741&quot;&gt;&lt;del&gt;LU-14741&lt;/del&gt;&lt;/a&gt;, which was never included in b2_12; the backported patch is here: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/45850&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/45850&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="351344" author="dneg" created="Tue, 1 Nov 2022 11:51:11 +0000"  >&lt;p&gt;Hi Oleg, I have applied the patch to lustre/obdclass/genops.c (there was just the one at &lt;a href=&quot;https://review.whamcloud.com/changes/fs%2Flustre-release~45850/revisions/1/patch?zip&amp;amp;path=lustre%2Fobdclass%2Fgenops.c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/changes/fs%2Flustre-release~45850/revisions/1/patch?zip&amp;amp;path=lustre%2Fobdclass%2Fgenops.c&lt;/a&gt;, correct?) and have built new client rpms. I&apos;ll install them on the clients over the next few days, then bump up the lru_size across the cluster and let you know the result.&lt;br/&gt;
Thanks,&lt;br/&gt;
Campbell&lt;/p&gt;</comment>
                            <comment id="352769" author="pjones" created="Fri, 11 Nov 2022 16:20:54 +0000"  >&lt;p&gt;Hey Campbell&lt;/p&gt;

&lt;p&gt;Just checking in to see how things are progressing&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="352797" author="dneg" created="Fri, 11 Nov 2022 19:24:14 +0000"  >&lt;p&gt;Hi Peter, was just about to post an update. No evictions since the patch was applied ealier in the week (Tuesday), so good news on that front. Will keep an eye on it over the weekend. We get the odd soft lockup (e.g., Nov  9 03:11:25 foxtrot3 kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpcd_01_10:3531&amp;#93;&lt;/span&gt;). I can open a separate ticket for that issue if you like&lt;/p&gt;</comment>
                            <comment id="352799" author="dneg" created="Fri, 11 Nov 2022 19:35:17 +0000"  >&lt;p&gt;I saw &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15742&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.whamcloud.com/browse/LU-15742&lt;/a&gt;, we already have lru_size at 10000, and ldlm.namespaces.*.lru_max_age=60000&lt;/p&gt;</comment>
                            <comment id="353297" author="dneg" created="Thu, 17 Nov 2022 10:28:07 +0000"  >&lt;p&gt;Looking good, still no evictions after a week. &lt;/p&gt;</comment>
                            <comment id="353611" author="pjones" created="Sat, 19 Nov 2022 16:22:53 +0000"  >&lt;p&gt;Great! Then let&apos;s mark this as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14741&quot; title=&quot;Close RPC might get stuck behind normal RPCs waiting for slot&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14741&quot;&gt;&lt;del&gt;LU-14741&lt;/del&gt;&lt;/a&gt;. It would be better to track the soft lockup issue under a new ticket.&lt;/p&gt;</comment>
                            <comment id="354163" author="dneg" created="Fri, 25 Nov 2022 10:20:38 +0000"  >&lt;p&gt;Thanks Peter, can you tell me which releases (in particular, 2.15.x and 2.12.x) have this genops.c patch?&lt;/p&gt;</comment>
                            <comment id="354176" author="pjones" created="Fri, 25 Nov 2022 13:48:50 +0000"  >&lt;p&gt;This fix was in 2.15.0 and will be in 2.12.10 (if we do one)&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="64569">LU-14741</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="64569">LU-14741</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="70152">LU-15821</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02rhz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>