<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:19:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8634] 2.8.0 MDS (layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)</title>
                <link>https://jira.whamcloud.com/browse/LU-8634</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;h3&gt;&lt;a name=&quot;Synopsis%3A&quot;&gt;&lt;/a&gt;Synopsis:&lt;/h3&gt;

&lt;p&gt;Lustre version 2.8.0 on servers and clients; namespace lookups on clients suddenly become degraded.  Subsystem RPC ERRORS related to LDLM_INTENT_QUOTA are emitted to the Lustre debug log on all Lustre servers.  The degraded performance is resolved by unmounting the MDT and rebooting the MDS.&lt;/p&gt;

&lt;p&gt;I haven&apos;t found any JIRA matches on what we&apos;re seeing in particular, so opening this ticket.&lt;/p&gt;

&lt;p&gt;Servers are running Lustre 2.8 from the latest el6 Community Release:&lt;br/&gt;
&lt;a href=&quot;https://downloads.hpdd.intel.com/public/lustre/latest-release/el6.7/server/RPMS/x86_64/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://downloads.hpdd.intel.com/public/lustre/latest-release/el6.7/server/RPMS/x86_64/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More details follow; any suggestions on how to collect more meaningful debugging information are appreciated, as we&apos;ve experienced this degraded performance twice on this production cluster.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;a name=&quot;Detailed%3A&quot;&gt;&lt;/a&gt;Detailed:&lt;/h3&gt;

&lt;p&gt;The Lustre system comprises 1 MDS with 1 MDT and 6 OSSs with 6 OSTs per OSS, 36 OSTs total.&lt;/p&gt;

&lt;p&gt;The first time we experienced this issue (2016-09-07) we suspected something was wrong with the MDT since all the errors pointed to that component (mds1 warnings on nid2str of &quot;lo&quot; and OSSs warning on nid2str of &quot;10.137.32.37@o2ib,&quot; the mds1&apos;s nid).&lt;/p&gt;

&lt;p&gt;The degraded namespace performance on the clients appeared as a total hang to some users/processes.  Simple tests showed that a &apos;stat /lustre&apos; took 6 seconds before returning valid data.  Stat&apos;ing directories deeper in /lustre added 6 seconds per path component, i.e. &apos;stat /lustre/work&apos; took 12 seconds, etc.  Path components with more entries took a bit longer to stat.&lt;/p&gt;
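
&lt;p&gt;For reference, the simple timing test looked essentially like the following (the deeper example paths are representative, not our exact directories):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# stat each path depth; each extra component added roughly 6 seconds
for p in /lustre /lustre/work /lustre/work/somedir; do
    echo &quot;== $p&quot;
    time stat &quot;$p&quot; &amp;gt; /dev/null
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;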

&lt;p&gt;The corrective actions we took at that time were to unmount the MDT, unload all the Lustre kernel modules, unconfigure the LNET network and unload the remaining lnd kernel modules.  We then reversed this process without rebooting the MDS (a rough sketch of the sequence is shown after the error snippet below).  Remounting the MDT succeeded; however, we experienced copious LustreError messages on the console and most of our clients were evicted.  Snippets of the errors I still have on hand from then:&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;1993495.191766&amp;#93;&lt;/span&gt; LustreError: 24447:0:(upcall_cache.c:237:upcall_cache_get_entry()) acquire for key 4254: error -110&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;1993695.996095&amp;#93;&lt;/span&gt; Lustre: lstrFS-MDT0000: Recovery over after 5:00, of 111 clients 16 recovered and 95 were evicted.&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
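
&lt;p&gt;That unmount/unload sequence was roughly the following (mount point and module names here are representative, from memory):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;umount /mnt/lustre/mdt             # unmount the MDT
lustre_rmmod                       # unload the Lustre kernel modules
lctl network unconfigure           # unconfigure the LNET network
modprobe -r ko2iblnd lnet libcfs   # unload the remaining lnd/LNET modules
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;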

&lt;p&gt;&apos;lctl dl&apos; didn&apos;t show anything and /proc/fs/lustre was empty on the MDS after mounting the MDT.  Calling &apos;stat /lustre&apos; on a client was still very heavily delayed.  Most namespace-related calls seemed to be completely hung.&lt;/p&gt;

&lt;p&gt;Having arrived at this point, we determined our only option was to reboot the MDS.  Upon our attempt to again &apos;unmount /lustre&apos;, messages were emitted to the console about various Lustre service threads being hung.  The umount process had the following backtrace:&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;cat /proc/25973/stack&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa12ab1e0&amp;gt;&amp;#93;&lt;/span&gt; __ldlm_namespace_free+0x1c0/0x560 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa12ab5ef&amp;gt;&amp;#93;&lt;/span&gt; ldlm_namespace_free_prior+0x6f/0x220 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa19055d2&amp;gt;&amp;#93;&lt;/span&gt; mdt_device_fini+0x6a2/0x12e0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mdt&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa10c3332&amp;gt;&amp;#93;&lt;/span&gt; class_cleanup+0x572/0xd20 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa10c5646&amp;gt;&amp;#93;&lt;/span&gt; class_process_config+0x1b66/0x24c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa10c645f&amp;gt;&amp;#93;&lt;/span&gt; class_manual_cleanup+0x4bf/0xc90 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa10f7a1c&amp;gt;&amp;#93;&lt;/span&gt; server_put_super+0x8bc/0xcd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811944bb&amp;gt;&amp;#93;&lt;/span&gt; generic_shutdown_super+0x5b/0xe0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811945a6&amp;gt;&amp;#93;&lt;/span&gt; kill_anon_super+0x16/0x60&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa10c9646&amp;gt;&amp;#93;&lt;/span&gt; lustre_kill_super+0x36/0x60 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81194d47&amp;gt;&amp;#93;&lt;/span&gt; deactivate_super+0x57/0x80&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811b4d3f&amp;gt;&amp;#93;&lt;/span&gt; mntput_no_expire+0xbf/0x110&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811b588b&amp;gt;&amp;#93;&lt;/span&gt; sys_umount+0x7b/0x3a0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100b0d2&amp;gt;&amp;#93;&lt;/span&gt; system_call_fastpath+0x16/0x1b&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffffffffff&amp;gt;&amp;#93;&lt;/span&gt; 0xffffffffffffffff&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We waited ~15 minutes but the umount attempt seemed thoroughly wedged, so we reset the MDS.&lt;/p&gt;

&lt;p&gt;Upon reboot, we ran &apos;e2fsck -fp /dev/path/to/mdt&apos; prior to remounting.  When we remounted, the recovery process seemed normal.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Fast-forward to today:&lt;br/&gt;
We ran without issue again until this morning around 5am, with the same symptoms on clients and errors on the servers:&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1 lctl dk entry&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000100:00020000:7.0:1474538545.022548:0:3082:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
  req@ffff8805c33040c0 x1544837681171836/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/192 e 0 to 0 dl 1474538552 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss1 (lctl dk log rolled over, so /var/log/messages entry)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;Sep 22 05:02:06 oss1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264568.614229&amp;#93;&lt;/span&gt; LustreError: 11270:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
Sep 22 05:02:06 oss1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264568.614231&amp;#93;&lt;/span&gt;   req@ffff8800705e3c80 x1542797516212496/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST0000@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538533 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss2 (lctl dk log rolled over, so /var/log/messages entry)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;Sep 22 05:01:16 oss2 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264426.880193&amp;#93;&lt;/span&gt; LustreError: 17980:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
Sep 22 05:01:16 oss2 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264426.880195&amp;#93;&lt;/span&gt;   req@ffff88030a58ccc0 x1542838780945432/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST0007@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538483 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss3 (lctl dk log rolled over, so /var/log/messages entry)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;Sep 22 05:01:40 oss3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264396.120233&amp;#93;&lt;/span&gt; LustreError: 17958:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
Sep 22 05:01:40 oss3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264396.120235&amp;#93;&lt;/span&gt;   req@ffff880362c2f980 x1542838278617176/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST0011@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538507 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss4 (lctl dk log rolled over, so /var/log/messages entry)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;Sep 22 05:01:46 oss4 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264365.930765&amp;#93;&lt;/span&gt; LustreError: 17958:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
Sep 22 05:01:46 oss4 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3264365.930767&amp;#93;&lt;/span&gt;   req@ffff8800af9fc6c0 x1542838806272068/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST0012@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538513 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss5 lctl dk&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000100:00020000:7.0:1474538507.425216:0:18722:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
  req@ffff8801d42fb980 x1542838266300652/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST001b@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538514 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;oss6 lctl dk&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000100:00020000:10.0:1474538538.449115:0:18717:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)&lt;br/&gt;
  req@ffff8805ae9ad380 x1542838948349576/t0(0) o101-&amp;gt;lstrFS-MDT0000-lwp-OST001e@10.137.32.37@o2ib:23/10 lens 456/192 e 0 to 0 dl 1474538545 ref 1 fl Interpret:RN/0/0 rc -115/-115&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The corrective actions we took today involved the same unmount/kernel-module-unload steps described above (from 2016-09-07).  However, we rebooted the MDS rather than reloading modules and remounting the MDT, to avoid any similar issues.  Upon MDS boot, we still ran &apos;e2fsck -fp /dev/path/to/mdt&apos; to ensure consistency of the cleanly unmounted MDT.&lt;/p&gt;

&lt;p&gt;The following was output:&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;QUOTA WARNING&amp;#93;&lt;/span&gt; Usage inconsistent for ID 0:actual (7314026496, 402396) != expected (7313580032, 402359)&lt;br/&gt;
lstrFS-MDT0000: Update quota info for quota type 0.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;QUOTA WARNING&amp;#93;&lt;/span&gt; Usage inconsistent for ID 0:actual (7312138240, 402074) != expected (7311691776, 402037)&lt;br/&gt;
lstrFS-MDT0000: Update quota info for quota type 1.&lt;br/&gt;
lstrFS-MDT0000: 47314184/2147483648 files (0.1% non-contiguous), 273274458/536870912 blocks&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After remounting the MDT, all clients reconnected without being evicted.&lt;/p&gt;</description>
                <environment></environment>
        <key id="40001">LU-8634</key>
            <summary>2.8.0 MDS (layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&apos; (3 of 1) in format `LDLM_INTENT_QUOTA&apos;: 0 vs. 112 (server)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="jsamuels">Josh Samuelson</reporter>
                        <labels>
                            <label>mdt</label>
                            <label>ptlrpc</label>
                            <label>quota</label>
                    </labels>
                <created>Thu, 22 Sep 2016 19:22:00 +0000</created>
                <updated>Mon, 7 Nov 2016 20:18:28 +0000</updated>
                            <resolved>Mon, 7 Nov 2016 20:18:28 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="167170" author="niu" created="Mon, 26 Sep 2016 03:55:30 +0000"  >&lt;p&gt;Were there lots of client activities when this error message showed up? I&apos;m wondering the MDS could be overloaded by locking requests,  could you check the following three proc files  on MDS when you see this message (&quot;__req_capsule_get() ... &apos;quota_body&apos;) again?&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/proc/fs/lustre/ldlm/lock_granted_count
/proc/fs/lustre/ldlm/lock_limit_mb
/proc/fs/lustre/ldlm/lock_reclaim_threshold_mb
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
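
&lt;p&gt;For example, something along these lines could capture them periodically while the problem is happening (the log path is just an example):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;while sleep 60; do
    date
    grep -H . /proc/fs/lustre/ldlm/lock_granted_count \
              /proc/fs/lustre/ldlm/lock_limit_mb \
              /proc/fs/lustre/ldlm/lock_reclaim_threshold_mb
done &amp;gt;&amp;gt; /var/tmp/ldlm-lock-stats.log
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;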

&lt;p&gt;To avoid having its memory filled up by ldlm locks, the MDS will reject any new locking request with -EINPROGRESS (-115) once the number of cached locks reaches a threshold.  I checked the quota code, and it looks like it would mistakenly interpret this error and try to unpack the reply buffer.&lt;/p&gt;</comment>
                            <comment id="167409" author="jsamuels" created="Tue, 27 Sep 2016 00:21:23 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;Thank you for taking a look into this.&lt;/p&gt;

&lt;p&gt;I looked at the source for a bit and see that lock_reclaim_threshold_mb and lock_limit_mb are 20% and 30% of totalram_pages respectively, so our values are:&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1# grep -H . /proc/fs/lustre/ldlm/{lock_reclaim_threshold_mb,lock_limit_mb}&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;/proc/fs/lustre/ldlm/lock_reclaim_threshold_mb:9638&lt;br/&gt;
/proc/fs/lustre/ldlm/lock_limit_mb:14457&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Using the size reported in /proc/slabinfo for the ldlm_locks cache type, i.e. the data structure &apos;struct ldlm_locks&apos; (512 bytes in our case), the low/high lock counts for us would be (using lustre/ldlm/ldlm_reclaim.c:ldlm_ratio2locknr() to find the counts):&lt;/p&gt;

&lt;p&gt;low: 9638*2^20/(100*512) == ~197386&lt;br/&gt;
high: 14457*2^20/(100*512) == ~296079&lt;/p&gt;
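
&lt;p&gt;(The same arithmetic as a one-liner, using the 512-byte object size from slabinfo and our MB values from above:)&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;awk &apos;BEGIN { objsize = 512;
             printf &quot;low:  ~%d\nhigh: ~%d\n&quot;,
                     9638 * 2^20 / (100 * objsize),
                    14457 * 2^20 / (100 * objsize) }&apos;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;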

&lt;p&gt;If I computed the counts correctly, what are those threshold counts checked against: the number of active ldlm_locks slab objects?&lt;/p&gt;

&lt;p&gt;I&apos;m curious how the following proc values relate:&lt;/p&gt;

&lt;p&gt;In particular, should active_objs for ldlm_locks closely match the lock values Lustre outputs under the /proc/fs/lustre/ldlm paths?&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1# grep -E &apos;(active_objs)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;# name            &amp;lt;active_objs&amp;gt; &amp;lt;num_objs&amp;gt; &amp;lt;objsize&amp;gt; &amp;lt;objperslab&amp;gt; &amp;lt;pagesperslab&amp;gt; : tunables &amp;lt;limit&amp;gt; &amp;lt;batchcount&amp;gt; &amp;lt;sharedfactor&amp;gt; : slabdata &amp;lt;active_slabs&amp;gt; &amp;lt;num_slabs&amp;gt; &amp;lt;sharedavail&amp;gt;&lt;br/&gt;
ldlm_locks          6669   8183    512    7    1 : tunables   54   27    8 : slabdata   1169   1169    255&lt;br/&gt;
ldlm_resources      1820   3580    192   20    1 : tunables  120   60    8 : slabdata    179    179    616&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1# grep -H . /proc/fs/lustre/ldlm/lock_granted_count&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;/proc/fs/lustre/ldlm/lock_granted_count:3549033&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Should ldlm_locks active_objs of 6669 be close to the granted count of 3549033?&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1# grep -H . /proc/fs/lustre/ldlm/namespaces/*/lock_count&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;/proc/fs/lustre/ldlm/namespaces/lstrFS-MDT0000-lwp-MDT0000/lock_count:232&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0000-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0001-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0002-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0003-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0004-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0005-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0006-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0007-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0008-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0009-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000a-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000b-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000c-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000d-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000e-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST000f-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0010-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0011-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0012-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0013-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0014-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0015-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0016-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0017-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0018-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0019-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001a-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001b-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001c-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001d-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001e-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST001f-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0020-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0021-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0022-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/lstrFS-OST0023-osc-MDT0000/lock_count:0&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/mdt-lstrFS-MDT0000_UUID/lock_count:5156&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/MGC10.137.32.37@o2ib/lock_count:6&lt;br/&gt;
/proc/fs/lustre/ldlm/namespaces/MGS/lock_count:600&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Should these values add up (or be close) to the ldlm_locks active_objs count also?&lt;/p&gt;
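
&lt;p&gt;(For reference, the per-namespace counts can be summed with something like the following; for the values above it comes to 5994:)&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;grep -H . /proc/fs/lustre/ldlm/namespaces/*/lock_count |
    awk -F: &apos;{ sum += $NF } END { print sum }&apos;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;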

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;mds1# cat /proc/fs/lustre/ldlm/namespaces/mdt-lstrFS-MDT0000_UUID/pool/state&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;LDLM pool state (ldlm-pool-mdt-lstrFS-MDT0000_UUID-1):&lt;br/&gt;
  SLV: 1&lt;br/&gt;
  CLV: 0&lt;br/&gt;
  LVF: 1&lt;br/&gt;
  GSP: 1%&lt;br/&gt;
  GP:  3011936&lt;br/&gt;
  GR:  33&lt;br/&gt;
  CR:  50&lt;br/&gt;
  GS:  -17&lt;br/&gt;
  G:   3549033&lt;br/&gt;
  L:   2409549&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Above, G (granted) &amp;gt; L (limit)?&lt;/p&gt;

&lt;p&gt;With your hint to look at the LDLM counters, I shared the odd granted-vs-limit lock counts with a coworker, and he came across &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-8246&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/browse/LU-8246&lt;/a&gt;, which I believe matches what we&apos;re seeing.&lt;/p&gt;</comment>
                            <comment id="167418" author="niu" created="Tue, 27 Sep 2016 03:30:25 +0000"  >&lt;p&gt;Right, it&apos;s related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8246&quot; title=&quot;Leaks on ldlm granted locks counter on MDS leading to canceling loop&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8246&quot;&gt;&lt;del&gt;LU-8246&lt;/del&gt;&lt;/a&gt;, that makes &apos;lock_granted_count&apos; an unreasonable high value, so server starts to reclaim locks and reject new locking requests. Another defect being revealed is that quota code doesn&apos;t handle such situation (quota intent lock is rejected with -EINPROGRESS) well,  I&apos;ll try to cook a patch soon.&lt;/p&gt;</comment>
                            <comment id="171503" author="niu" created="Fri, 28 Oct 2016 02:22:33 +0000"  >&lt;p&gt;Patch was uploaded at: &lt;a href=&quot;http://review.whamcloud.com/#/c/22751/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/22751/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="172530" author="gerrit" created="Mon, 7 Nov 2016 15:45:55 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/22751/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/22751/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8634&quot; title=&quot;2.8.0 MDS (layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `quota_body&amp;#39; (3 of 1) in format `LDLM_INTENT_QUOTA&amp;#39;: 0 vs. 112 (server)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8634&quot;&gt;&lt;del&gt;LU-8634&lt;/del&gt;&lt;/a&gt; quota: fix return code of intent quota lock&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 11387730bac0ebb7940657ac6c463f4afd9b0fe8&lt;/p&gt;</comment>
                            <comment id="172617" author="pjones" created="Mon, 7 Nov 2016 20:18:28 +0000"  >&lt;p&gt;Landed for 2.9&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="37423">LU-8246</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzypdj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>