<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:07:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14124] super slow i/o on client maybe related to low grant</title>
                <link>https://jira.whamcloud.com/browse/LU-14124</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;we&apos;re seeing slow i/o (~kB/s) to some OSTs from some clients. this seems to be correlated with low grant on the client.&lt;/p&gt;

&lt;p&gt;here is a reproducer that sets up striped dirs (one per OST) and then does a small dd to each. &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;for f in {0..19}; do mkdir $f ; lfs setstripe -i $f $f; done
for f in {0..19}; do echo $f ; dd if=/tmp/50m of=$f/blah9i &amp;amp; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;where /tmp/50m is an almost incompressible file of size 64149552 bytes.&lt;/p&gt;

&lt;p&gt;a few seconds later, some dd&apos;s have finished quickly, and others will literally take 6 hours&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[me@farnarkle2 test]$ ls -l */blah9i | sort -n -k9
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 0/blah9i
-rw-r--r-- 1 me oz997   423424 Nov  6 18:58 1/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 2/blah9i
-rw-r--r-- 1 me oz997   409600 Nov  6 18:58 3/blah9i
-rw-r--r-- 1 me oz997   409088 Nov  6 18:58 4/blah9i
-rw-r--r-- 1 me oz997   253440 Nov  6 18:58 5/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 6/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 7/blah9i
-rw-r--r-- 1 me oz997   158720 Nov  6 18:58 8/blah9i
-rw-r--r-- 1 me oz997   570368 Nov  6 18:58 9/blah9i
-rw-r--r-- 1 me oz997   574464 Nov  6 18:58 10/blah9i
-rw-r--r-- 1 me oz997   569344 Nov  6 18:58 11/blah9i
-rw-r--r-- 1 me oz997   379904 Nov  6 18:58 12/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 13/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 14/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 15/blah9i
-rw-r--r-- 1 me oz997    84992 Nov  6 18:58 16/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 17/blah9i
-rw-r--r-- 1 me oz997 64149552 Nov  6 18:56 18/blah9i
-rw-r--r-- 1 me oz997   389120 Nov  6 18:58 19/blah9i
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the slow i/o seems to mostly but not completely correlate with low grant on the client&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[me@farnarkle2 test]$ grep -r . /proc/fs/lustre/osc/dagg-OST*/*grant*
/proc/fs/lustre/osc/dagg-OST0000-osc-ffff9f785aa2a800/cur_grant_bytes:3114734
/proc/fs/lustre/osc/dagg-OST0001-osc-ffff9f785aa2a800/cur_grant_bytes:1832833
/proc/fs/lustre/osc/dagg-OST0002-osc-ffff9f785aa2a800/cur_grant_bytes:1881800703
/proc/fs/lustre/osc/dagg-OST0003-osc-ffff9f785aa2a800/cur_grant_bytes:2050095
/proc/fs/lustre/osc/dagg-OST0004-osc-ffff9f785aa2a800/cur_grant_bytes:1902694
/proc/fs/lustre/osc/dagg-OST0005-osc-ffff9f785aa2a800/cur_grant_bytes:1946593
/proc/fs/lustre/osc/dagg-OST0006-osc-ffff9f785aa2a800/cur_grant_bytes:1879834623
/proc/fs/lustre/osc/dagg-OST0007-osc-ffff9f785aa2a800/cur_grant_bytes:1876295679
/proc/fs/lustre/osc/dagg-OST0008-osc-ffff9f785aa2a800/cur_grant_bytes:1867376
/proc/fs/lustre/osc/dagg-OST0009-osc-ffff9f785aa2a800/cur_grant_bytes:2035056
/proc/fs/lustre/osc/dagg-OST000a-osc-ffff9f785aa2a800/cur_grant_bytes:1674002
/proc/fs/lustre/osc/dagg-OST000b-osc-ffff9f785aa2a800/cur_grant_bytes:1790763
/proc/fs/lustre/osc/dagg-OST000c-osc-ffff9f785aa2a800/cur_grant_bytes:1979790
/proc/fs/lustre/osc/dagg-OST000d-osc-ffff9f785aa2a800/cur_grant_bytes:30273
/proc/fs/lustre/osc/dagg-OST000e-osc-ffff9f785aa2a800/cur_grant_bytes:60219801
/proc/fs/lustre/osc/dagg-OST000f-osc-ffff9f785aa2a800/cur_grant_bytes:251557488
/proc/fs/lustre/osc/dagg-OST0010-osc-ffff9f785aa2a800/cur_grant_bytes:1891954
/proc/fs/lustre/osc/dagg-OST0011-osc-ffff9f785aa2a800/cur_grant_bytes:33611568
/proc/fs/lustre/osc/dagg-OST0012-osc-ffff9f785aa2a800/cur_grant_bytes:9298722
/proc/fs/lustre/osc/dagg-OST0013-osc-ffff9f785aa2a800/cur_grant_bytes:1591979
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;this effect is most noticeable with the above default bs=512 byte dd. the slow files are writing at a few kB/s, while the normal OSTs write this small i/o at ballpark 150 MB/s.&lt;br/&gt;
with dd bs=1M it&apos;s the same slow OSTs, but they run at a minimum of ~1 MB/s. normal OSTs write at ballpark 700 MB/s with 1M i/o&apos;s.&lt;br/&gt;
so it feels like ops/second or rpcs_in_flight is being limited rather than absolute MB/s.&lt;/p&gt;
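The per-client knobs speculated about here can be inspected with lctl (a hedged sketch; parameter names assume a 2.12-era client, so verify them with `lctl list_param osc.*` on your build):

```shell
# inspect client-side OSC limits that could explain the behaviour above
lctl get_param osc.*.max_rpcs_in_flight   # per-OST RPC concurrency cap
lctl get_param osc.*.max_dirty_mb         # per-OST dirty cache cap
lctl get_param osc.*.cur_grant_bytes      # current grant, same values as the /proc paths above
```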

&lt;p&gt;from other client nodes with larger grant showing, we don&apos;t have the same issues.&lt;br/&gt;
the above client is a login node, so has a lot of random stuff going on.&lt;/p&gt;

&lt;p&gt;rebooting the client, or rebooting the servers with the OSTs involved seems to clear the problem, but I don&apos;t have any other workarounds. I can&apos;t seem to find any grant tweakables on client or server, or in the manual, but perhaps I&apos;m missing something.&lt;/p&gt;

&lt;p&gt;there are ~250 client nodes. OSTs are very large at ~300T. OSS ram is 192g, but we&apos;ve told zfs ARC to use 75% of it. reducing zfs_arc_max on the fly to try to give more OSS ram to lustre doesn&apos;t seem to immediately help.&lt;/p&gt;

&lt;p&gt;we updated lustre on servers from 2.12.4 + zfs 0.8.3 to 2.12.5 + zfs 0.8.5 this week to see if this issue was resolved in 2.12.5, but the issue still seems to be there. clients have been 2.12.5 for ages.&lt;/p&gt;

&lt;p&gt;I forget whether lustre grant is related to quota. anyway, this is a group-quota&apos;d filesystem and the above oz997 group is nowhere near its limits.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Disk quotas for grp oz997 (gid 10273):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
          /fred  4.859T      0k     10T       -  609477       0 2000000       -
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;any ideas on how to debug this further, or a less crappy workaround than rebooting things?&lt;br/&gt;
thanks.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>centos7, x86_64, OPA, lustre 2.12.5, zfs 0.8.5</environment>
        <key id="61569">LU-14124</key>
            <summary>super slow i/o on client maybe related to low grant</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Fri, 6 Nov 2020 09:04:19 +0000</created>
                <updated>Wed, 9 Mar 2022 17:38:16 +0000</updated>
                            <resolved>Wed, 3 Nov 2021 03:33:47 +0000</resolved>
                                    <version>Lustre 2.12.5</version>
                                    <fixVersion>Lustre 2.12.8</fixVersion>
                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>13</watches>
                                                                            <comments>
                            <comment id="284656" author="scadmin" created="Mon, 9 Nov 2020 06:50:29 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I looked in lustre-discuss and found something similar from Mar, and a follow up from Oct.&lt;/p&gt;

&lt;p&gt;it looks like the current work-around is to set grant_shrink=0 and apply some patches to make this stick.&lt;br/&gt;
is that correct?&lt;/p&gt;

&lt;p&gt;happy to hold off on this if you have better suggestions, or if eg. 2.12.6 is imminent with a real fix.&lt;/p&gt;

&lt;p&gt;thanks.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;
</comment>
                            <comment id="284730" author="pjones" created="Mon, 9 Nov 2020 18:37:26 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="284737" author="adilger" created="Mon, 9 Nov 2020 19:02:07 +0000"  >&lt;p&gt;This sounds very similar to the issues being discussed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14125&quot; title=&quot;client starved for grant but OST has plenty of free space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14125&quot;&gt;LU-14125&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It appears that the problem is caused by the client-side grant amount overflowing, so that the client has a small grant, but the server thinks that it has a large grant, so doesn&apos;t give out any more grant to the client, forcing it to issue synchronous writes to the server.&lt;/p&gt;</comment>
                            <comment id="286366" author="adilger" created="Tue, 1 Dec 2020 15:27:15 +0000"  >&lt;p&gt;Were you able to test the patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14125&quot; title=&quot;client starved for grant but OST has plenty of free space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14125&quot;&gt;LU-14125&lt;/a&gt; to see if that addresses the grant mismatch issue?&lt;/p&gt;</comment>
                            <comment id="286450" author="scadmin" created="Wed, 2 Dec 2020 05:37:29 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;thanks for following up.&lt;/p&gt;

&lt;p&gt;we&apos;ve had 2.12.5 clients patched with &quot;osc: prevent overflow of o_dropped&quot; since approx Nov 15, and since then some of those have hit the slow i/o issue.&lt;br/&gt;
so that patch didn&apos;t fix the problem for us.&lt;/p&gt;

&lt;p&gt;I built the current version of the patch (at the time) which was CWARN instead of CDEBUG. there have been no o_dropped messages in syslog.&lt;br/&gt;
they&apos;d have gone to syslog I presume?&lt;/p&gt;

&lt;p&gt;we shut down all clients and servers for a downtime last week. we&apos;ve been back up for 3 days now, and since then the slow i/o issue hasn&apos;t re-appeared. we normally see it quite quickly - a couple of nodes a day - so that&apos;s a bit odd. perhaps the whole-system shutdown (rather than just our usual rolling reboots of clients or servers) has wiped out some hysteresis?&lt;/p&gt;

&lt;p&gt;during the downtime we were going to add the &quot;obdclass: add grant fields to export procfile&quot; patch to the servers&apos; 2.12.5, but unfortunately we ran out of time.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="292026" author="scadmin" created="Tue, 16 Feb 2021 04:34:21 +0000"  >&lt;p&gt;Hiya,&lt;/p&gt;

&lt;p&gt;an update.&lt;/p&gt;

&lt;p&gt;we turned off grant_shrink on clients 3 weeks ago as per Olaf&apos;s workaround in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14125?focusedCommentId=290273&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-290273&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.whamcloud.com/browse/LU-14125?focusedCommentId=290273&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-290273&lt;/a&gt; and that&apos;s working well so far. I haven&apos;t detected any slow i/o since then.&lt;/p&gt;
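For reference, the workaround amounts to something like the following on each client (a sketch, assuming the osc grant_shrink tunable of 2.12-era clients; confirm the parameter exists on your build before relying on it):

```shell
# disable grant shrinking on a client (the LU-14125 workaround referenced above)
lctl set_param osc.*.grant_shrink=0

# check that it took effect
lctl get_param osc.*.grant_shrink
```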

&lt;p&gt;we have 2.12.6 on servers and clients now, with the o_dropped patch on client side, and grant_show patch on server side.&lt;/p&gt;

&lt;p&gt;the grant numbers on clients in general don&apos;t match those on the servers, but they aren&apos;t wrong by much. when grants are summed across a subset of 121 clients they differ by about 1% - servers report ~4gb more grant than clients, out of a total of 3tb on one occasion and 4tb on another.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="299246" author="gerrit" created="Mon, 19 Apr 2021 21:59:15 +0000"  >&lt;p&gt;Vladimir Saveliev (c17830@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/43375&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43375&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14124&quot; title=&quot;super slow i/o on client maybe related to low grant&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14124&quot;&gt;&lt;del&gt;LU-14124&lt;/del&gt;&lt;/a&gt; target: set OBD_MD_FLGRANT in read&apos;s reply&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 3b5341a2f877d9a75bedb4bb64dff4624c10787c&lt;/p&gt;</comment>
<comment id="299249" author="vsaveliev" created="Mon, 19 Apr 2021 22:19:49 +0000"  >&lt;p&gt;The grant accounting imbalance could be caused by tgt_brw_read(), which does not set the OBD_MD_FLGRANT flag in its reply:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (body-&amp;gt;oa.o_valid &amp;amp; OBD_MD_FLCKSUM) {
..
                repbody-&amp;gt;oa.o_valid = OBD_MD_FLCKSUM | OBD_MD_FLFLAGS;
..
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
                repbody-&amp;gt;oa.o_valid = 0;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then osc_update_grant() does not update client&apos;s grant counter:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void osc_update_grant(struct client_obd *cli, struct ost_body *body)
{
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (body-&amp;gt;oa.o_valid &amp;amp; OBD_MD_FLGRANT) {
                CDEBUG(D_CACHE, &lt;span class=&quot;code-quote&quot;&gt;&quot;got %llu extra grant\n&quot;&lt;/span&gt;, body-&amp;gt;oa.o_grant);
		__osc_update_grant(cli, body-&amp;gt;oa.o_grant);
        }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If, on the server side, tgt_grant_shrink() decides not to shrink grants and wants the client to restore its grant counter&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (left_space &amp;gt;= tgd-&amp;gt;tgd_tot_granted_clients *
                          TGT_GRANT_SHRINK_LIMIT(exp))
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt;;
..
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;we get the case where the server did not decrease the export&apos;s grant counter, but the client did.&lt;/p&gt;
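The resulting drift can be sketched with a toy shell model (not Lustre code; the byte values are illustrative): the client deducts the shrunk grant locally, the server declines to shrink, and because the read reply lacks OBD_MD_FLGRANT the client never restores it:

```shell
#!/bin/sh
# toy model of the grant divergence: every shrink attempt on the read
# path loses `shrink` bytes of client-visible grant, while the server's
# counter for this export never moves
server=1879048191          # server's idea of the client's grant
client=1879048191          # client's idea, initially in sync
shrink=104857600           # 100 MiB handed back per shrink attempt

for attempt in 1 2 3; do
    client=$((client - shrink))   # client gives grant back
    # server declines to shrink: server counter unchanged
    # reply lacks OBD_MD_FLGRANT: client never restores the grant
done

echo "divergence after 3 reads: $((server - client)) bytes"
```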

&lt;p&gt;After several shrinks on read, the client may think it owns only a few grants, while the server assumes the client already received the maximum amount of grants and withholds new ones. That can lead to performance degradation.&lt;/p&gt;</comment>
                            <comment id="300082" author="scadmin" created="Thu, 29 Apr 2021 09:38:24 +0000"  >&lt;p&gt;thanks!&lt;br/&gt;
we&apos;ve applied that patch to a couple of clients and will see how it goes.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="301759" author="scadmin" created="Mon, 17 May 2021 09:41:16 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;we have a few clients with 2.12.6 +&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/#/c/40615/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/40615/&lt;/a&gt;   &quot;osc: prevent overflow of o_dropped&quot;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/#/c/43375/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/43375/&lt;/a&gt;   &quot;target: set OBD_MD_FLGRANT in read&apos;s reply&quot;&lt;/p&gt;

&lt;p&gt;hmm, should we also have this one? looks like I missed it...&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/#/c/42129/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/42129/&lt;/a&gt;  &quot;target: prevent overflowing of tgd-&amp;gt;tgd_tot_granted&quot;&lt;/p&gt;

&lt;p&gt;servers have 2.12.6 +&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/#/c/40563/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/40563/&lt;/a&gt;  &quot;obdclass: add grant fields to export procfile&quot;&lt;/p&gt;

&lt;p&gt;below is the state of a node (with grant_shrink on) that&apos;s had lots of different jobs on it for about 3 days, but is now idle, i.e. no i/o or open files on this filesystem.&lt;/p&gt;

&lt;p&gt;columns are ost, what the server thinks the grant is, what the client thinks the grant is, and server-client. ideally the last column would be zeros.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;john65 192.168.44.165
   ost     svr_grant    client_grant     svr_grant-client_grant
 OST0000   1879048191   1055293440    823754751
 OST0001   1879048191   1879048191            0
 OST0002      2097152      2097152            0
 OST0003   1881800703   1881800703            0
 OST0004   1881800703   1881800703            0
 OST0005   1879048191    591069696   1287978495
 OST0006   1870790655   1870790655            0
 OST0007      2097152      2097152            0
 OST0008   1884553215   1414397952    470155263
 OST0009   1876295679   1876295679            0
 OST000a      2097152      2097152            0
 OST000b   1876295679   1876295679            0
 OST000c   1884553215   1414791168    469762047
 OST000d   1881800703   1411842048    469958655
 OST000e   1881800703   1411350528    470450175
 OST000f      2097152      2097152            0
 OST0010   1880489983   1880489983            0
 OST0011   1881800703   1411645440    470155263
 OST0012   1884553215    795617280   1088935935
 OST0013   1870790655   1870790655            0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the server vs. client grant imbalance tends to grow over time for each OST as the node runs more jobs, but (AFAICT) it gets reset to zero again each time an OST disconnects and reconnects to the client (after what looks to be an hour of no i/o to an OST).&lt;/p&gt;

&lt;p&gt;the OSTs with 0 in the last column above tend to be the ones that have reconnected most recently.&lt;/p&gt;

&lt;p&gt;over May so far, this node has had 82 disconnect/reconnect events across the various OSTs.&lt;br/&gt;
server grant - client grant was 0 before the disconnect in 61 of these cases, so grant accounting was correct.&lt;/p&gt;

&lt;p&gt;in the other 21 OST disconnect cases, an average of 673223605.9 bytes of grant was lost for each one. I guess this means grant accounting wasn&apos;t ok in those cases.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="317282" author="gerrit" created="Wed, 3 Nov 2021 02:37:57 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/43375/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43375/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14124&quot; title=&quot;super slow i/o on client maybe related to low grant&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14124&quot;&gt;&lt;del&gt;LU-14124&lt;/del&gt;&lt;/a&gt; target: set OBD_MD_FLGRANT in read&apos;s reply&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4894683342d77964daeded9fbc608fc46aa479ee&lt;/p&gt;</comment>
                            <comment id="317309" author="pjones" created="Wed, 3 Nov 2021 03:33:47 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                            <comment id="317571" author="gerrit" created="Fri, 5 Nov 2021 20:34:21 +0000"  >&lt;p&gt;&quot;Mike Pershin &amp;lt;mpershin@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45471&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45471&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14124&quot; title=&quot;super slow i/o on client maybe related to low grant&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14124&quot;&gt;&lt;del&gt;LU-14124&lt;/del&gt;&lt;/a&gt; target: set OBD_MD_FLGRANT in read&apos;s reply&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 82f5c0b6581f503b0c41e0e24fa135c9c0ce01c8&lt;/p&gt;</comment>
                            <comment id="318211" author="gerrit" created="Sun, 14 Nov 2021 05:45:36 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/45471/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45471/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14124&quot; title=&quot;super slow i/o on client maybe related to low grant&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14124&quot;&gt;&lt;del&gt;LU-14124&lt;/del&gt;&lt;/a&gt; target: set OBD_MD_FLGRANT in read&apos;s reply&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 557d7d35e30b1f602104026ae0b024cee1abb494&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="61579">LU-14125</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="65465">LU-14901</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="46830">LU-9704</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01ekf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>