<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:04:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13763] ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.</title>
                <link>https://jira.whamcloud.com/browse/LU-13763</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Console log messages following this pattern, repeatedly, for several days:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 67801:0:(import.c:361:ptlrpc_invalidate_import()) lsrza-OST0000_UUID: rc = -110 waiting for callback (1 != 0)
LustreError: 67801:0:(import.c:387:ptlrpc_invalidate_import()) @@@ still on sending list&#160; req@ffff8c3eb65f7500 x1669124751850560/t0(0) o4-&amp;gt;lsrza-OST0000-osc-ffff8c44c608a000@172.21.3.5@o2ib700:6/4 lens 488/448 e 2 to 0 dl 1592847228 ref 1 fl Interpret:E/0/ffffffff rc -5/-1
LustreError: 67801:0:(import.c:401:ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the number in the parentheses is 0.  This refers to imp-&amp;gt;imp_unregistering, an atomic variable that appears intended to track the number of RPC buffers we are waiting for the underlying network to unregister, so that we know no data will be lost.  But there is still one RPC on the sending list, so why is imp_unregistering 0?&lt;/p&gt;</description>
                <environment>TOSS 3.6-3 / RH78&lt;br/&gt;
in-kernel OFED&lt;br/&gt;
3.10.0-1127.8.2.1chaos.ch6.x86_64&lt;br/&gt;
lustre-2.12.4_6.chaos-1.ch6.x86_64&lt;br/&gt;
</environment>
        <key id="59887">LU-13763</key>
            <summary>ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 8 Jul 2020 19:27:24 +0000</created>
                <updated>Thu, 26 Nov 2020 18:58:25 +0000</updated>
                            <resolved>Sat, 12 Sep 2020 15:56:40 +0000</resolved>
                                    <version>Lustre 2.12.4</version>
                                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.6</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="274783" author="ofaaland" created="Wed, 8 Jul 2020 19:28:36 +0000"  >&lt;p&gt;Similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13303&quot; title=&quot;(import.c:361:ptlrpc_invalidate_import()) nbp1-OST0016_UUID: rc = -110 waiting for callback (1 != 0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13303&quot;&gt;&lt;del&gt;LU-13303&lt;/del&gt;&lt;/a&gt; but in this case imp_unregistering is 0.&lt;/p&gt;</comment>
                            <comment id="274820" author="ofaaland" created="Thu, 9 Jul 2020 01:28:01 +0000"  >&lt;p&gt;For my tracking, my internal issue is TOSS4833&lt;/p&gt;</comment>
                            <comment id="274885" author="pjones" created="Thu, 9 Jul 2020 15:04:01 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="275313" author="ofaaland" created="Tue, 14 Jul 2020 04:57:38 +0000"  >&lt;p&gt;The node where I saw this is still in this state.&lt;/p&gt;

&lt;p&gt;While looking into &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13766&quot; title=&quot;tgt_grant_check() lsrza-OST000a: cli dfdf1aff-07d9-53b3-5632-c18a78027eb2 claims 1703936 GRANT, real grant 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13766&quot;&gt;&lt;del&gt;LU-13766&lt;/del&gt;&lt;/a&gt;, I found that the node with this symptom also has a cur_grant_bytes that is weirdly large; I wonder if it underflowed.  I don&apos;t know if this is related or not, but it seems an unlikely coincidence, so here it is.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@rzgenie28:~]# lctl get_param -n osc.*OST0000*.cur_grant_bytes
18446744073707847680
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
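&lt;p&gt;A quick arithmetic check is consistent with the underflow theory. This is a minimal sketch in Python that assumes the counter is an unsigned 64-bit value (unsigned long on x86_64); the names in it are illustrative only. Reinterpreted as a signed 64-bit number, the value reported above is exactly -1703936, the same magnitude as the grant figure in the LU-13766 title. That issue names a different OST, so the match is suggestive rather than conclusive.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
# Assumption: the counter is an unsigned 64-bit integer (unsigned long on x86_64).
U64_MODULUS = 2 ** 64

# Value printed by "lctl get_param -n osc.*OST0000*.cur_grant_bytes" above.
reported = 18446744073707847680


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    half = U64_MODULUS // 2
    return (value + half) % U64_MODULUS - half


shortfall = -as_signed64(reported)
print(shortfall)  # 1703936

# Subtracting that many bytes from a zero counter wraps to the reported value.
print((0 - shortfall) % U64_MODULUS == reported)  # True
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;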

</comment>
                            <comment id="275314" author="ofaaland" created="Tue, 14 Jul 2020 05:09:01 +0000"  >&lt;p&gt;I&apos;ve attached console and debug logs.&lt;/p&gt;</comment>
                            <comment id="275376" author="tappro" created="Tue, 14 Jul 2020 17:27:22 +0000"  >&lt;p&gt;Thanks, Olaf. I am checking logs right now.&lt;/p&gt;</comment>
                            <comment id="275382" author="ofaaland" created="Tue, 14 Jul 2020 18:43:29 +0000"  >&lt;p&gt;In osc_init_grant(), cl_avail_grant will underflow if &lt;br/&gt;
cl_reserved_grant + (cl_dirty_grant OR cl_dirty_pages&amp;lt;&amp;lt;PAGE_SHIFT) &amp;gt; ocd_grant&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        cli-&amp;gt;cl_avail_grant = ocd-&amp;gt;ocd_grant;
        if (cli-&amp;gt;cl_import-&amp;gt;imp_state != LUSTRE_IMP_EVICTED) {
                cli-&amp;gt;cl_avail_grant -= cli-&amp;gt;cl_reserved_grant;
                if (OCD_HAS_FLAG(ocd, GRANT_PARAM))
                        cli-&amp;gt;cl_avail_grant -= cli-&amp;gt;cl_dirty_grant;
                else
                        cli-&amp;gt;cl_avail_grant -=
                                        cli-&amp;gt;cl_dirty_pages &amp;lt;&amp;lt; PAGE_SHIFT;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
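&lt;p&gt;The wraparound can be reproduced with a small model. This is a sketch in Python, not the Lustre code and not any landed patch: it emulates unsigned 64-bit arithmetic with a modulus (an assumption about the counter width), it assumes 4 KiB pages, and the function names consumed_grant, init_grant_unchecked and init_grant_clamped are hypothetical. The clamped variant shows one possible guard, which floors the result at zero.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
U64_MODULUS = 2 ** 64   # assumption: the grant counter is 64 bits wide
PAGE_SIZE = 4096        # assumption: 4 KiB pages, i.e. PAGE_SHIFT of 12


def consumed_grant(reserved, dirty_grant, dirty_pages, has_grant_param):
    """Grant the client has already used, following the two branches quoted above."""
    if has_grant_param:
        return reserved + dirty_grant
    # Multiplying by PAGE_SIZE stands in for the shift by PAGE_SHIFT.
    return reserved + dirty_pages * PAGE_SIZE


def init_grant_unchecked(ocd_grant, consumed, evicted=False):
    """Subtract as the quoted snippet does, wrapping like an unsigned counter."""
    if evicted:
        return ocd_grant
    return (ocd_grant - consumed) % U64_MODULUS


def init_grant_clamped(ocd_grant, consumed, evicted=False):
    """Hypothetical guard: never let the available grant drop below zero."""
    if evicted:
        return ocd_grant
    return max(0, ocd_grant - consumed)


# Illustrative numbers only: the server grants nothing while the client
# already accounts for 1703936 bytes of dirty grant.
used = consumed_grant(reserved=0, dirty_grant=1703936, dirty_pages=0,
                      has_grant_param=True)
print(init_grant_unchecked(0, used))  # 18446744073707847680
print(init_grant_clamped(0, used))    # 0
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;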

&lt;p&gt;I don&apos;t know if there&apos;s something to prevent that underflow elsewhere.&lt;/p&gt;</comment>
                            <comment id="275433" author="gerrit" created="Wed, 15 Jul 2020 05:48:56 +0000"  >&lt;p&gt;Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39380&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39380&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13763&quot; title=&quot;ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13763&quot;&gt;&lt;del&gt;LU-13763&lt;/del&gt;&lt;/a&gt; osc: don&apos;t allow negative grants&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d707c6ee3a926b06ebb0c2648cb5f1e5a1aaf2d7&lt;/p&gt;</comment>
                            <comment id="275434" author="tappro" created="Wed, 15 Jul 2020 05:55:08 +0000"  >&lt;p&gt;Olaf, I&apos;ve made a patch to prevent that underflow. It is worth doing in any case because &lt;tt&gt;ocd_grant&lt;/tt&gt; is received from the server, so it shouldn&apos;t be trusted blindly to always be greater than the locally consumed grants.&lt;/p&gt;

&lt;p&gt;E.g. in conjunction with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12687&quot; title=&quot;Fast ENOSPC on direct I/O&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12687&quot;&gt;&lt;del&gt;LU-12687&lt;/del&gt;&lt;/a&gt;  that looks like a real case.&lt;/p&gt;</comment>
                            <comment id="275501" author="ofaaland" created="Wed, 15 Jul 2020 18:12:09 +0000"  >&lt;p&gt;Mikhail,&lt;/p&gt;

&lt;p&gt;I have the node in this state drained.  Should I keep it that way in case you want me to gather information from it, or should I go ahead and crash it and put it back into service?  I won&apos;t be able to send you the crash dump, but I could extract information from the dump for you, potentially (although my crash skills are not great).&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="275800" author="ofaaland" created="Mon, 20 Jul 2020 18:56:55 +0000"  >&lt;p&gt;Hi Mikhail,&lt;br/&gt;
Can I bounce that node?&lt;br/&gt;
Do you have any thoughts on whether the stuck import was really related to the grant underflow?&lt;br/&gt;
Do you have any idea why imp_unregistering was 0 when there was still one RPC on the sending list?&lt;br/&gt;
Thanks!&lt;/p&gt;</comment>
                            <comment id="275823" author="tappro" created="Mon, 20 Jul 2020 22:36:29 +0000"  >&lt;p&gt;Olaf, I have no good idea about what to get from that node right now, so you can bounce it. I don&apos;t see how the grant underflow could be caused by this particular situation: for that, the server would have to consider the client to have fewer grants than it really already has. I&apos;d say such a situation could occur due to the DIO grants problem and then cause the grant underflow later. Maybe other scenarios exist. As for imp_unregistering being 0, I think it is OK if the request is on the sending list: it is still waiting for a reply, so it is not unregistered yet and the import counter is still 0.&lt;/p&gt;</comment>
                            <comment id="278092" author="ofaaland" created="Wed, 26 Aug 2020 00:34:13 +0000"  >&lt;p&gt;I rebased the patch as it was too old to be retested.&lt;/p&gt;</comment>
                            <comment id="278474" author="ofaaland" created="Tue, 1 Sep 2020 06:43:07 +0000"  >&lt;p&gt;The patch passes testing now.  I posted a comment listing all the failures.  It never failed a test twice, every test that failed passed when that test group was re-tested, and the failures didn&apos;t look to me like grant accounting problems.&lt;/p&gt;</comment>
                            <comment id="278848" author="gerrit" created="Fri, 4 Sep 2020 07:15:49 +0000"  >&lt;p&gt;Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/39827&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39827&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13763&quot; title=&quot;ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13763&quot;&gt;&lt;del&gt;LU-13763&lt;/del&gt;&lt;/a&gt; osc: don&apos;t allow negative grants&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: bd9497c1690b9c4d2358a287277d1d67864d6735&lt;/p&gt;</comment>
                            <comment id="279445" author="gerrit" created="Sat, 12 Sep 2020 15:46:40 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39827/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39827/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13763&quot; title=&quot;ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13763&quot;&gt;&lt;del&gt;LU-13763&lt;/del&gt;&lt;/a&gt; osc: don&apos;t allow negative grants&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e05ccafd6ee214895d01efbb13a3757e3625a859&lt;/p&gt;</comment>
                            <comment id="279454" author="pjones" created="Sat, 12 Sep 2020 15:56:40 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                            <comment id="279587" author="gerrit" created="Tue, 15 Sep 2020 05:09:35 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/39380/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39380/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13763&quot; title=&quot;ptlrpc_invalidate_import()) lsrza-OST0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13763&quot;&gt;&lt;del&gt;LU-13763&lt;/del&gt;&lt;/a&gt; osc: don&apos;t allow negative grants&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f96aa90548f062e95d2ef4c9ea978ba0e08aae19&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="35401" name="console.rzgenie28" size="1390341" author="ofaaland" created="Tue, 14 Jul 2020 05:08:32 +0000"/>
                            <attachment id="35403" name="console.rzgenie28-20200619.gz" size="240401" author="ofaaland" created="Tue, 14 Jul 2020 05:08:30 +0000"/>
                            <attachment id="35402" name="console.rzgenie28-20200705.gz" size="123403" author="ofaaland" created="Tue, 14 Jul 2020 05:08:30 +0000"/>
                            <attachment id="35400" name="dk.rzgenie28.1594702860" size="13320656" author="ofaaland" created="Tue, 14 Jul 2020 05:08:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i014on:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>