<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:21:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1949] SWL - mds wedges &apos;still busy with 1 RPC&apos; </title>
                <link>https://jira.whamcloud.com/browse/LU-1949</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Running SWL, the MDS gradually becomes wedged: clients get -EBUSY and the MDS never clears the stuck RPC. Rebooted the MDS to recover. &lt;br/&gt;
Typical client log:&lt;/p&gt;

&lt;p&gt;Sep 15 15:23:34 hyperion770 kernel: Lustre: 8865:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1347747046/real 1347747046&amp;#93;&lt;/span&gt;  req@ffff880176204800 x1413191623533468/t0(0) o101-&amp;gt;lustre-MDT0000-mdc-ffff880339d11800@192.168.127.6@o2ib1:12/10 lens 592/1136 e 3 to 1 dl 1347747806 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1&lt;br/&gt;
Sep 15 15:23:34 hyperion770 kernel: Lustre: lustre-MDT0000-mdc-ffff880339d11800: Connection to lustre-MDT0000 (at 192.168.127.6@o2ib1) was lost; in progress operations using this service will wait for recovery to complete&lt;br/&gt;
Sep 15 15:23:42 hyperion770 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.127.6@o2ib1. The mds_connect operation failed with -16&lt;br/&gt;
Sep 15 15:31:12 hyperion770 kernel: LustreError: Skipped 5 previous similar messages&lt;br/&gt;
[... duplicate messages omitted ...]&lt;br/&gt;
Sep 15 15:35:47 hyperion770 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.127.6@o2ib1. The mds_connect operation failed with -16&lt;br/&gt;
Sep 15 15:35:48 hyperion770 kernel: LustreError: Skipped 10 previous similar messages&lt;br/&gt;
Sep 15 15:44:15 hyperion770 kernel: Lustre: 3342:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1347749019/real 1347749043&amp;#93;&lt;/span&gt;  req@ffff88014f8e2000 x1413191623538306/t0(0) o38-&amp;gt;lustre-MDT0000-mdc-ffff880339d11800@192.168.127.6@o2ib1:12/10 lens 400/544 e 0 to 1 dl 1347749069 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1&lt;br/&gt;
-------&lt;br/&gt;
MDS:&lt;/p&gt;

&lt;p&gt;Sep 15 15:21:09 hyperion-rst6 kernel: req@ffff880254359050 x1413191623533468/t0(0) o101-&amp;gt;d17e0f27-22a5-38fb-14c0-313655de63cd@192.168.117.51@o2ib1:0/0 lens 592/1152 e 3 to 0 dl 1347747674 ref 2 fl Interpret:/0/0 rc 0/0&lt;br/&gt;
Sep 15 15:21:09 hyperion-rst6 kernel: Lustre: 6988:0:(service.c:1260:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message&lt;br/&gt;
Sep 15 15:21:21 hyperion-rst6 kernel: Lustre: 6935:0:(service.c:1260:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-23), not sending early reply&lt;br/&gt;
Sep 15 15:21:21 hyperion-rst6 kernel: req@ffff88016d1c6050 x1413191619837173/t0(0) o101-&amp;gt;821224b2-a0a9-0330-9862-198101015e30@192.168.116.111@o2ib1:0/0 lens 592/1152 e 3 to 0 dl 1347747685 ref 2 fl Interpret:/0/0 rc 0/0&lt;br/&gt;
Sep 15 15:21:50 hyperion-rst6 kernel: Lustre: 7165:0:(service.c:1260:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-23), not sending early reply&lt;br/&gt;
Sep 15 15:21:50 hyperion-rst6 kernel: req@ffff880268a60050 x1413191612196075/t0(0) o35-&amp;gt;3c2d3508-87e2-4c0e-a013-11a2bf99635a@192.168.116.97@o2ib1:0/0 lens 392/4104 e 3 to 0 dl 1347747715 ref 2 fl Interpret:/0/0 rc 0/0&lt;br/&gt;
Sep 15 15:22:38 hyperion-rst6 kernel: Lustre: 7036:0:(service.c:1260:ptlrpc_at_send_early_reply()) @@@ Couldn&apos;t add any time (5/-23), not sending early reply&lt;br/&gt;
Sep 15 15:22:38 hyperion-rst6 kernel: req@ffff8801152fe450 x1413191612772893/t0(0) o101-&amp;gt;e8d4ba14-4c25-89b1-b0eb-af89de332d73@192.168.116.83@o2ib1:0/0 lens 592/1152 e 3 to 0 dl 1347747763 ref 2 fl Interpret:/0/0 rc 0/0&lt;br/&gt;
Sep 15 15:22:38 hyperion-rst6 kernel: Lustre: 7036:0:(service.c:1260:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages&lt;br/&gt;
Sep 15 15:23:05 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client 5dfb5850-e495-0463-45aa-51d17e91b47a (at 192.168.116.85@o2ib1) reconnecting&lt;br/&gt;
Sep 15 15:23:05 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client 5dfb5850-e495-0463-45aa-51d17e91b47a (at 192.168.116.85@o2ib1) refused reconnection, still busy with 1 active RPCs&lt;br/&gt;
Sep 15 15:23:08 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client a688d9d6-ffb7-c06d-f4d4-541f43b6f1c5 (at 192.168.116.84@o2ib1) reconnecting&lt;br/&gt;
Sep 15 15:23:08 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client a688d9d6-ffb7-c06d-f4d4-541f43b6f1c5 (at 192.168.116.84@o2ib1) refused reconnection, still busy with 2 active RPCs&lt;br/&gt;
Sep 15 15:23:21 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client 21e2654d-06e7-726f-ced3-f0c96e204093 (at 192.168.116.133@o2ib1) reconnecting&lt;br/&gt;
Sep 15 15:23:21 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client 21e2654d-06e7-726f-ced3-f0c96e204093 (at 192.168.116.133@o2ib1) refused reconnection, still busy with 1 active RPCs&lt;/p&gt;

&lt;p&gt;-----------&lt;br/&gt;
Took a log on the MDS with debug=-1 while this was happening; attached.&lt;/p&gt;</description>
                <environment>SWL Hyperion LLNL</environment>
        <key id="15992">LU-1949</key>
            <summary>SWL - mds wedges &apos;still busy with 1 RPC&apos; </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                    </labels>
                <created>Sun, 16 Sep 2012 11:58:50 +0000</created>
                <updated>Thu, 2 Jul 2015 17:24:40 +0000</updated>
                            <resolved>Thu, 2 Jul 2015 17:24:40 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="44963" author="pjones" created="Sun, 16 Sep 2012 13:19:06 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="45007" author="bobijam" created="Mon, 17 Sep 2012 06:24:35 +0000"  >&lt;p&gt;I think the MDS log starts a little too late for this scenario; I cannot tell why the MDS was stuck on the RPC. Can you grab MDS logs while it is handling the request that is about to time out? In this case, capture what the MDS did with the client request &quot;req@ffff880176204800 x1413191623533468/t0(0) o101-&amp;gt;lustre-MDT0000-mdc-ffff880339d11800@192.168.127.6@o2ib1&quot;&lt;/p&gt;</comment>
                            <comment id="45343" author="pjones" created="Fri, 21 Sep 2012 09:47:57 +0000"  >&lt;p&gt;Dropping priority as unable to reproduce&lt;/p&gt;</comment>
                            <comment id="45347" author="yong.fan" created="Fri, 21 Sep 2012 11:21:49 +0000"  >&lt;p&gt;There are some unfinished RPCs on the export which prevented the client from reconnecting, but I cannot find the related RPC processing in the lustre-debug log. A &quot;ps&quot; log showing what the RPCs were, or stack traces showing what the RPC service threads were doing, would be much more helpful.&lt;/p&gt;

&lt;p&gt;Anyway, it does not seem to be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1976&quot; title=&quot;SWL - mds hard crash &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1976&quot;&gt;&lt;del&gt;LU-1976&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="45776" author="cliffw" created="Sat, 29 Sep 2012 21:41:19 +0000"  >&lt;p&gt;vmcore is at ~cliffw/lu1948/erofs on brent.&lt;/p&gt;</comment>
                            <comment id="120177" author="adilger" created="Thu, 2 Jul 2015 17:24:40 +0000"  >&lt;p&gt;Closing old bug. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="15952">LU-1934</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11863" name="rst6.busy.dk.gz" size="3679480" author="cliffw" created="Sun, 16 Sep 2012 11:58:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw0jr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10169</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>