<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:27:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16483] Loss of idle ping causes reconnect even if subsequent ping succeeds</title>
                <link>https://jira.whamcloud.com/browse/LU-16483</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;There seems to be a flaw in the idle client ping interval, but maybe I&apos;m just missing something. An idle client sends an OBD_PING to a target every ping_interval seconds (default obd_timeout / 4 = 25 seconds), but there is no consideration of the RPC timeout, so you can end up with multiple pings in flight at the same time if the RPC timeout is &amp;gt; 25 seconds.&lt;br/&gt;
This can lead to odd behavior. For example, if I drop a single OBD ping on the server, then subsequent OBD pings may succeed, but when the dropped ping hits its timeout, it causes a reconnect.&lt;/p&gt;

&lt;p&gt;ping1 -&amp;gt; dropped&lt;br/&gt;
&amp;lt;25 seconds later&amp;gt;&lt;br/&gt;
ping2 -&amp;gt; succeeds&lt;br/&gt;
&amp;lt;some time later&amp;gt;&lt;br/&gt;
ping1 hits timeout, and causes client reconnect&lt;/p&gt;
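
&lt;p&gt;As a rough sketch of the overlap (not Lustre code), the Python below counts how many pings are sent before the first one reaches its deadline, assuming one ping per interval and no replies. The function name and the sample values (a 26.6 second send spacing and a 153 second deadline) are illustrative assumptions.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import math


def pings_before_first_timeout(ping_interval, rpc_timeout):
    """Count pings sent before the ping issued at t=0 hits its deadline.

    Hypothetical helper: assumes one ping every ping_interval seconds,
    starting at t=0, and that no ping receives a reply.
    """
    return math.ceil(rpc_timeout / ping_interval)


# Illustrative values only: 26.6 s between sends, 153 s RPC deadline.
print(pings_before_first_timeout(26.6, 153))  # prints 6
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;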

&lt;p&gt;Example showing 6 pings in flight before the first one hits timeout.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00000040:12.0:1667595564.976159:0:13921:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@000000003cb0eae1 x1748600553999104/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595717 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000040:12.0:1667595591.600170:0:13918:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@00000000cf5effe6 x1748600553999424/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595744 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000040:8.0:1667595618.224202:0:13914:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@000000001e6c674d x1748600553999680/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595771 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000040:12.0:1667595644.848234:0:13918:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@0000000031efedb5 x1748600553999936/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595797 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000040:8.0:1667595671.472178:0:13914:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@00000000fe59e179 x1748600554000192/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595824 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000040:12.0:1667595698.096210:0:13918:0:(niobuf.c:939:ptl_send_rpc()) @@@ send flags=0  req@0000000022f3b477 x1748600554000448/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 0 dl 1667595851 ref 2 fl Rpc:Nr/0/ffffffff rc 0/-1 job:&apos;&apos;
00000100:00000400:12.0:1667595717.552079:0:13921:0:(client.c:2308:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1667595564/real 1667595564]  req@000000003cb0eae1 x1748600553999104/t0(0) o400-&amp;gt;lustre-OST0000-osc-ffff9e68576b3800@16@kfi:28/4 lens 224/224 e 0 to 1 dl 1667595717 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:&apos;&apos;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="74044">LU-16483</key>
            <summary>Loss of idle ping causes reconnect even if subsequent ping succeeds</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="anikitenko">Alena Nikitenko</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                    </labels>
                <created>Tue, 17 Jan 2023 19:27:21 +0000</created>
                <updated>Fri, 26 Jan 2024 21:19:31 +0000</updated>
                            <resolved>Thu, 27 Jul 2023 12:38:06 +0000</resolved>
                                    <version>Lustre 2.16.0</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="359887" author="adilger" created="Fri, 20 Jan 2023 17:07:09 +0000"  >&lt;p&gt;Seems like a real bug. The whole point of sending pings every obd_timeout/4 is to ensure at least one makes it through before the timeout, and to catch server failure early enough to reconnect during the recovery window. &lt;/p&gt;

&lt;p&gt;Seems like the later ping should save the successful ping XID (or any successful RPC XID) in the import, and the later timeout of the earlier ping (with lower XID) should be ignored. &lt;/p&gt;</comment>
                            <comment id="360753" author="gerrit" created="Fri, 27 Jan 2023 22:20:51 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49807&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49807&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; ptlrpc: Track highest reply xid&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 68aa10cc110c7a66d9d2f7561d30208abfb8e5f4&lt;/p&gt;</comment>
                            <comment id="362305" author="gerrit" created="Thu, 9 Feb 2023 16:50:27 +0000"  >&lt;p&gt;&lt;del&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49958&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49958&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: Test patch&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: master&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: afe29157cb586727ab57c9493fd1f1e2345a897f&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="362912" author="gerrit" created="Wed, 15 Feb 2023 17:09:25 +0000"  >&lt;p&gt;&lt;del&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50011&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50011&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: Test patch&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: master&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: 22f83229cf078550f13141dbdafcca75a4e6580b&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="366970" author="gerrit" created="Wed, 22 Mar 2023 23:37:13 +0000"  >&lt;p&gt;&lt;del&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50384&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50384&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: Test patch&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: master&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: a58c08fe7e7aa99989173d8753695d225a545493&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="369710" author="gerrit" created="Tue, 18 Apr 2023 03:22:25 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49807/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49807/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; ptlrpc: Track highest reply XID&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: eb1f4a5222039be9f728839ec8f9cde904a1273f&lt;/p&gt;</comment>
                            <comment id="369750" author="pjones" created="Tue, 18 Apr 2023 12:12:38 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="370641" author="adilger" created="Wed, 26 Apr 2023 03:00:28 +0000"  >&lt;p&gt;Hi Chris,&lt;br/&gt;
it looks like the newly added replay-single test_200 has been failing intermittently in testing since it landed on master on 2023-04-18. Could you please take a look:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/search?horizon=2332800&amp;amp;status%5B%5D=FAIL&amp;amp;test_set_script_id=f6a12204-32c3-11e0-a61c-52540025f9ae&amp;amp;sub_test_script_id=fea9d884-926c-4e41-b86d-9679d197c5f8&amp;amp;source=sub_tests#redirect&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/search?horizon=2332800&amp;amp;status%5B%5D=FAIL&amp;amp;test_set_script_id=f6a12204-32c3-11e0-a61c-52540025f9ae&amp;amp;sub_test_script_id=fea9d884-926c-4e41-b86d-9679d197c5f8&amp;amp;source=sub_tests#redirect&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="371413" author="hornc" created="Fri, 5 May 2023 17:06:45 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=adilger&quot; class=&quot;user-hover&quot; rel=&quot;adilger&quot;&gt;adilger&lt;/a&gt; Sure, I&apos;ll take a look.&lt;/p&gt;</comment>
                            <comment id="371414" author="gerrit" created="Fri, 5 May 2023 17:08:01 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50869&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50869&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: Test patch&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: caecbcb8daa3c34811cdfabaf5fddcc2f002df8c&lt;/p&gt;</comment>
                            <comment id="371466" author="hornc" created="Sat, 6 May 2023 20:09:08 +0000"  >&lt;p&gt;I think the issue is that the test assumes idle_timeout is set to a non-zero value for all the targets.&lt;/p&gt;</comment>
                            <comment id="371560" author="gerrit" created="Mon, 8 May 2023 18:21:16 +0000"  >&lt;p&gt;&quot;Chris Horn &amp;lt;chris.horn@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50891&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50891&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: replay-single test_200 fixes&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8002abc25fd6754446790835aa56b8fd0972fde0&lt;/p&gt;</comment>
                            <comment id="376861" author="adilger" created="Thu, 29 Jun 2023 04:47:02 +0000"  >&lt;p&gt;Alena, would you be able to rebase Chris&apos; patch so that this issue can be fixed?&lt;/p&gt;</comment>
                            <comment id="376970" author="hornc" created="Thu, 29 Jun 2023 18:59:14 +0000"  >&lt;p&gt;Sorry for letting this languish, but I have cycles today to pick it up.&lt;/p&gt;</comment>
                            <comment id="380303" author="gerrit" created="Thu, 27 Jul 2023 07:19:55 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50891/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50891/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16483&quot; title=&quot;Loss of idle ping causes reconnect even if subsequent ping succeeds&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16483&quot;&gt;&lt;del&gt;LU-16483&lt;/del&gt;&lt;/a&gt; tests: replay-single test_200 fixes&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: fdfdf5c05cf64294068a5cbfe818b64bd9e577f9&lt;/p&gt;</comment>
                            <comment id="380353" author="pjones" created="Thu, 27 Jul 2023 12:38:06 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="75654">LU-16754</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03a93:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>