<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:20:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1890] Test failure on test suite recovery-small, subtest test_101</title>
                <link>https://jira.whamcloud.com/browse/LU-1890</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for yujian &amp;lt;yujian@whamcloud.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/ae7cd73a-f760-11e1-8b95-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/ae7cd73a-f760-11e1-8b95-52540035b04c&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/12&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/12&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sub-test test_101 failed with the following error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;test failed to respond and timed out&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Info required for matching: recovery-small 101&lt;/p&gt;

&lt;p&gt;MGS/MDS Nodes: fat-intel-3vm3(10.10.4.88), fat-intel-3vm7(10.10.4.92)&lt;br/&gt;
OSS Nodes:     fat-intel-3vm4(10.10.4.89), fat-intel-3vm8(10.10.4.93)&lt;br/&gt;
Client Nodes:  fat-intel-3vm1(10.10.4.86), fat-intel-3vm5(10.10.4.90), fat-intel-3vm6(10.10.4.91)&lt;/p&gt;

&lt;p&gt;Syslog on OSS fat-intel-3vm8(10.10.4.93):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  4 18:14:34 fat-intel-3vm8 rshd[4135]: root@fat-intel-3vm1.lab.whamcloud.com as root: cmd=&apos;(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE=&quot;/usr/lib64/lustre&quot; sh -c &quot;e2label /dev/lvm-OSS/P7 2&amp;gt;/dev/null&quot;);echo XXRETCODE:$?&apos;
Sep  4 18:14:35 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: Will be in recovery for at least 1:00, or until 4 clients reconnect
Sep  4 18:14:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: Recovery over after 0:04, of 4 clients 4 recovered and 0 were evicted.
Sep  4 18:14:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0006: received MDS connection from 10.10.4.92@tcp
Sep  4 18:18:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807862/real 1346807862]  req@ffff880037705800 x1412229540872552/t0(0) o250-&amp;gt;MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346807887 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:18:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
Sep  4 18:27:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346808402/real 1346808402]  req@ffff88007c723400 x1412229540872570/t0(0) o250-&amp;gt;MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346808427 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:27:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
Sep  4 18:37:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346809032/real 1346809032]  req@ffff88007c723c00 x1412229540872591/t0(0) o250-&amp;gt;MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346809057 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:37:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
Sep  4 18:48:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346809662/real 1346809662]  req@ffff8800709db800 x1412229540872612/t0(0) o250-&amp;gt;MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346809687 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:48:07 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
Sep  4 18:58:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346810292/real 1346810292]  req@ffff880037705c00 x1412229540872633/t0(0) o250-&amp;gt;MGC10.10.4.88@tcp@10.10.4.88@tcp:26/25 lens 400/544 e 0 to 1 dl 1346810317 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:58:37 fat-intel-3vm8 kernel: Lustre: 2075:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
Sep  4 19:04:34 fat-intel-3vm8 kernel: Lustre: lustre-OST0004: haven&apos;t heard from client 85be9246-99f9-3b19-5f81-ad80152fd420 (at 10.10.4.91@tcp) in 47 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88007254fc00, cur 1346810674 expire 1346810644 last 1346810627
Sep  4 19:04:34 fat-intel-3vm8 kernel: Lustre: lustre-OST0004: haven&apos;t heard from client f75916a5-eaec-360e-dafc-5ebb1448c443 (at 10.10.4.90@tcp) in 34 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88007598b000, cur 1346810674 expire 1346810644 last 1346810640
Sep  4 19:04:39 fat-intel-3vm8 kernel: Lustre: lustre-OST0000: haven&apos;t heard from client 85be9246-99f9-3b19-5f81-ad80152fd420 (at 10.10.4.91@tcp) in 52 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8800732a6000, cur 1346810679 expire 1346810649 last 1346810627
Sep  4 19:04:39 fat-intel-3vm8 kernel: Lustre: Skipped 10 previous similar messages
Sep  4 19:04:40 fat-intel-3vm8 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4298439482, last ping 4298444482, now 4298449482
Sep  4 19:04:40 fat-intel-3vm8 kernel: connection1:0: detected conn error (1011)
Sep  4 19:04:41 fat-intel-3vm8 iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Sep  4 19:05:38 fat-intel-3vm8 kernel: MTRR variable ranges enabled:
Sep  4 19:05:38 fat-intel-3vm8 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on client fat-intel-3vm6(10.10.4.91):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  4 18:03:35 fat-intel-3vm6 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
Sep  4 18:04:07 fat-intel-3vm6 kernel: Lustre: lustre-OST0001-osc-ffff8800749ac400: Connection to lustre-OST0001 (at 10.10.4.89@tcp) was lost; in progress operations using this service will wait for recovery to complete
Sep  4 18:04:07 fat-intel-3vm6 kernel: Lustre: Skipped 3 previous similar messages
Sep  4 18:05:07 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:06:17 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:06:17 fat-intel-3vm6 kernel: LustreError: Skipped 28 previous similar messages
Sep  4 18:06:24 fat-intel-3vm6 kernel: Lustre: lustre-OST0000-osc-ffff8800749ac400: Connection restored to lustre-OST0000 (at 10.10.4.93@tcp)
Sep  4 18:06:24 fat-intel-3vm6 kernel: Lustre: Skipped 6 previous similar messages
Sep  4 18:07:47 fat-intel-3vm6 kernel: Lustre: 2131:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: [sent 1346807267/real 1346807267]  req@ffff880074801800 x1412225718819426/t0(0) o8-&amp;gt;lustre-OST0002-osc-ffff8800749ac400@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807293 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:07:47 fat-intel-3vm6 kernel: Lustre: 2131:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 159 previous similar messages
Sep  4 18:08:27 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -11
Sep  4 18:08:27 fat-intel-3vm6 kernel: LustreError: Skipped 49 previous similar messages
Sep  4 18:13:02 fat-intel-3vm6 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -11
Sep  4 18:13:02 fat-intel-3vm6 kernel: LustreError: Skipped 51 previous similar messages
Sep  4 19:04:28 fat-intel-3vm6 kernel: MTRR variable ranges enabled:
Sep  4 19:04:28 fat-intel-3vm6 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on client fat-intel-3vm5(10.10.4.90):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  4 18:03:35 fat-intel-3vm5 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
Sep  4 18:04:10 fat-intel-3vm5 kernel: Lustre: lustre-OST0001-osc-ffff880037556c00: Connection to lustre-OST0001 (at 10.10.4.89@tcp) was lost; in progress operations using this service will wait for recovery to complete
Sep  4 18:04:10 fat-intel-3vm5 kernel: Lustre: Skipped 3 previous similar messages
Sep  4 18:05:20 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:06:24 fat-intel-3vm5 kernel: Lustre: lustre-OST0000-osc-ffff880037556c00: Connection restored to lustre-OST0000 (at 10.10.4.93@tcp)
Sep  4 18:06:24 fat-intel-3vm5 kernel: Lustre: Skipped 6 previous similar messages
Sep  4 18:07:30 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:07:30 fat-intel-3vm5 kernel: LustreError: Skipped 37 previous similar messages
Sep  4 18:10:06 fat-intel-3vm5 kernel: Lustre: 2138:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807380/real 1346807380]  req@ffff8800794cd000 x1412225718819590/t0(0) o8-&amp;gt;lustre-OST0003-osc-ffff880037556c00@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807406 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:10:06 fat-intel-3vm5 kernel: Lustre: 2138:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 173 previous similar messages
Sep  4 18:12:20 fat-intel-3vm5 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:12:20 fat-intel-3vm5 kernel: LustreError: Skipped 70 previous similar messages
Sep  4 19:04:42 fat-intel-3vm5 kernel: MTRR variable ranges enabled:
Sep  4 19:04:42 fat-intel-3vm5 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on client fat-intel-3vm1(10.10.4.86):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  4 18:03:35 fat-intel-3vm1 kernel: Lustre: DEBUG MARKER: == recovery-small test 101: IR: Make sure IR works w/o normal recovery == 18:03:35 (1346807015)
Sep  4 18:05:08 fat-intel-3vm1 kernel: LustreError: 11-0: an error occurred while communicating with 10.10.4.93@tcp. The ost_connect operation failed with -19
Sep  4 18:05:08 fat-intel-3vm1 kernel: LustreError: Skipped 6 previous similar messages
Sep  4 18:06:54 fat-intel-3vm1 kernel: Lustre: 3295:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1346807188/real 1346807188]  req@ffff88002a4e5000 x1412225453747374/t0(0) o8-&amp;gt;lustre-OST0005-osc-ffff880047d0f000@10.10.4.89@tcp:28/4 lens 400/544 e 0 to 1 dl 1346807213 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep  4 18:06:54 fat-intel-3vm1 kernel: Lustre: 3295:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 119 previous similar messages
Sep  4 18:11:35 fat-intel-3vm1 kernel: Lustre: lustre-OST0004-osc-ffff880047d0f000: Connection restored to lustre-OST0004 (at 10.10.4.93@tcp)
Sep  4 18:11:35 fat-intel-3vm1 kernel: Lustre: Skipped 5 previous similar messages
Sep  4 19:05:09 fat-intel-3vm1 kernel: MTRR variable ranges enabled:
Sep  4 19:05:09 fat-intel-3vm1 kernel:  0 base 00E0000000 mask FFE0000000 uncachable
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="15878">LU-1890</key>
            <summary>Test failure on test suite recovery-small, subtest test_101</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Tue, 11 Sep 2012 10:24:48 +0000</created>
                <updated>Wed, 12 Jun 2013 16:13:31 +0000</updated>
                            <resolved>Wed, 12 Jun 2013 16:13:31 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="45030" author="yujian" created="Mon, 17 Sep 2012 09:33:37 +0000"  >&lt;p&gt;Lustre build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/19&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/19&lt;/a&gt;&lt;br/&gt;
Test group: failover&lt;/p&gt;

&lt;p&gt;The same issue occurred: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/4b58e1bc-fff3-11e1-9f3c-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/4b58e1bc-fff3-11e1-9f3c-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="45069" author="pjones" created="Mon, 17 Sep 2012 14:11:19 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="45250" author="laisiyao" created="Thu, 20 Sep 2012 04:05:08 +0000"  >&lt;p&gt;When I ran this single subtest 101 in a loop, it always passed. Now I&apos;m running the full test to try to reproduce it. The log shows the MDS suddenly rebooted but doesn&apos;t give many details. Is that right, Yujian?&lt;/p&gt;</comment>
                            <comment id="45254" author="yujian" created="Thu, 20 Sep 2012 05:25:03 +0000"  >&lt;blockquote&gt;&lt;p&gt;When I ran this single subtest 101 in a loop, it always passed. Now I&apos;m running the full test to try to reproduce it. The log shows the MDS suddenly rebooted but doesn&apos;t give many details. Is that right, Yujian?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;For report &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/ae7cd73a-f760-11e1-8b95-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/ae7cd73a-f760-11e1-8b95-52540035b04c&lt;/a&gt;, I only found the attached console log for MDS/MGS fat-intel-3vm7(10.10.4.92) in the autotest logdir on the brent node.&lt;/p&gt;

&lt;p&gt;The issue only occurred while running the test under the failover configuration. So, if you want to reproduce or debug it, you can upload a patch to Gerrit with the following lines in the commit message:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Test-Parameters: fortestonly envdefinitions=SLOW=yes \
clientcount=4 osscount=2 mdscount=2 austeroptions=-R \
failover=true useiscsi=true testlist=recovery-small
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="45345" author="laisiyao" created="Fri, 21 Sep 2012 11:07:04 +0000"  >&lt;p&gt;I reproduced this in my test environment. This is an Imperative Recovery test; the client is not able to reconnect within the recovery window and finally fails. I&apos;m not clear on the cause yet, and am collecting more logs to debug.&lt;/p&gt;</comment>
                            <comment id="45409" author="yujian" created="Sun, 23 Sep 2012 23:04:51 +0000"  >&lt;p&gt;Here is a link to the historical Maloo reports for recovery-small test 101 in the &lt;b&gt;failover&lt;/b&gt; test group:&lt;br/&gt;
&lt;a href=&quot;http://tinyurl.com/8adbnzz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://tinyurl.com/8adbnzz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;before 2012-05-31, the test passed at times, but &lt;b&gt;not&lt;/b&gt; with the failover configuration&lt;/li&gt;
	&lt;li&gt;between 2012-05-31 and 2012-08-20, the test was run with the failover configuration but hit TT-765&lt;/li&gt;
	&lt;li&gt;after TT-765 was fixed, since 2012-09-05, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1890&quot; title=&quot;Test failure on test suite recovery-small, subtest test_101&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1890&quot;&gt;&lt;del&gt;LU-1890&lt;/del&gt;&lt;/a&gt; has been occurring regularly&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="45416" author="laisiyao" created="Mon, 24 Sep 2012 04:04:40 +0000"  >&lt;p&gt;Jinshan, the test log shows that after OST failover, the client is denied connection while the MDS connects successfully. Do you have any clue?&lt;/p&gt;</comment>
                            <comment id="45462" author="jay" created="Mon, 24 Sep 2012 15:55:27 +0000"  >&lt;p&gt;Siyao, can you please check whether this is because the client data was not written into the persistent storage of last_rcvd, so the OST thought it was a new client and denied the reconnect? I have seen this kind of issue several times, and if this is the case, we can fix it by issuing a sync IO before failing over the OST.&lt;/p&gt;</comment>
                            <comment id="45508" author="laisiyao" created="Tue, 25 Sep 2012 08:04:19 +0000"  >&lt;p&gt;Chris, could you describe how the shared disk is configured for the failover test?&lt;/p&gt;

&lt;p&gt;In my manual test, the cause of the failure is that I used a /dev/sdX-style device (iSCSI disk) as the OST device, but the device ordering tends to change across nodes, i.e., /dev/sdc on the OST node may be /dev/sdd on the failover OST node, so after failover they access different devices and the client is not able to connect. After switching to /dev/disk/by-id/iscsi-xxxx to access the shared disk, I&apos;ve been running this test successfully for more than an hour now.&lt;/p&gt;</comment>
                            <comment id="45517" author="pjones" created="Tue, 25 Sep 2012 11:30:03 +0000"  >&lt;p&gt;I think there is enough evidence to suggest that this is purely a testing issue. We should still continue to work on resolving it, and include a fix in an RC2 if one is ready and we need one, but it would not warrant holding the release on its own merits.&lt;/p&gt;</comment>
                            <comment id="45551" author="laisiyao" created="Wed, 26 Sep 2012 02:11:37 +0000"  >&lt;p&gt;There is another issue with failover testing: only the logs from the failed node are collected on Maloo; the logs from the failover node are nowhere to be found.&lt;/p&gt;</comment>
                            <comment id="46467" author="yujian" created="Fri, 12 Oct 2012 09:42:46 +0000"  >&lt;p&gt;Lustre Tag: v2_3_0_RC2&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/32&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/32&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6.3/x86_64(server), RHEL5.8/x86_64(client)&lt;br/&gt;
Test Group: failover&lt;/p&gt;

&lt;p&gt;The issue occurred again: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/8f2e24c4-146b-11e2-af8d-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/8f2e24c4-146b-11e2-af8d-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="60447" author="adilger" created="Wed, 12 Jun 2013 16:13:31 +0000"  >&lt;p&gt;This is no longer being seen in testing, since about 2012/11. Closing it because it is incorrectly being marked as the cause of many other failures, including &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3453&quot; title=&quot;LBUG in mdt_intent_opc(), (layout.c:1916:__req_capsule_get()) ASSERTION( msg != ((void *)0) ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3453&quot;&gt;&lt;del&gt;LU-3453&lt;/del&gt;&lt;/a&gt;. &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11841" name="fat-intel-3vm8.syslog.log" size="83815" author="yujian" created="Tue, 11 Sep 2012 10:27:51 +0000"/>
                            <attachment id="11879" name="recovery-small.test_101.console.fat-intel-3vm7.log" size="6773" author="yujian" created="Thu, 20 Sep 2012 05:25:03 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv3sv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4143</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>