<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:11:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
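For instance, a filtered request could look like the following (the /si/jira.issueviews:issue-xml path is JIRA's standard XML issue view; treat the exact path as an assumption to verify on this server):
    https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-885/LU-885.xml?field=key&field=summary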
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-885] recovery-mds-scale (FLAVOR=mds) fails, network is not available</title>
                <link>https://jira.whamcloud.com/browse/LU-885</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After running recovery-mds-scale FLAVOR=mds for about 2 hours (the MDS failed over 14 times), the network is not available on the standby MDS server, and the node cannot be accessed afterwards, even after a power cycle. I have hit this same issue twice.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; 
==== Checking the clients loads AFTER  failover -- failure NOT OK
mds1 has failed over 14 times, and counting...
sleeping 501 seconds ... 
==== Checking the clients loads BEFORE failover -- failure NOT OK     ELAPSED=7904 DURATION=86400 PERIOD=600
Wait mds1 recovery complete before doing next failover ....
affected facets: mds1
client-6: *.lustre-MDT0000.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover
client-18: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-12: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-18: cannot run remote command on client-12,client-13,client-17,client-18 with 
client-12: cannot run remote command on client-12,client-13,client-17,client-18 with 
client-17: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-17: cannot run remote command on client-12,client-13,client-17,client-18 with 
client-13: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-13: cannot run remote command on client-12,client-13,client-17,client-18 with 
Starting failover on mds1
Failing mds1 on node client-6
+ pm -h powerman --off client-6
Command completed successfully
affected facets: mds1
+ pm -h powerman --on client-6
Command completed successfully
Failover mds1 to client-2
15:35:30 (1322609730) waiting for client-2 network 900 secs ...
waiting ping -c 1 -w 3 client-2, 895 secs left ...
waiting ping -c 1 -w 3 client-2, 890 secs left ...
waiting ping -c 1 -w 3 client-2, 885 secs left ...
waiting ping -c 1 -w 3 client-2, 880 secs left ...
waiting ping -c 1 -w 3 client-2, 875 secs left ...
waiting ping -c 1 -w 3 client-2, 870 secs left ...
waiting ping -c 1 -w 3 client-2, 865 secs left ...
waiting ping -c 1 -w 3 client-2, 860 secs left ...
waiting ping -c 1 -w 3 client-2, 855 secs left ...
waiting ping -c 1 -w 3 client-2, 850 secs left ...
waiting ping -c 1 -w 3 client-2, 845 secs left ...
waiting ping -c 1 -w 3 client-2, 840 secs left ...
waiting ping -c 1 -w 3 client-2, 835 secs left ...
waiting ping -c 1 -w 3 client-2, 830 secs left ...
waiting ping -c 1 -w 3 client-2, 825 secs left ...
waiting ping -c 1 -w 3 client-2, 820 secs left ...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt; </description>
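                <!--
                A minimal shell sketch of the wait-for-network countdown shown in the log above:
                one ping with a 3-second deadline, retried while a 900-second budget counts down
                in 5-second steps. The function name and defaults are assumptions for illustration,
                not the actual Lustre test-framework.sh code.

                wait_for_network() {
                    local node=$1
                    local wait=${2:-900}    # assumed default, matching "900 secs" in the log
                    echo "waiting for $node network $wait secs ..."
                    while [ "$wait" -gt 0 ]; do
                        # one ping with a 3s deadline, as printed in the log lines above
                        ping -c 1 -w 3 "$node" > /dev/null 2>&1 && return 0
                        wait=$((wait - 5))
                        echo "waiting ping -c 1 -w 3 $node, $wait secs left ..."
                        sleep 5
                    done
                    echo "Network not available!"
                    return 1
                }
                -->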
                <environment>lustre-master build #353 RHEL6-x86_64 for both server and client</environment>
        <key id="12568">LU-885</key>
            <summary>recovery-mds-scale (FLAVOR=mds) fails, network is not available</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                    </labels>
                <created>Tue, 29 Nov 2011 19:01:26 +0000</created>
                <updated>Thu, 3 Oct 2019 17:41:18 +0000</updated>
                            <resolved>Mon, 29 May 2017 02:52:04 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="23606" author="sarah" created="Thu, 1 Dec 2011 14:14:55 +0000"  >&lt;p&gt;recovery-radom-scale failed after running about 4 hours(mds1 failed  over  24 times) hit the similar issue. the standby mds server is not usable.&lt;/p&gt;

&lt;p&gt;waiting ping -c 1 -w 3 client-7, 5 secs left ...&lt;br/&gt;
Network not available!&lt;br/&gt;
2011-12-01 01:39:58 Terminating clients loads ...&lt;br/&gt;
Duration:                86400&lt;br/&gt;
Server failover period: 600 seconds&lt;br/&gt;
Exited after:           14119 seconds&lt;br/&gt;
Number of failovers before exit:&lt;br/&gt;
mds1 failed  over  24 times&lt;br/&gt;
Status: FAIL: rc=1&lt;/p&gt;</comment>
                            <comment id="25339" author="green" created="Tue, 3 Jan 2012 07:47:44 +0000"  >&lt;p&gt;So it appears that the failover node is not coming up for some reason (pings not working and such), somebody need to reproduce this and then check into what&apos;s going on at the failover node? vm failed to start? Some rootfs corruption so that it&apos;s stuck during booting waiting for root password or whatever?&lt;/p&gt;

&lt;p&gt;Potentially a TT ticket if the node fails to start.&lt;/p&gt;</comment>
                            <comment id="25353" author="sarah" created="Tue, 3 Jan 2012 13:08:46 +0000"  >&lt;p&gt;Actually in both recovery-mds-scale(FLAVOR=mds) and recovery-radom-scale tests, the failover node was not accessible after rebooting for several times. Robert was in the lab and helped on this. As he seen, the nodes need a physical power on and it seems nothing special.&lt;/p&gt;</comment>
                            <comment id="41859" author="pjones" created="Sun, 15 Jul 2012 09:13:47 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="41927" author="hongchao.zhang" created="Tue, 17 Jul 2012 09:34:12 +0000"  >&lt;p&gt;this seems to be a problem related to the tool pm, for the nodes were not stuck and only need a power on&lt;/p&gt;</comment>
                            <comment id="42322" author="chris" created="Thu, 26 Jul 2012 09:59:50 +0000"  >&lt;p&gt;Hongchao how did you replicate this, and have we seen this recently under autotest? If so can you post links to the results here.&lt;/p&gt;</comment>
                            <comment id="42442" author="pjones" created="Mon, 30 Jul 2012 12:25:57 +0000"  >&lt;p&gt;Adding Hongchao as a watcher so he sees Chris&apos;s question&lt;/p&gt;</comment>
                            <comment id="42476" author="hongchao.zhang" created="Mon, 30 Jul 2012 22:47:29 +0000"  >&lt;p&gt;Hi Chris,&lt;br/&gt;
I can&apos;t replicate this issue, and there have been no new occurrences under autotest recently.&lt;/p&gt;</comment>
                            <comment id="42519" author="pjones" created="Tue, 31 Jul 2012 17:23:22 +0000"  >&lt;p&gt;As per Sarah has not reocurred for last three tags so removing as a blocker&lt;/p&gt;</comment>
                            <comment id="197347" author="adilger" created="Mon, 29 May 2017 02:52:04 +0000"  >&lt;p&gt;Close old ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="12589">LU-893</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="10640" name="recovery-mds-scale-1322611170.tar.bz2" size="4990568" author="sarah" created="Tue, 29 Nov 2011 19:31:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw0tz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10215</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>