<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:40:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
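
As a rough sketch of how such a request URL could be assembled: the base path below is an assumption about where the XML issue view lives and is not stated in this document, and build_issue_url is a hypothetical helper name. Only the repeated 'field' parameter comes from the note above.

```python
from urllib.parse import urlencode

# Assumed location of the XML issue view; this path is NOT given in the
# document and may differ on a real server.
BASE_URL = "https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-4214/LU-4214.xml"


def build_issue_url(base_url, fields):
    """Return base_url with one 'field' query parameter per requested field."""
    if not fields:
        # No restriction requested: the server returns every field.
        return base_url
    # A list of pairs lets the same parameter name repeat in the query string.
    query = urlencode([("field", name) for name in fields])
    return base_url + "?" + query


# Request only the issue key and summary.
print(build_issue_url(BASE_URL, ["key", "summary"]))
```

Passing an empty list leaves the URL unchanged, which corresponds to requesting the full document.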
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4214] Hyperion - OST never recovers on failover node</title>
                <link>https://jira.whamcloud.com/browse/LU-4214</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On Hyperion, doing a manual failover. The OSTs are formatted as follows:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;mkfs.lustre --reformat --ost --fsname lustre --mgsnode=$MGSNODE --index=$stinx --servicenode=${PRI[$i]} --servicenode=${SEC[$i]} --mkfsoptions=&lt;span class=&quot;code-quote&quot;&gt;&apos;-t ext4 -J size=2048 -O extents -G 256 -i 69905&apos;&lt;/span&gt; /dev/sd${DISK[$i]}&quot; &amp;amp;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Result on disk:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;   Permanent disk data:
Target:     lustre-OST0013
Index:      19
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x1002
              (OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.120.5@o2ib failover.node=192.168.127.62@o2ib failover.node=192.168.127.66@o2ib
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Procedure:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;power off dit31&lt;/li&gt;
	&lt;li&gt;run script which mounts OSTs on dit35&lt;br/&gt;
Result:&lt;br/&gt;
The MGS gives this message (.62 is primary, .66 is failover; .62 is STONITH-dead at this time):
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;h-agb5: Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0013_UUID seen on &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; nid 192.168.127.66@o2ib1 when existing nid 192.168.127.62@o2ib1 is already connected
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The MGS/MDS thereafter ignores these OSTs and continues to give error messages pointing at the primary NID:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Nov  5 16:18:16 hyperion-agb5 kernel: Lustre: 6143:0:(client.c:1897:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1383697096/real 1383697096]  req@ffff8807bf317c00 x1450911076896768/t0(0) o8-&amp;gt;lustre-OST0013-osc-MDT0000@192.168.127.62@o2ib:28/4 lens 400/544 e 0 to 1 dl 1383697151 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This condition persists despite a power cycle/restart of the MGS/MDS.&lt;br/&gt;
The OSS node reports one error:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 13a-8: Failed to get MGS log params and no local copy.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The OST never enters recovery, despite printing the Imperative Recovery message.&lt;br/&gt;
This condition persists despite repeated remounts, remounts with abort_recov, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Clients continue to time out on the primary NID.&lt;/p&gt;

&lt;p&gt;The system remains in this state for further data gathering; suggestions are appreciated.&lt;/p&gt;</description>
                <environment></environment>
        <key id="21872">LU-4214</key>
            <summary>Hyperion - OST never recovers on failover node</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                    </labels>
                <created>Wed, 6 Nov 2013 00:25:08 +0000</created>
                <updated>Fri, 3 Nov 2017 15:41:26 +0000</updated>
                            <resolved>Wed, 11 Jun 2014 13:44:30 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="72496" author="tappro" created="Thu, 28 Nov 2013 18:26:50 +0000"  >&lt;p&gt;Cliff, this bug is marked as &apos;related&apos; to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2059&quot; title=&quot;mgc to backup configuration on osd-based llogs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2059&quot;&gt;&lt;del&gt;LU-2059&lt;/del&gt;&lt;/a&gt;. Is it really caused by LU-2059, or is that just a suspicion? Another question: are there logs from the MDT?&lt;/p&gt;</comment>
                            <comment id="72616" author="cliffw" created="Mon, 2 Dec 2013 16:58:30 +0000"  >&lt;p&gt;I have no idea why that is marked as related; it was not done by me. There was very little information in the logs, and I posted it in the bug. The lack of any error messages in this situation is rather frustrating.&lt;/p&gt;</comment>
                            <comment id="72693" author="tappro" created="Tue, 3 Dec 2013 13:38:24 +0000"  >&lt;p&gt;OK, I see. I don&apos;t have a good idea yet of what is wrong there, but I do have one about this message:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt; Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-OST0013_UUID seen on new nid 192.168.127.66@o2ib1 when existing nid 192.168.127.62@o2ib1 is already connected &lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;It looks like we need to fix target_handle_connect() to establish a new connection for the LWP client if the NID has changed, as we already do for the MDS connection. The patch is here &lt;a href=&quot;http://review.whamcloud.com/#/c/8465/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8465/&lt;/a&gt; and I am waiting for Johann&apos;s reply on it.&lt;/p&gt;</comment>
                            <comment id="82512" author="adilger" created="Fri, 25 Apr 2014 17:48:13 +0000"  >&lt;p&gt;Mike, Johann commented on the patch &lt;a href=&quot;http://review.whamcloud.com/8465&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8465&lt;/a&gt; so it needs to be refreshed.&lt;/p&gt;</comment>
                            <comment id="86315" author="jlevi" created="Wed, 11 Jun 2014 13:44:30 +0000"  >&lt;p&gt;Patch landed to Master.&lt;/p&gt;</comment>
                            <comment id="106603" author="gerrit" created="Wed, 11 Feb 2015 09:36:13 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/13726&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/13726&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4214&quot; title=&quot;Hyperion - OST never recovers on failover node&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4214&quot;&gt;&lt;del&gt;LU-4214&lt;/del&gt;&lt;/a&gt; lwp: fix LWP client connect logic&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8abaa93afd61f6c28e15e035a1a06ecf7f6d748e&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="28816">LU-6273</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="36484">LU-8089</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw83j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11463</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>