<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:34:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3456] Remove or refactor &quot;ost_connect failed&quot; message</title>
                <link>https://jira.whamcloud.com/browse/LU-3456</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I see this message at startup time on the MDS. If it&apos;s safe to ignore, it should be removed. If it&apos;s important, it should be refactored to be understandable by an admin (I don&apos;t even know what it means, and it&apos;s a console message).&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-06-11 12:53:52 LustreError: 11-0: lc2-OST0007-osc-MDT0000: Communicating with 10.1.1.48@o2ib9, operation ost_connect failed with -19.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="19372">LU-3456</key>
            <summary>Remove or refactor &quot;ost_connect failed&quot; message</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                            <label>shh</label>
                    </labels>
                <created>Tue, 11 Jun 2013 20:35:31 +0000</created>
                <updated>Thu, 26 Feb 2015 21:56:44 +0000</updated>
                            <resolved>Thu, 26 Feb 2015 21:56:44 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="60405" author="pjones" created="Wed, 12 Jun 2013 00:18:13 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please help with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="60515" author="bobijam" created="Thu, 13 Jun 2013 03:11:33 +0000"  >&lt;p&gt;It&apos;s from ptlrpc_check_status(). It indicates that when the MDS starts up, it tries to connect to the OST while the OST device is not yet available.&lt;/p&gt;

&lt;p&gt;The comment in ptlrpc_console_allow() shows that errors during the initial connection are not suppressed, while error messages for reconnect requests are suppressed.&lt;/p&gt;</comment>
                            <comment id="81932" author="pjones" created="Fri, 18 Apr 2014 12:58:04 +0000"  >&lt;p&gt;Bobi&lt;/p&gt;

&lt;p&gt;So could you propose an alternative wording for the message that would be more intuitive?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="81940" author="bobijam" created="Fri, 18 Apr 2014 13:52:22 +0000"  >&lt;p&gt;This message indicates that the MDS tried to connect to OST0007 while the OSS had not set up OST0007 yet, or while OST0007 was temporarily failed (-19 == -ENODEV).&lt;/p&gt;</comment>
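                            <!--
Editorial note on the comment above: the "-19" in the console message is a negated errno value. A minimal sketch, using Python's standard errno module purely as an illustration (Lustre itself is written in C), shows how the number maps to a symbolic name on Linux:

```python
import errno
import os

code = -19  # value printed in "operation ost_connect failed with -19"
name = errno.errorcode[abs(code)]  # symbolic name registered for errno 19
text = os.strerror(abs(code))      # platform-specific description string
print(name, text)
```

On Linux the symbolic name resolves to ENODEV, matching the comment above. The description string varies by platform.
                            -->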
                            <comment id="82063" author="prakash" created="Mon, 21 Apr 2014 17:10:36 +0000"  >&lt;p&gt;So then, that sounds like &quot;normal&quot; operation to me. I don&apos;t think it warrants a console message. It&apos;s probably an artifact of how I sometimes power cycle all server nodes in a test filesystem. If the MDS comes up before the OSS nodes, then this message will appear?&lt;/p&gt;</comment>
                            <comment id="82215" author="adilger" created="Tue, 22 Apr 2014 22:25:28 +0000"  >&lt;p&gt;Prakash, you are correct that this can happen if the MDS is started before the OSS.  The message is printed to the console to alert the sysadmin in case the target OST is not starting up properly, but I agree it is a distraction if it is printed due to some transient condition.&lt;/p&gt;

&lt;p&gt;That said, when Brian submitted the patch to update this console message he left in the printing of errors during the initial connection attempt.  I think it would make sense to avoid printing this error if there are just a small number of failed initial connection attempts, but still print something if the connection is failing for a long time.  It seems reasonable to only print out such messages when there are persistent problems on the connection.&lt;/p&gt;

&lt;p&gt;I&apos;ve pushed an RFC patch &lt;a href=&quot;http://review.whamcloud.com/10057&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/10057&lt;/a&gt; but I haven&apos;t tested it at all.  In particular, I&apos;m not sure if the same request is used repeatedly for the initial connection (which means rq_nr_resends is properly incremented) or if a new request is used each time (which means my attempt at squashing the initial connect messages will fail).  Bobijam, could you please take a look at this?&lt;/p&gt;</comment>
                            <comment id="82224" author="prakash" created="Wed, 23 Apr 2014 00:57:42 +0000"  >&lt;blockquote&gt;
&lt;p&gt;I think it would make sense to avoid printing this error if there are just a small number of failed initial connection attempts, but still print something if the connection is failing for a long time. It seems reasonable to only print out such messages when there are persistent problems on the connection.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Yes, I think I agree. Since we don&apos;t have any better infrastructure for reporting things like this, I&apos;m more OK with the message if we just try to suppress the &quot;noise&quot;.&lt;/p&gt;

&lt;p&gt;I still don&apos;t think the console is the &quot;right&quot; place for it, but that&apos;s all we have at the moment. It would be really cool to be able to, instead, post some sort of event that another process (e.g. a userspace daemon) could consume and then decide what to do (e.g. ignore, ping monitoring software, send email, etc.). But that&apos;s a whole &apos;nother can of worms.&lt;/p&gt;

&lt;p&gt;I think having some sort of timer (or number of resends) to suppress the message would go a long way in this particular case.&lt;/p&gt;</comment>
                            <comment id="82236" author="bobijam" created="Wed, 23 Apr 2014 04:03:04 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;It does not use the same request for the initial connection. It changes the import status to DISCONN, then CONNECTING, and constructs a new request for each connection attempt (since it could possibly select a different connection for the import).&lt;/p&gt;</comment>
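                            <!--
Editorial note on the exchange above: because a new request is built for every initial connection attempt, a per-request resend counter would restart each time. One way to get the "stay quiet for a few failures, then report" behaviour discussed in this thread is to keep the failure count on a longer-lived object instead. The sketch below is an illustration in Python, not Lustre code and not the merged patch: ImportState, quiet_attempts, connect_failed and connect_ok are hypothetical names, and the threshold of 3 is an assumed value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportState:
    """Hypothetical stand-in for a long-lived per-target connection object."""
    target: str
    quiet_attempts: int = 3   # assumed threshold, not taken from Lustre
    failed_connects: int = 0

    def connect_failed(self, rc: int) -> Optional[str]:
        """Record one failed attempt; return a console message or None."""
        self.failed_connects += 1
        if self.failed_connects <= self.quiet_attempts:
            return None       # likely transient: stay quiet
        return (f"{self.target}: connect failed {self.failed_connects} "
                f"times, last error {rc}")

    def connect_ok(self) -> None:
        """Reset the counter once a connection succeeds."""
        self.failed_connects = 0


imp = ImportState("lc2-OST0007-osc-MDT0000")
messages = [imp.connect_failed(-19) for _ in range(5)]
```

In this sketch the first three failures produce no message and later ones do, because the counter lives on the object that survives across attempts rather than on each request.
                            -->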
                            <comment id="100628" author="gerrit" created="Thu, 4 Dec 2014 02:30:25 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/10057/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/10057/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3456&quot; title=&quot;Remove or refactor &amp;quot;ost_connect failed&amp;quot; message&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3456&quot;&gt;&lt;del&gt;LU-3456&lt;/del&gt;&lt;/a&gt; ptlrpc: quiet errors on initial connection&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d1cf226d04884a102e17a7d4109764c24572983f&lt;/p&gt;</comment>
                            <comment id="108174" author="adilger" created="Thu, 26 Feb 2015 21:56:44 +0000"  >&lt;p&gt;Patch landed to 2.7.0 to improve console message and quiet it down considerably.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvt2n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8640</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>