<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:03:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-61] MDT can&apos;t connect to OST after hardware event: oscc recovery failed: -116</title>
                <link>https://jira.whamcloud.com/browse/LU-61</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi WC,&lt;/p&gt;

&lt;p&gt;There was a hardware failure at Purdue today that took out a 6620 controller. After fixing the issue, the MDT fails to connect to one OST and has intermittent connections with another. fid2dentry is getting passed an obd_id of 0, which causes it to return -ESTALE to the MDT. I looked in bz, but I couldn&apos;t find anything similar. Have you seen anything like this, or do you have any ideas on how to get it online?&lt;/p&gt;

&lt;p&gt;We&apos;ve tried rebooting the MDS and OSS, but after recovery it still has this issue. Would aborting recovery help? How about the CATALOGS trick? Let me know if other logs would help. &lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Kit&lt;/p&gt;

&lt;p&gt;Relevant MDT logs: &lt;br/&gt;
Feb  4 17:50:53 mds-a01 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;13485616.075071&amp;#93;&lt;/span&gt; LustreError: 12085:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116&lt;br/&gt;
Feb  4 17:50:53 mds-a01 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;13485616.075526&amp;#93;&lt;/span&gt; LustreError: 12085:0:(lov_obd.c:1131:lov_clear_orphans()) error in orphan recovery on OST idx 1/36: rc = -116&lt;br/&gt;
Feb  4 17:50:53 mds-a01 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;13485616.076025&amp;#93;&lt;/span&gt; LustreError: 12085:0:(mds_lov.c:1062:__mds_lov_synchronize()) lustreA-OST0001_UUID failed at mds_lov_clear_orphans: -116&lt;br/&gt;
Feb  4 17:50:53 mds-a01 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;13485616.076482&amp;#93;&lt;/span&gt; LustreError: 12085:0:(mds_lov.c:1071:__mds_lov_synchronize()) lustreA-OST0001_UUID sync failed -116, deactivating&lt;br/&gt;
Feb  4 17:51:39 mds-a01 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;13485661.612612&amp;#93;&lt;/span&gt; LustreError: 12408:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116&lt;/p&gt;

&lt;p&gt;lctl dl&lt;br/&gt;
...&lt;br/&gt;
 28 UP osc lustreA-OST000b-osc lustreA-mdtlov_UUID 5&lt;br/&gt;
 29 UP osc lustreA-OST0000-osc lustreA-mdtlov_UUID 5&lt;br/&gt;
 30 IN osc lustreA-OST0001-osc lustreA-mdtlov_UUID 5&lt;br/&gt;
 31 UP osc lustreA-OST0002-osc lustreA-mdtlov_UUID 5&lt;br/&gt;
 32 UP osc lustreA-OST0003-osc lustreA-mdtlov_UUID 5&lt;br/&gt;
...&lt;/p&gt;

&lt;p&gt;Relevant OST logs:&lt;br/&gt;
Feb  4 17:43:53 oss-a01 kernel: [ 1333.618994] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) lustreA-OST0001: object 2283250:0 lookup error: rc -116&lt;br/&gt;
Feb  4 17:43:53 oss-a01 kernel: [ 1333.619430] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) Skipped 1 previous similar message&lt;br/&gt;
Feb  4 17:43:55 oss-a01 kernel: [ 1336.503075] LustreError: 9981:0:(filter_lvb.c:90:filter_lvbo_init()) lustreA-OST0001: bad object 2283250/0: rc -116&lt;br/&gt;
Feb  4 17:43:55 oss-a01 kernel: [ 1336.503630] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) lvbo_init failed for resource 2283250: rc -116&lt;br/&gt;
Feb  4 17:43:55 oss-a01 kernel: [ 1336.504092] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) Skipped 37 previous similar messages&lt;/p&gt;

</description>
                <environment></environment>
        <key id="10331">LU-61</key>
            <summary>MDT can&apos;t connect to OST after hardware event: oscc recovery failed: -116</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cliffw">Cliff White</assignee>
                                    <reporter username="kitwestneat">Kit Westneat</reporter>
                        <labels>
                    </labels>
                <created>Fri, 4 Feb 2011 16:07:21 +0000</created>
                <updated>Tue, 28 Jun 2011 15:01:37 +0000</updated>
                            <resolved>Fri, 4 Feb 2011 17:36:34 +0000</resolved>
                                                    <fixVersion>Lustre 1.8.6</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>0</watches>
                                                                            <comments>
                            <comment id="10529" author="cliffw" created="Fri, 4 Feb 2011 16:18:01 +0000"  >&lt;p&gt;Yes, I would try aborting recovery. It would be best to have the full log for a mount attempt.&lt;/p&gt;</comment>
                            <comment id="10530" author="kitwestneat" created="Fri, 4 Feb 2011 16:23:43 +0000"  >&lt;p&gt;logs from MDS and OSS before MDS and OSS reboot&lt;/p&gt;</comment>
                            <comment id="10531" author="cliffw" created="Fri, 4 Feb 2011 16:44:39 +0000"  >&lt;p&gt;There are 4 days of logs here. Can you tell me exactly when the issue started? When did you have the hardware failure?&lt;/p&gt;</comment>
                            <comment id="10532" author="cliffw" created="Fri, 4 Feb 2011 16:48:16 +0000"  >&lt;p&gt;I am consulting with engineering. Have you run fsck on the OSTs? This error may indicate a problem there.&lt;/p&gt;</comment>
                            <comment id="10533" author="kitwestneat" created="Fri, 4 Feb 2011 16:52:22 +0000"  >&lt;p&gt;The OSSes see IO errors around Feb  3 13:37:10. After the first reboot at Feb  3 14:46:41, Lustre isn&apos;t started again until Feb  4 17:27:43. That&apos;s when you can first see the -ESTALE errors. &lt;/p&gt;

&lt;p&gt;I&apos;ll ask the customer whether they have run an fsck. I thought they had, but maybe not.&lt;/p&gt;</comment>
                            <comment id="10534" author="cliffw" created="Fri, 4 Feb 2011 16:52:36 +0000"  >&lt;p&gt;Engineering confirms: you should run fsck, since object 2283250 may be damaged.&lt;/p&gt;</comment>
                            <comment id="10535" author="kitwestneat" created="Fri, 4 Feb 2011 17:28:59 +0000"  >&lt;p&gt;That was it, oops! Sorry for not checking that earlier. Hopefully others can learn from my mistakes... Thanks for your help!&lt;/p&gt;</comment>
                            <comment id="10536" author="cliffw" created="Fri, 4 Feb 2011 17:36:04 +0000"  >&lt;p&gt;Great! Glad it was sorted out so easily. I will close this.&lt;/p&gt;</comment>
                            <comment id="10537" author="cliffw" created="Fri, 4 Feb 2011 17:36:34 +0000"  >&lt;p&gt;Customer fsck&apos;d OST, problem solved&lt;/p&gt;</comment>
                            <comment id="10538" author="pjones" created="Fri, 4 Feb 2011 17:38:16 +0000"  >&lt;p&gt;As per Kit, OK to close.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10104" name="LU-61.tar.gz" size="150555" author="kitwestneat" created="Fri, 4 Feb 2011 16:23:43 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvzxr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10068</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10020"><![CDATA[1]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>