<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:07:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-497] DDN failure - Now can&apos;t find a valid superblock</title>
                <link>https://jira.whamcloud.com/browse/LU-497</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A tray in one of our 99K enclosures failed last night, causing the OSS to panic. Once we had things more or less back in order, we ran fsck on all the LUNs associated with that server, and it succeeded on all but one.&lt;/p&gt;

&lt;p&gt;When I try to run fsck, the system complains that fsck.ext4 cannot be found. When I run fsck.ldiskfs on the problem LUN, I get the following:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@aoss11 ~]# fsck.ldiskfs /dev/sdg
fsck-sdg[7235]: running (null)
fsck-sdg[7235]: fsck.ldiskfs 1.41.10.sun2-4chaos (23-Jun-2010)
fsck-sdg[7235]: fsck.ldiskfs: MMP: fsck being run while trying to open /dev/sdg
fsck-sdg[7235]: 
fsck-sdg[7235]: The superblock could not be read or does not describe a correct ext2
fsck-sdg[7235]: filesystem.  If the device is valid and it really contains an ext2
fsck-sdg[7235]: filesystem (and not swap or ufs or something else), then the superblock
fsck-sdg[7235]: is corrupt, and you might try running e2fsck with an alternate superblock:
fsck-sdg[7235]:     e2fsck -b 32768 &amp;lt;device&amp;gt;
fsck-sdg[7235]: 
fsck-sdg[7235]: exit code 8 (operational error)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When I try the alternate superblocks (only three are listed), I get the same error.&lt;/p&gt;

&lt;p&gt;The odd thing is that if I run tunefs.lustre on the device, it gives me all the information about the OST.&lt;/p&gt;

&lt;p&gt;If I run dumpe2fs, it prints some of the disk information and then just hangs. I can break out of the command, but I get the same result even when I run it on one of the good LUNs. I don&apos;t know how else to look for additional superblocks.&lt;/p&gt;

&lt;p&gt;This is a production file system, so we are obviously down and this is critical. Any assistance would be greatly appreciated.&lt;/p&gt;</description>
                <environment>chaos version 4.4-2 on Dell R710 servers connected via IB to a DDN S2A9900.</environment>
        <key id="11319">LU-497</key>
            <summary>DDN failure - Now can&apos;t find a valid superblock</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="jamervi">Joe Mervini</reporter>
                        <labels>
                    </labels>
                <created>Fri, 8 Jul 2011 15:08:06 +0000</created>
                <updated>Wed, 26 Oct 2011 19:52:56 +0000</updated>
                            <resolved>Fri, 8 Jul 2011 23:39:33 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="17490" author="jamervi" created="Fri, 8 Jul 2011 15:14:55 +0000"  >&lt;p&gt;Quick follow-up: so there&apos;s no confusion, we used fsck.ldiskfs on all the other devices successfully. &lt;/p&gt;</comment>
                            <comment id="17491" author="pjones" created="Fri, 8 Jul 2011 15:14:58 +0000"  >&lt;p&gt;Joe&lt;/p&gt;

&lt;p&gt;I am looking for an engineer to help you with this issue. Can I just confirm the version of the Lustre code that you are running? Is it really Lustre 1.8.6-wc1, or is it the Lustre 1.8.5 + patches bundled with the latest Chaos release?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="17495" author="jamervi" created="Fri, 8 Jul 2011 15:30:32 +0000"  >&lt;p&gt;It is the Lustre 1.8.5 + patches bundled with the latest Chaos release.&lt;/p&gt;</comment>
                            <comment id="17510" author="adilger" created="Fri, 8 Jul 2011 16:47:17 +0000"  >&lt;p&gt;Older versions of e2fsck have issues like this, where the MMP block is left in a state that reports e2fsck as still running. Those problems have been fixed in newer e2fsck releases.&lt;/p&gt;

&lt;p&gt;In order to clear this flag in the MMP block you need to run:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;    tune2fs -f -E clear_mmp /dev/sdg
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and then run e2fsck as normal.  I would separately recommend upgrading e2fsprogs to 1.41.12.ora2, which contains several MMP fixes.&lt;/p&gt;</comment>
                            <comment id="17558" author="jamervi" created="Fri, 8 Jul 2011 21:16:50 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;Thank you so much for the help. tune2fs -f -E clear_mmp /dev/sdg indeed cleared the way to run fsck. The delay in feedback was because we allowed one of the failed drives in the array to complete its rebuild before doing anything else. After that, we were extra cautious: we shut down the servers attached to the controller pair and restarted them.&lt;/p&gt;

&lt;p&gt;We then ran fsck with -n to see what got reported, and then with -yDf. The two checks together took more than 2 hours to complete. We then ran ll_recover_lost_found_objs on the LUN mounted as ldiskfs, which restored all the objects in lost+found.&lt;/p&gt;
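&lt;p&gt;The recovery sequence described in this thread (clearing the stale MMP flag, a read-only check, a repairing check, then object recovery from lost+found) can be sketched as a small shell function. This is a hedged sketch, not a tested procedure: the function names ost_recover and run, the default mount point /mnt/ost, and the DRY_RUN switch are hypothetical. It also assumes that no other node (such as a failover partner) has the OST mounted or is checking it, because clearing the MMP block is only safe in that case. With DRY_RUN set, it only prints the commands it would run.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical sketch of the OST recovery steps from this ticket.
# Usage: ost_recover DEVICE [MOUNTPOINT]
# Assumes no other node has DEVICE mounted or is running e2fsck on it.
ost_recover() {
    local dev=$1
    local mnt=${2:-/mnt/ost}
    # Reset the MMP block that still claims "fsck being run" (needs -f).
    run tune2fs -f -E clear_mmp "$dev" || return 1
    # Forced read-only pass; exits non-zero if it finds problems, so keep going.
    run e2fsck -fn "$dev"
    # Repair pass: -y answers yes, -D optimizes directories, -f forces the check.
    run e2fsck -yDf "$dev"
    # Mount the OST as ldiskfs and move objects from lost+found back under O/.
    run mount -t ldiskfs "$dev" "$mnt" || return 1
    run ll_recover_lost_found_objs -d "$mnt/lost+found"
    run umount "$mnt"
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Calling it once with DRY_RUN=1 shows the plan for a device such as /dev/sdg without touching it. The -n pass is deliberately not treated as fatal, since its purpose is only to report what the repairing pass would change.&lt;/p&gt;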

&lt;p&gt;We then brought the whole file system back online and everything is back to normal.&lt;/p&gt;

&lt;p&gt;Thanks again for the quick response.&lt;/p&gt;</comment>
                            <comment id="17559" author="pjones" created="Fri, 8 Jul 2011 23:39:33 +0000"  >&lt;p&gt;Joe&lt;/p&gt;

&lt;p&gt;Glad to hear that normal service has been resumed.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw2kn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10499</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>