<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:56:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6041] Recover OST specific directories after e2fsck</title>
                <link>https://jira.whamcloud.com/browse/LU-6041</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A backend storage issue caused an OST to get mounted read-only. Remounting showed that the journal entry in the superblock was corrupt. The journal was removed and e2fsck continued; fixing every issue moved all entries into the lost+found directory (when mounting the OST as ldiskfs). Next the ext3 internal journal was added back using tune2fs -j /dev/sdd, and another e2fsck was run. It completed successfully. Using ll_recover_lost_found_objs, it appears that the object data is back in place, but there are no CONFIGS, quota_slave, or REMOTE_PARENT_DIR entries. Additionally, the oi.* entries are missing, as are the health_check and last_rcvd files. &lt;/p&gt;

&lt;p&gt;Is there any way to recover these entries? In the e2fsck output we saw things that look like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2638&quot; title=&quot;corruption of MDT &amp;quot;..&amp;quot; entry in some ldiskfs directories&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2638&quot;&gt;&lt;del&gt;LU-2638&lt;/del&gt;&lt;/a&gt; and also &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2901&quot; title=&quot;Duplicate filename on the same ldiskfs directory on MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2901&quot;&gt;&lt;del&gt;LU-2901&lt;/del&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;At this time e2fsck runs to completion and complains about block bitmap differences (lots of them), and Free blocks count (12519 of these messages). &lt;/p&gt;</description>
                <environment></environment>
        <key id="27958">LU-6041</key>
            <summary>Recover OST specific directories after e2fsck</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yujian">Jian Yu</assignee>
                                    <reporter username="hilljjornl">Jason Hill</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Dec 2014 19:49:11 +0000</created>
                <updated>Mon, 22 Dec 2014 19:11:52 +0000</updated>
                            <resolved>Mon, 22 Dec 2014 19:11:51 +0000</resolved>
                <due></due>
                <votes>0</votes>
                <watches>5</watches>
                <comments>
                            <comment id="101863" author="hilljjornl" created="Wed, 17 Dec 2014 20:05:57 +0000"  >&lt;p&gt;We&apos;re running the following versions of e2fsprogs:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;lost+found&amp;#93;&lt;/span&gt;# rpm -qa | grep e2fs&lt;br/&gt;
e2fsprogs-libs-1.42.9.wc1-7.el6.x86_64&lt;br/&gt;
e2fsprogs-devel-1.42.9.wc1-7.el6.x86_64&lt;br/&gt;
e2fsprogs-static-1.42.9.wc1-7.el6.x86_64&lt;br/&gt;
e2fsprogs-1.42.9.wc1-7.el6.x86_64&lt;br/&gt;
e2fsprogs-debuginfo-1.42.9.wc1-7.el6.x86_64&lt;/p&gt;
</comment>
                            <comment id="101865" author="pjones" created="Wed, 17 Dec 2014 20:19:48 +0000"  >&lt;p&gt;Jian&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="101924" author="adilger" created="Thu, 18 Dec 2014 06:28:31 +0000"  >&lt;p&gt;There are newer e2fsprogs-1.42.12-wc1, but they won&apos;t fix the missing files. &lt;/p&gt;

&lt;p&gt;That said, they should all be recreated when the OST is mounted again. Most of the files are not needed on OSTs in any case. &lt;/p&gt;</comment>
                            <comment id="101935" author="hilljjornl" created="Thu, 18 Dec 2014 11:38:53 +0000"  >&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;Something I didn&apos;t mention above (grave oversight, my fault) is that mount -t lustre /dev/sdd /tmp/lustre/fs/ost11 fails with &quot;This device hasn&apos;t been formatted by Lustre&quot;. &lt;/p&gt;</comment>
                            <comment id="101942" author="hilljjornl" created="Thu, 18 Dec 2014 14:33:18 +0000"  >&lt;p&gt;Would something as simple as a tunefs.lustre --writeconf (with the correct parameters) be sufficient to get this correct? Or would it be better to copy the CONFIGS directory from another OST and then do the writeconf?&lt;/p&gt;</comment>
                            <comment id="101959" author="hilljjornl" created="Thu, 18 Dec 2014 16:05:42 +0000"  >&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;oss3 /&amp;#93;&lt;/span&gt;# mount -t lustre /dev/sdd /mnt&lt;br/&gt;
mount.lustre: /dev/sdd has not been formatted with mkfs.lustre or the backend filesystem type is not supported by this tool&lt;/p&gt;

&lt;p&gt;dmesg shows no errors. &lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;oss3 /&amp;#93;&lt;/span&gt;# dmesg | grep -i lustre&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;oss3 /&amp;#93;&lt;/span&gt;# uptime&lt;br/&gt;
 11:04:37 up 2 days, 50 min,  2 users,  load average: 0.13, 0.07, 0.01&lt;/p&gt;</comment>
                            <comment id="101960" author="adilger" created="Thu, 18 Dec 2014 16:33:32 +0000"  >&lt;p&gt;Is the CONFIGS/ directory possibly still in &lt;tt&gt;lost+found&lt;/tt&gt;? There shouldn&apos;t be much left there after ll_recover_lost_found_objs moved all the objects back to their respective directories. &lt;/p&gt;

&lt;p&gt;In particular, CONFIGS/mountdata is one file that isn&apos;t recreated automatically at mount time, since that contains info on how to mount the filesystem, and is what mount.lustre is looking for. &lt;/p&gt;

&lt;p&gt;We&apos;ve recreated this file in the past by copying it from another OST and binary editing the OST index (two places in struct lustre_disk_data: ldd_svname and ldd_svindex). &lt;/p&gt;

&lt;p&gt;It might be just as easy to create a small test filesystem on a test node like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;OSTCOUNT=1 FSNAME={name} sh llmount.sh
touch /tmp/ostN
mkfs.lustre --ost --mgsnode=$HOSTNAME --index={index of broken OST} /tmp/ostN
losetup /dev/loop4 /tmp/ostN
mkdir /mnt/ostN
mount -t lustre /dev/loop4 /mnt/ostN
umount /mnt/ostN
mount -t ldiskfs /dev/loop4 /mnt/ostN
cp /mnt/ostN/CONFIGS/mountdata /tmp
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="102185" author="hilljjornl" created="Mon, 22 Dec 2014 17:33:26 +0000"  >&lt;p&gt;So I tried the copy-from-another-OST approach: I modified CONFIGS/mountdata, renamed CONFIGS/sithfs-OST000a to CONFIGS/sithfs-OST000b, and then made the modifications there as well. &lt;/p&gt;

&lt;p&gt;Mounting now I get errors to stdout saying bad MGS specification and dmesg shows the following:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@sith-oss3 CONFIGS&amp;#93;&lt;/span&gt;# dmesg&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333605.012087&amp;#93;&lt;/span&gt; LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=off. Opts: &lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.639176&amp;#93;&lt;/span&gt; LustreError: 11-0: MGC10.36.227.244@o2ib: Communicating with 10.36.227.244@o2ib, operation mgs_target_reg failed with -107.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.652202&amp;#93;&lt;/span&gt; LustreError: 166-1: MGC10.36.227.244@o2ib: Connection to MGS (at 10.36.227.244@o2ib) was lost; in progress operations using this service will fail&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.667622&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:1120:server_register_target()) sithfs-OST000b: error registering with the MGS: rc = -107 (not fatal)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.668041&amp;#93;&lt;/span&gt; Lustre: Evicted from MGS (at MGC10.36.227.244@o2ib_0) after server handle changed from 0xddbecfbc8c068220 to 0xddbecfbc8c068506&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.668349&amp;#93;&lt;/span&gt; Lustre: MGC10.36.227.244@o2ib: Connection restored to MGS (at 10.36.227.244@o2ib)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.706663&amp;#93;&lt;/span&gt; LustreError: 3510:0:(llog_osd.c:254:llog_osd_read_header()) sithfs-OST000b-osd: bad log sithfs-OST000b &lt;span class=&quot;error&quot;&gt;&amp;#91;0xa:0xa:0x0&amp;#93;&lt;/span&gt; header magic: 0x32303020 (expected 0x10645539)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.723413&amp;#93;&lt;/span&gt; LustreError: 3510:0:(llog_osd.c:254:llog_osd_read_header()) Skipped 1 previous similar message&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.733782&amp;#93;&lt;/span&gt; LustreError: 3510:0:(mgc_request.c:1707:mgc_llog_local_copy()) MGC10.36.227.244@o2ib: failed to copy remote log sithfs-OST000b: rc = -5&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.747895&amp;#93;&lt;/span&gt; LustreError: 13a-8: Failed to get MGS log sithfs-OST000b and no local copy.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.756602&amp;#93;&lt;/span&gt; LustreError: 15c-8: MGC10.36.227.244@o2ib: The configuration from log &apos;sithfs-OST000b&apos; failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.781476&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:1252:server_start_targets()) failed to start server sithfs-OST000b: -2&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.793410&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:1723:server_fill_super()) Unable to start targets: -2&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.803769&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:845:lustre_disconnect_lwp()) sithfs-MDT0000-lwp-OST000b: Can&apos;t end config log sithfs-client.&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.817767&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:1420:server_put_super()) sithfs-OST000b: failed to disconnect lwp. (rc=-2)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.830024&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount_server.c:1450:server_put_super()) no obd sithfs-OST000b&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.949653&amp;#93;&lt;/span&gt; Lustre: server umount sithfs-OST000b complete&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.955474&amp;#93;&lt;/span&gt; LustreError: 3510:0:(obd_mount.c:1325:lustre_fill_super()) Unable to mount  (-2)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;333707.624293&amp;#93;&lt;/span&gt; LDISKFS-fs (sdd): mounted filesystem with ordered data mode. quota=off. Opts:&lt;/p&gt;

&lt;p&gt;I either missed something or need to remove something from the local (OST) CONFIGS directory, correct? It looks like Lustre is trying to do the right thing and copy the remote log down:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;333607.733782&amp;#93;&lt;/span&gt; LustreError: 3510:0:(mgc_request.c:1707:mgc_llog_local_copy()) MGC10.36.227.244@o2ib: failed to copy remote log sithfs-OST000b: rc = -5&lt;/p&gt;

&lt;p&gt;Is it just a permission issue? (perms look the same as on the OST I copied the CONFIGS directory from).&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@sith-oss3 CONFIGS&amp;#93;&lt;/span&gt;# ls -la&lt;br/&gt;
total 104&lt;br/&gt;
drwxr-xr-x 2 root root  4096 Dec 22 12:29 .&lt;br/&gt;
drwxr-xr-x 7 root root  4096 Dec 18 15:38 ..&lt;br/&gt;
-rw-r--r-- 1 root root 12289 Dec 18 15:38 mountdata&lt;br/&gt;
-rw-r--r-- 1 root root     0 Dec 18 15:53 params&lt;br/&gt;
-rw-r--r-- 1 root root 46081 Dec 18 16:01 sithfs-OST000b&lt;br/&gt;
-rw-r--r-- 1 root root 30384 Dec 18 15:53 sithfs-client&lt;/p&gt;</comment>
                            <comment id="102188" author="hilljjornl" created="Mon, 22 Dec 2014 18:09:32 +0000"  >&lt;p&gt;I removed the CONFIGS/sithfs-OST000b file and remounted as lustre; got the &quot;correct&quot; version from the MGS. Now to see if the e2fsck completely screwed up the data or not. &lt;/p&gt;

&lt;p&gt;Thanks for your help!&lt;/p&gt;</comment>
                            <comment id="102192" author="adilger" created="Mon, 22 Dec 2014 19:11:52 +0000"  >&lt;p&gt;The CONFIGS/$fsname-OSTnnnn file is unique to each OST, so copying it from the other OST wouldn&apos;t help as you saw.  It is fetched from the MGS at mount.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx2q7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>16839</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>