<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4481] Impossible to start changelogs after corruption</title>
                <link>https://jira.whamcloud.com/browse/LU-4481</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;On a customer cluster, changelogs refuse to start, probably because of an internal data corruption.&lt;br/&gt;
Here are the messages we can see when mounting the MDT:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1373184833 2013 Jul 7 10:13:53 bcluster111 kern err kernel LustreError: 14143:0:(llog_lvfs.c:199:llog_lvfs_read_header()) bad log header magic: 0x10670000 (expected 0x10645539)
1373184833 2013 Jul 7 10:13:53 bcluster111 kern err kernel LustreError: 14143:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x1490186b: -5
1373184833 2013 Jul 7 10:13:53 bcluster111 kern err kernel LustreError: 14133:0:(llog_obd.c:393:llog_obd_origin_setup()) llog_process() with cat_cancel_cb failed: -5
1373184833 2013 Jul 7 10:13:53 bcluster111 kern err kernel LustreError: 14133:0:(llog_obd.c:220:llog_setup_named()) obd mdd_obd-scratch3-MDT0000 ctxt 14 lop_setup=ffffffffa0501cc0 failed -5
1373184833 2013 Jul 7 10:13:53 bcluster111 kern err kernel LustreError: 14133:0:(mds_log.c:218:mds_changelog_llog_init()) changelog users llog setup failed -5
1373184835 2013 Jul 7 10:13:55 bcluster111 kern err kernel LustreError: 14133:0:(mdd_device.c:216:mdd_changelog_llog_init()) no changelog user context
1373184835 2013 Jul 7 10:13:55 bcluster111 kern err kernel LustreError: 14133:0:(mdd_device.c:254:mdd_changelog_init()) Changelog setup during init failed -22
1373184835 2013 Jul 7 10:13:55 bcluster111 kern warning kernel Lustre: scratch3-MDT0000: used disk, loading
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So the MDT is started, but without changelogs.&lt;/p&gt;

&lt;p&gt;And if we try to look at changelog_users with lctl:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lctl get_param mdd.scratch3-MDT0000.changelog_users
error: get_param: read(&apos;/proc/fs/lustre/mdd/scratch3-MDT0000/changelog_users&apos;) failed: No such device or address
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The problem is the customer needs Lustre changelogs because they are consumed by Robinhood to monitor activity on the file system.&lt;/p&gt;

&lt;p&gt;So the first thing we need is a way to restart changelogs as soon as possible. We already tried every administrative Lustre command (lfs or lctl) to clean things up, but none of them worked because the feature did not start. We did not try manually cleaning the OBJECTS files, for fear of making the situation even worse.&lt;/p&gt;

&lt;p&gt;Once changelogs have been restarted on site, we will need a fix so that changelogs can deal with corrupted data and start afresh in that case.&lt;/p&gt;

&lt;p&gt;But again, the very first thing we need is a helping hand to clean things up on site and restart changelogs ASAP.&lt;/p&gt;

&lt;p&gt;TIA,&lt;br/&gt;
Sebastien.&lt;/p&gt;</description>
                <environment></environment>
        <key id="22730">LU-4481</key>
            <summary>Impossible to start changelogs after corruption</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="sebastien.buisson">Sebastien Buisson</reporter>
                        <labels>
                    </labels>
                <created>Tue, 14 Jan 2014 08:56:14 +0000</created>
                <updated>Wed, 13 Oct 2021 03:04:39 +0000</updated>
                            <resolved>Wed, 13 Oct 2021 03:04:39 +0000</resolved>
                                    <version>Lustre 2.1.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="74915" author="bfaccini" created="Tue, 14 Jan 2014 10:34:23 +0000"  >&lt;p&gt;Hello Seb,&lt;br/&gt;
Can you run debugfs on the affected MDT and do a &quot;stat changelog_users&quot; and a &quot;stat changelog_catalog&quot;, and provide the output?&lt;br/&gt;
Then can you mount the MDT as ldiskfs and do a &quot;xxd changelog_users&quot; to provide its corrupted content?&lt;br/&gt;
BTW, the wrong value &quot;0x10670000&quot;, found instead of the expected LLOG_HDR_MAGIC, looks like CHANGELOG_USER_REC &#8230; &lt;br/&gt;
In the meantime, I am trying to set up a platform to test a &lt;span class=&quot;error&quot;&gt;&amp;#91;manual?&amp;#93;&lt;/span&gt; way for you to recover from this.&lt;/p&gt;</comment>
                            <comment id="75073" author="sebastien.buisson" created="Thu, 16 Jan 2014 09:24:23 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Please find attached two files:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;debugfs_stat: stat of the 2 files changelog_users and changelog_catalog in debugfs;&lt;/li&gt;
	&lt;li&gt;changelog_users_xxd: xxd of the file changelog_users with the MDT mounted in ldiskfs.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;HTH,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="75075" author="sebastien.buisson" created="Thu, 16 Jan 2014 10:08:06 +0000"  >&lt;p&gt;Bruno,&lt;/p&gt;

&lt;p&gt;Because the logs contain the message &quot;Cannot find handle for log 0x1490186b: -5&quot;, people on site have also run stat on this file and taken an od dump:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@bcluster111 OBJECTS] # pwd
/mnt/scratch3/mdt/0_ldiskfs/OBJECTS
[root@bcluster111 OBJECTS] # od -tx4 1490186b:dbc122a8 | more
0000000 00000028 00000001 10670000 00000000
0000020 00000001 00000000 b3ef6c29 00000000
0000040 00000028 00000001 00000002 00000058
0000060 00000000 00000005 00000001 00000000
0000100 00000000 00000000 00000000 00000000
*
0000140 00000000 00000000 00000003 00000000
0000160 00000000 00000000 00000000 00000000
*
0020000 00000000 00000000 00002000 00000001
0020020 00000001 00000000 b3ef4519 00000000
0020040 00000028 00000001
0020050

[root@bcluster111 OBJECTS] # od -tx4 149025cb:672131de | head -5
0000000 00002000 00000000 10645539 00000000
0000020 51001a9c 00000000 00000012 00000058
0000040 00000040 00000002 00000001 00000000
0000060 00000000 00000000 00000000 00000000
*

[root@bcluster111 OBJECTS] # stat 1490186b:dbc122a8
  File: `1490186b:dbc122a8&apos;
  Size: 8232 Blocks: 24 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 344987755 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-07-12 11:59:30.727996524 +0200
Modify: 2012-08-21 14:42:09.320548695 +0200
Change: 2012-08-21 14:42:09.320548695 +0200
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Maybe you could find this helpful.&lt;/p&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</comment>
                            <comment id="75077" author="bfaccini" created="Thu, 16 Jan 2014 11:09:18 +0000"  >&lt;p&gt;Yes thanks, it was the next step/need I would have requested since changelog_user is ok and pointing to OBJECTS/149025cb:672131de !!&lt;br/&gt;
And it is this last file that is corrupted, as the &quot;Cannot find handle for log 0x1490186b: -5&quot; message you pointed out indicates.&lt;/p&gt;

&lt;p&gt;Will try to get back soon with a bypass/reconstruct procedure.&lt;/p&gt;</comment>
                            <comment id="75270" author="bfaccini" created="Mon, 20 Jan 2014 14:52:35 +0000"  >&lt;p&gt;Hello Seb,&lt;br/&gt;
I did some more work/tests so that you can restart Change-Logs.&lt;br/&gt;
A &quot;normal&quot; OBJECTS file pointed to by changelog_users, with a single (cl1) id registered, looks like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;0000000 00002000 00000000 10645539 00000000
0000020 52dd1c70 00000000 00000002 00000058
0000040 00000000 00000005 00000001 00000000
0000060 00000000 00000000 00000000 00000000
*
0000120 00000000 00000000 00000003 00000000
0000140 00000000 00000000 00000000 00000000
*
0017760 00000000 00000000 00002000 00000001
0020000 00000028 00000001 10670000 00000000
0020020 00000001 00000000 00000dd7 00000000
0020040 00000028 00000001
0020050
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So yours is badly corrupted and lacks its full header record! But BTW, there is no need to try to reconstruct it: as I understand it, you restarted and used the MDT without Change-Logs enabled, so a full RobinHood scan will be required to re-populate its database from scratch.&lt;/p&gt;
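
&lt;p&gt;For reference, the header comparison behind that conclusion can be scripted. The following is only a minimal Python sketch, not Lustre code: the function names are hypothetical, the magic offset (byte 8, the third 32-bit word) is inferred from the od -tx4 dumps in this ticket, and little-endian byte order is assumed because od was run on the MDS host.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import os
import tempfile

LLOG_HDR_MAGIC = 0x10645539  # expected magic, taken from the kernel error message
MAGIC_OFFSET = 8             # assumption: third 32-bit word, as in the od -tx4 dumps


def read_llog_magic(path):
    """Return the 32-bit word at MAGIC_OFFSET, or None if the file is too short."""
    with open(path, "rb") as f:
        f.seek(MAGIC_OFFSET)
        word = f.read(4)
    if len(word) != 4:
        return None
    # Assumption: little-endian words, which is what od -tx4 prints on an x86 host.
    return int.from_bytes(word, "little")


def has_valid_llog_header(path):
    """True when the word at MAGIC_OFFSET matches LLOG_HDR_MAGIC."""
    return read_llog_magic(path) == LLOG_HDR_MAGIC


def write_words(path, words):
    """Demo helper: write 32-bit little-endian words to path."""
    with open(path, "wb") as f:
        for w in words:
            f.write(w.to_bytes(4, "little"))


# Demo on scratch files holding the first 16 bytes of the two kinds of dump
# shown in this ticket (a healthy header and the corrupted one).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good")
    bad = os.path.join(tmp, "bad")
    write_words(good, [0x00002000, 0x00000000, 0x10645539, 0x00000000])
    write_words(bad, [0x00000028, 0x00000001, 0x10670000, 0x00000000])
    print("good:", has_valid_llog_header(good))
    print("bad:", has_valid_llog_header(bad))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On a real system this would be pointed at the OBJECTS files with the MDT mounted as ldiskfs, and it only reads them.&lt;/p&gt;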

&lt;p&gt;Then, to be able to restart Change-Logs, you will need to umount/stop the MDT, mount it as LDISKFS, move/mv both CONFIGS/changelog_&lt;span class=&quot;error&quot;&gt;&amp;#91;catalog,users&amp;#93;&lt;/span&gt; to new names, re-start/mount the MDT, and re-register the change-log user/id configured in RobinHood.&lt;/p&gt;

&lt;p&gt;Moving both CONFIGS/changelog_&lt;span class=&quot;error&quot;&gt;&amp;#91;catalog,users&amp;#93;&lt;/span&gt; to new names is strongly required, since it will allow later clean-up of the related OBJECTS/* files.&lt;/p&gt;
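
&lt;p&gt;The steps above can be laid out as a dry-run plan. This is a hedged Python sketch that only prints the commands and executes nothing. The function name, the device path, the mount points and the &quot;.corrupted&quot; suffix are hypothetical placeholders, and the explicit umount of the ldiskfs mount before restarting the MDT is an assumed intermediate step.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import shlex


def changelog_reset_plan(mdt_dev, mdt_name, lustre_mnt, ldiskfs_mnt,
                         suffix=".corrupted"):
    """Return the recovery steps as a list of argv lists; nothing is executed."""
    configs = ldiskfs_mnt.rstrip("/") + "/CONFIGS"
    plan = [
        ["umount", lustre_mnt],                            # stop the MDT
        ["mount", "-t", "ldiskfs", mdt_dev, ldiskfs_mnt],  # mount it as ldiskfs
    ]
    # Rename (do not delete) both files so that the related OBJECTS files
    # can still be identified and cleaned up later.
    for name in ("changelog_catalog", "changelog_users"):
        plan.append(["mv", configs + "/" + name, configs + "/" + name + suffix])
    plan += [
        ["umount", ldiskfs_mnt],                           # assumed intermediate step
        ["mount", "-t", "lustre", mdt_dev, lustre_mnt],    # restart the MDT
        ["lctl", "--device", mdt_name, "changelog_register"],  # re-register the user
    ]
    return plan


# Hypothetical device and mount points; only the MDT name comes from this ticket.
for argv in changelog_reset_plan("/dev/mdt_device", "scratch3-MDT0000",
                                 "/mnt/mdt", "/mnt/mdt_ldiskfs"):
    print(shlex.join(argv))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Printing the plan first lets the on-site team review each command and adapt the paths before running anything by hand.&lt;/p&gt;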

&lt;p&gt;Additionally, when you say &quot;we will need a fix so that changelogs can deal with corrupted data and start afresh in that case&quot;, do you mean that the procedure I described before should be automatic during MDT mount/start ?&lt;/p&gt;</comment>
                            <comment id="75278" author="sebastien.buisson" created="Mon, 20 Jan 2014 16:22:16 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Thanks for the procedure, I have forwarded it to the onsite support team.&lt;/p&gt;

&lt;p&gt;Concerning the fix, what I meant was that the MDT should be able to cope with a corrupted OBJECTS file and start the changelog feature even in that case, for instance by ignoring the file. But after reading your last comment, it seems that this &quot;restart in degraded mode&quot; approach would lead to an implicit reset of the changelogs config (users and catalog). So this may be a little bit too strong...&lt;br/&gt;
On the other hand, I think resetting the changelogs config (via an lctl or lfs command) should be possible even if the feature is not started on the MDT. That would avoid mounting the MDT as ldiskfs and manually moving files.&lt;br/&gt;
What do you think?&lt;/p&gt;

&lt;p&gt;Thanks!&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="75856" author="sebastien.buisson" created="Wed, 29 Jan 2014 15:53:02 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Concerning the procedure, I have confirmation from the Support team that it worked fine. Changelogs are now functional on site, thanks!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13989" name="changelog_users_xxd" size="34572" author="sebastien.buisson" created="Thu, 16 Jan 2014 09:24:23 +0000"/>
                            <attachment id="13990" name="debugfs_stat" size="14710" author="sebastien.buisson" created="Thu, 16 Jan 2014 09:24:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 29 Jan 2014 08:56:14 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwcrr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12272</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 14 Jan 2014 08:56:14 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>