<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:20:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15645] gap in recovery llog should not be a fatal error</title>
                <link>https://jira.whamcloud.com/browse/LU-15645</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A gap in the MDT recovery llog (of unknown origin) was hit during recovery.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;log_process_thread()) lfs02-MDT001e-osp-MDT0000: [0x3:0x1b70:0x4] Invalid record: index 16123 but expected 16122
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and this was later confirmed with &lt;tt&gt;llog_reader&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rec #15221 type=106a0000 len=1160 offset 17231040
rec #16097 type=106a0000 len=1160 offset 18220168
rec #16098 type=106a0000 len=1160 offset 18221328
rec #16099 type=106a0000 len=1160 offset 18222488
rec #16100 type=106a0000 len=1160 offset 18223648
Previous index is 16121, current 16123, offset 18249168
rec #18718 type=106a0000 len=1160 offset 21180888
rec #20278 type=106a0000 len=1160 offset 22943400
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
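&lt;p&gt;As an illustration only, the gap reported above can be pulled out of saved &lt;tt&gt;llog_reader&lt;/tt&gt; output with a short script. This is a minimal sketch, not part of Lustre: the names &lt;tt&gt;GAP_RE&lt;/tt&gt; and &lt;tt&gt;find_index_gaps&lt;/tt&gt; are hypothetical, and the message format is assumed to match the single line shown in the output above.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re

# Assumption: gap messages look exactly like the line in the output above.
GAP_RE = re.compile(r"Previous index is (\d+), current (\d+), offset (\d+)")


def find_index_gaps(reader_output):
    """Return (first_missing, last_missing, offset) for each reported gap."""
    gaps = []
    for line in reader_output.splitlines():
        match = GAP_RE.search(line)
        if match is None:
            # Ordinary record lines carry no gap information; skip them.
            continue
        previous, current, offset = (int(g) for g in match.groups())
        if current - previous != 1:
            # Every index strictly between previous and current is missing.
            gaps.append((previous + 1, current - 1, offset))
    return gaps


# Sample lines copied from the llog_reader output in this report.
sample = """rec #16100 type=106a0000 len=1160 offset 18223648
Previous index is 16121, current 16123, offset 18249168
rec #18718 type=106a0000 len=1160 offset 21180888"""

print(find_index_gaps(sample))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the sample above this reports a single missing index, 16122, which matches the index that &lt;tt&gt;llog_process_thread()&lt;/tt&gt; expected.&lt;/p&gt;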

&lt;p&gt;This caused the MDT recovery to fail and all of the clients were evicted from that MDT.  It isn&apos;t clear whether the global eviction is necessary, or whether this should be handled more gracefully.  Other MDTs likely have a copy of that operation for replay, and if not then it would be lost.&lt;/p&gt;

&lt;p&gt;What is more problematic is that this recovery llog error is persistent, and the same problem happens on every recovery for that MDT.  If the clients (and MDTs?) are evicted from recovery, the llog records should at a minimum be cancelled, or the llog file should be cleared.  Better yet would be to not treat this gap as a fatal error, since I don&apos;t think there is anything that can be done about it at this point.&lt;/p&gt;</description>
                <environment></environment>
        <key id="69092">LU-15645</key>
            <summary>gap in recovery llog should not be a fatal error</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Sun, 13 Mar 2022 00:04:33 +0000</created>
                <updated>Thu, 8 Dec 2022 00:02:23 +0000</updated>
                            <resolved>Thu, 5 May 2022 19:04:47 +0000</resolved>
                                    <version>Lustre 2.14.0</version>
                                    <fixVersion>Lustre 2.15.0</fixVersion>
                    <fixVersion>Lustre 2.12.10</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="329108" author="bzzz" created="Mon, 14 Mar 2022 09:18:05 +0000"  >&lt;p&gt;I think that VBR checks should ensure that there is no real gap in the transaction (otherwise recovery abort is unavoidable). So there are two major scenarios here:&lt;br/&gt;
1) There is a gap in one or a few llogs, but the corresponding transaction is duplicated in another llog. In this case skipping such a gap should be transparent.&lt;br/&gt;
2) The gap is &quot;global&quot; (i.e. the corresponding transaction is missing in all the llogs). Then we have to abort recovery and cancel all subsequent llog records so they don&apos;t cause any problem on the next mount.&lt;/p&gt;</comment>
                            <comment id="329178" author="adilger" created="Mon, 14 Mar 2022 18:13:07 +0000"  >&lt;p&gt;I was wondering about the potential sources of a gap in the recovery llog. As you wrote, if there was an actual gap in the updates applied to the MDT objects, then that &lt;em&gt;should&lt;/em&gt; be caught by VBR.&lt;/p&gt;

&lt;p&gt;I think this is a gap in the numbering of the OUT records in the llog, which seems different. That might be caused by the llog header being written non-atomically with the llog body, which I recall was a bug that was fixed by Mike a while ago. However, it isn&apos;t clear whether this gap in the llog numbering is a &quot;real&quot; problem or not. If there are clients waiting on the recovery of this transaction, wouldn&apos;t they have it pending replay in their own recovery logs also?&lt;/p&gt;

&lt;p&gt;In either case, if the clients are evicted, then definitely the recovery log needs to be cleaned up so that this gap does not cause future problems. &lt;/p&gt;</comment>
                            <comment id="329350" author="gerrit" created="Wed, 16 Mar 2022 09:13:53 +0000"  >&lt;p&gt;&quot;Alex Zhuravlev &amp;lt;bzzz@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/46837&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46837&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15645&quot; title=&quot;gap in recovery llog should not be a fatal error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15645&quot;&gt;&lt;del&gt;LU-15645&lt;/del&gt;&lt;/a&gt; obdclass: llog to handle gaps&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7fa151797bbba83d231828963fa66a288ede1de0&lt;/p&gt;</comment>
                            <comment id="329574" author="eaujames" created="Fri, 18 Mar 2022 09:52:26 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
What is the behavior when the corrupted llog block is rewritten?&lt;br/&gt;
Is there a risk that a write to the missing record overwrites an existing one?&lt;/p&gt;</comment>
                            <comment id="329622" author="adilger" created="Fri, 18 Mar 2022 16:29:02 +0000"  >&lt;p&gt;Etienne, I don&apos;t think there is anything done to &lt;b&gt;rewrite&lt;/b&gt; the llog with the gap; it is just skipped without causing the recovery to fail.&lt;/p&gt;</comment>
                            <comment id="331289" author="gerrit" created="Thu, 7 Apr 2022 08:47:25 +0000"  >&lt;p&gt;&quot;Mike Pershin &amp;lt;mpershin@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47011&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47011&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15645&quot; title=&quot;gap in recovery llog should not be a fatal error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15645&quot;&gt;&lt;del&gt;LU-15645&lt;/del&gt;&lt;/a&gt; obdclass: llog to handle gaps&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 9e8fd74dc5d3e60163884cf51ad27dc6dba7c72f&lt;/p&gt;</comment>
                            <comment id="333930" author="gerrit" created="Thu, 5 May 2022 18:44:50 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/46837/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46837/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15645&quot; title=&quot;gap in recovery llog should not be a fatal error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15645&quot;&gt;&lt;del&gt;LU-15645&lt;/del&gt;&lt;/a&gt; obdclass: llog to handle gaps&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 903f2f663956fef380b9f383e73a05b7beb0baa5&lt;/p&gt;</comment>
                            <comment id="333946" author="pjones" created="Thu, 5 May 2022 19:04:47 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                            <comment id="347146" author="gerrit" created="Tue, 20 Sep 2022 03:35:35 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/47011/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47011/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15645&quot; title=&quot;gap in recovery llog should not be a fatal error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15645&quot;&gt;&lt;del&gt;LU-15645&lt;/del&gt;&lt;/a&gt; obdclass: llog to handle gaps&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4a4e38a2769089ddf2430983c2d607683cd12986&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="69090">LU-15644</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="69804">LU-15761</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="69093">LU-15646</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="70717">LU-15938</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02krb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>