<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:01:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6556] changelog catalog corruption if all possible records are defined</title>
                <link>https://jira.whamcloud.com/browse/LU-6556</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After our last Lustre upgrade, several Lustre file systems on the tera100 and tgcc sites have hit the same corruption of the changelog_catalog. &lt;br/&gt;
The Robinhood node panicked as in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6471&quot; title=&quot;Unexpected Lustre Client LBUG in llog_write()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6471&quot;&gt;&lt;del&gt;LU-6471&lt;/del&gt;&lt;/a&gt;, and the crash analysis shows that the changelog catalog file is corrupted: &lt;br/&gt;
the file is larger than the maximum size for this type of file, and the record that produces the panic is not in the right place.&lt;/p&gt;</description>
                <environment>Red Hat kernel 2.6.32_504.8.1.el6&lt;br/&gt;
Lustre 2.5.3.90 + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5740&quot; title=&quot;Kernel upgrade [RHEL6.6 2.6.32-504.el6]&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5740&quot;&gt;&lt;strike&gt;LU-5740&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4582&quot; title=&quot;After failing over Lustre MGS node to the secondary, client mount fails with -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4582&quot;&gt;&lt;strike&gt;LU-4582&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5678&quot; title=&quot;kernel crash due to NULL pointer dereference in kiblnd_pool_alloc_node()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5678&quot;&gt;&lt;strike&gt;LU-5678&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5393&quot; title=&quot;LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5393&quot;&gt;&lt;strike&gt;LU-5393&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3727&quot; title=&quot;LBUG (llite_nfs.c:281:ll_get_parent()) ASSERTION(body-&amp;gt;valid &amp;amp; OBD_MD_FLID) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3727&quot;&gt;&lt;strike&gt;LU-3727&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4528&quot; title=&quot;osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4528&quot;&gt;&lt;strike&gt;LU-4528&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5522&quot; title=&quot;ofd_prolong_extent_locks()) ASSERTION( lock-&amp;gt;l_flags &amp;amp; 0x0000000000000020ULL ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5522&quot;&gt;&lt;strike&gt;LU-5522&lt;/strike&gt;&lt;/a&gt; &lt;a 
href=&quot;https://jira.whamcloud.com/browse/LU-5264&quot; title=&quot;ASSERTION( info-&amp;gt;oti_r_locks == 0 ) at OST umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5264&quot;&gt;&lt;strike&gt;LU-5264&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6049&quot; title=&quot;General Protection Fault at echo_session_key_fini+0xa9&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6049&quot;&gt;&lt;strike&gt;LU-6049&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6084&quot; title=&quot;Tests are failed due to &amp;#39;recovery is aborted by hard timeout&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6084&quot;&gt;&lt;strike&gt;LU-6084&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5764&quot; title=&quot;Crash of MDS on &amp;quot;apparent buffer overflow&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5764&quot;&gt;&lt;strike&gt;LU-5764&lt;/strike&gt;&lt;/a&gt;</environment>
        <key id="29826">LU-6556</key>
            <summary>changelog catalog corruption if all possible records are defined</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="apercher">Antoine Percher</reporter>
                        <labels>
                    </labels>
                <created>Sat, 2 May 2015 15:46:04 +0000</created>
                <updated>Thu, 7 Jan 2016 05:43:11 +0000</updated>
                            <resolved>Wed, 28 Oct 2015 18:30:01 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>18</watches>
                                                                            <comments>
                            <comment id="114097" author="pjones" created="Sat, 2 May 2015 16:32:47 +0000"  >&lt;p&gt;Bruno &lt;/p&gt;

&lt;p&gt;Can you please confirm whether this does indeed meet the profile of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6471&quot; title=&quot;Unexpected Lustre Client LBUG in llog_write()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6471&quot;&gt;&lt;del&gt;LU-6471&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="114098" author="bfaccini" created="Sat, 2 May 2015 16:40:26 +0000"  >&lt;p&gt;Assigning to me since I have been working on this issue when being on-site with Antoine.&lt;/p&gt;

&lt;p&gt;In fact, after our joint analysis, it seems that the crash has occurred a few time after the upgrade+reboot of all nodes because of the 2 combined things :&lt;br/&gt;
1) a ChangeLog Catalog that has already looped-back&lt;br/&gt;
2) integrating/running with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4528&quot; title=&quot;osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4528&quot;&gt;&lt;del&gt;LU-4528&lt;/del&gt;&lt;/a&gt; patch that seems to have introduced a regression where looped-back Catalogs are not handled correctly and are only expected to grow when it is not the case.&lt;/p&gt;

&lt;p&gt;Will push a patch soon to master where the problem seems to be also present (but still undetected due to the need of a looped-back Catalog situation to trigger which should not occur so frequently ...).&lt;/p&gt;
</comment>
                            <comment id="114099" author="bfaccini" created="Sat, 2 May 2015 16:43:27 +0000"  >&lt;p&gt;Peter, no: in this particular case the LBUG, similar to the one in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6471&quot; title=&quot;Unexpected Lustre Client LBUG in llog_write()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6471&quot;&gt;&lt;del&gt;LU-6471&lt;/del&gt;&lt;/a&gt;, is legitimate, since it is triggered by the ChangeLog Catalog corruption I explained in my previous comment.&lt;/p&gt;</comment>
                            <comment id="114849" author="apercher" created="Mon, 11 May 2015 14:06:58 +0000"  >&lt;p&gt;Please find attached the customer trace analysis file (log_lu-6556_b.txt),&lt;br/&gt;
which explains the changelog catalog corruption.&lt;/p&gt;</comment>
                            <comment id="116126" author="gerrit" created="Thu, 21 May 2015 18:11:02 +0000"  >&lt;p&gt;Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14912&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14912&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6556&quot; title=&quot;changelog catalog corruption if all possible records is define &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6556&quot;&gt;&lt;del&gt;LU-6556&lt;/del&gt;&lt;/a&gt; obdclass: re-allow catalog loopback&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8112eb1a2d3dd5beed00c75d76295ae1362ab19a&lt;/p&gt;</comment>
                            <comment id="116280" author="bfaccini" created="Sat, 23 May 2015 22:03:51 +0000"  >&lt;p&gt;I have created &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6634&quot; title=&quot;(osd_handler.c:901:osd_trans_start()) ASSERTION( get_current()-&amp;gt;journal_info == ((void *)0) ) failed: when reaching Catalog full condition&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6634&quot;&gt;&lt;del&gt;LU-6634&lt;/del&gt;&lt;/a&gt; because, while testing my patch for this ticket (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6556&quot; title=&quot;changelog catalog corruption if all possible records is define &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6556&quot;&gt;&lt;del&gt;LU-6556&lt;/del&gt;&lt;/a&gt;), I wanted to check what happens (an ENOSPC return is expected!) after the Catalog has looped back and filled up, but instead I got a &quot;(osd_handler.c:901:osd_trans_start()) ASSERTION( get_current()-&amp;gt;journal_info == ((void *)0) ) failed:&quot; LBUG. I have identified the cause: in the error path of llog_cat_new_log(), llog_destroy() is called to destroy the new plain LLOG whose reference cannot be recorded in the Catalog because no slot is available. This triggers the Assertion because a transaction has already been started by llog_cat_add() when llog_destroy() tries to start its own transaction.&lt;/p&gt;</comment>
                            <comment id="127260" author="adilger" created="Mon, 14 Sep 2015 17:59:46 +0000"  >&lt;p&gt;I think the other thing that is needed here is to automatically unregister ChangeLog watcher(s) if the changelog is full or the MDS runs out of space (by default), or to block all MDS operations until the ChangeLog can be written (if a /proc tunable is set to make ChangeLog updates mandatory).  It should unregister starting with the oldest watcher, on the assumption that the oldest watcher was forgotten while newer ones are still running, and that this will release the most space.  The unregistration should cancel records up to the next watcher, or all remaining records if no other watchers are left.&lt;/p&gt;</comment>
                            <comment id="127266" author="rread" created="Mon, 14 Sep 2015 18:30:49 +0000"  >&lt;p&gt;I suggest going a step further and proactively removing stale watchers after a configurable period, or when hitting a maximum watermark, to try to avoid running out of space. Also, being unregistered is a reasonable notification to the application that it has lost its changelog feed and needs to resync.&lt;/p&gt;</comment>
                            <comment id="130718" author="gerrit" created="Sat, 17 Oct 2015 23:32:22 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14912/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14912/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6556&quot; title=&quot;changelog catalog corruption if all possible records is define &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6556&quot;&gt;&lt;del&gt;LU-6556&lt;/del&gt;&lt;/a&gt; obdclass: re-allow catalog to wrap around&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4691290f6d39bffaa3e463697fbc3ac351015e76&lt;/p&gt;</comment>
                            <comment id="131443" author="bfaccini" created="Fri, 23 Oct 2015 22:11:21 +0000"  >&lt;p&gt;Andreas, Robert,&lt;br/&gt;
I also think that your concerns are really good points for further ChangeLog-related enhancements, but that they should be addressed in a separate ticket, so that this ticket can now be closed.&lt;br/&gt;
Do you agree?&lt;/p&gt;</comment>
                            <comment id="131471" author="adilger" created="Sun, 25 Oct 2015 07:30:00 +0000"  >&lt;p&gt;Bruno, that is fine.  Please file a separate bug and copy over relevant comments before closing this one, so that they are not forgotten.&lt;/p&gt;</comment>
                            <comment id="131589" author="bfaccini" created="Mon, 26 Oct 2015 17:54:18 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7340&quot; title=&quot;ChangeLogs catalog full condition should be handled more gracefully&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7340&quot;&gt;&lt;del&gt;LU-7340&lt;/del&gt;&lt;/a&gt; has been created to address the previous ChangeLog-related concerns and more graceful handling of ENOSPC conditions.&lt;/p&gt;</comment>
                            <comment id="131890" author="jgmitter" created="Wed, 28 Oct 2015 18:30:01 +0000"  >&lt;p&gt;Landed to 2.8&lt;/p&gt;</comment>
                            <comment id="135873" author="adilger" created="Thu, 10 Dec 2015 17:40:41 +0000"  >&lt;p&gt;Backports of the &lt;a href=&quot;http://review.whamcloud.com/14912&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14912&lt;/a&gt; patch also need the &lt;a href=&quot;http://review.whamcloud.com/17052&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17052&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7329&quot; title=&quot;sanity test_60a timeouts with &#8220;* invoking oom-killer&#8221; &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7329&quot;&gt;&lt;del&gt;LU-7329&lt;/del&gt;&lt;/a&gt; obdclass: sync device to flush journal callbacks&quot; patch to avoid introducing test failures in sanity test_60a.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="22851">LU-4528</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32777">LU-7329</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="32056">LU-7138</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="30350">LU-6634</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32426">LU-7241</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="31350">LU-6954</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="17718" name="log_lu-6556_b.txt" size="21823" author="apercher" created="Mon, 11 May 2015 14:06:58 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxcdz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>