<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:43:18 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11373] CERROR/CWARN messages are not throttled</title>
                <link>https://jira.whamcloud.com/browse/LU-11373</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It appears that CDEBUG_LIMIT is not working properly, since test logs in Maloo are full of repeated error messages. This would be a nightmare for a large cluster if there are many clients and servers spewing repeated messages.&lt;/p&gt;

&lt;p&gt;For example, sanity test_60a on the MDS console:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 2786.154265] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.155427] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.156482] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.157628] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.158671] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.159789] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.160824] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.161934] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.162977] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.164074] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[ 2786.165111] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
[repeats hundreds of times]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It &lt;em&gt;might&lt;/em&gt; relate to ktime_t patches that James landed previously, but that is just speculation as I haven&apos;t investigated it yet.&lt;/p&gt;

&lt;p&gt;The sanity &lt;tt&gt;test_60b&lt;/tt&gt; should be catching the failure of &lt;tt&gt;CDEBUG_LIMIT()&lt;/tt&gt; but it is checking the logs on the client, while the test is being run on the MGS.&lt;/p&gt;</description>
                <environment></environment>
        <key id="53297">LU-11373</key>
            <summary>CERROR/CWARN messages are not throttled</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Wed, 12 Sep 2018 21:52:51 +0000</created>
                <updated>Wed, 31 Oct 2018 06:15:35 +0000</updated>
                            <resolved>Tue, 30 Oct 2018 03:54:36 +0000</resolved>
                                    <version>Lustre 2.12.0</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="233424" author="gerrit" created="Wed, 12 Sep 2018 21:58:16 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33154&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33154&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11373&quot; title=&quot;CERROR/CWARN messages are not throttled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11373&quot;&gt;&lt;del&gt;LU-11373&lt;/del&gt;&lt;/a&gt; tests: fix sanity test_60b to run on the MGS&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1c37077c2d5340a56914908e5e3f469b3d1d8394&lt;/p&gt;</comment>
                            <comment id="234167" author="gerrit" created="Mon, 1 Oct 2018 14:01:02 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33154/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33154/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11373&quot; title=&quot;CERROR/CWARN messages are not throttled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11373&quot;&gt;&lt;del&gt;LU-11373&lt;/del&gt;&lt;/a&gt; libcfs: fix wrong check in libcfs_debug_vmsg2()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4037c1462730c120af00bd0e95b08fe0669e0271&lt;/p&gt;</comment>
                            <comment id="234184" author="pjones" created="Mon, 1 Oct 2018 18:54:11 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="234922" author="bruno" created="Mon, 15 Oct 2018 14:15:10 +0000"  >&lt;p&gt;For reference, I have just hit an Oops due to a soft lockup (i.e. with /proc/sys/kernel/softlockup_panic set to 1). My crash-dump analysis shows all cores but one stuck in stop_machine()/stop_cpu() during module unload, while the last core appears to have been printing console messages for too long, right after sanity/test_60a/llog_test_10() and due to the catalog-full simulation. This suggests that the root cause could be this console message rate-limiting failure/regression!&lt;/p&gt;

&lt;p&gt;In case working rate limiting is not enough to fix this, I also wonder whether the number of LLOG record creation/write attempts could be slightly reduced in sanity/test_60a/llog_test_10(). I have also found a report of a specific module-unload/stop_machine() scenario, with a fix, at &lt;a href=&quot;https://lists.openwall.net/linux-kernel/2015/08/19/367&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://lists.openwall.net/linux-kernel/2015/08/19/367&lt;/a&gt;, which could be of interest.&lt;/p&gt;</comment>
                            <comment id="235383" author="jamesanunez" created="Tue, 23 Oct 2018 19:54:34 +0000"  >&lt;p&gt;Reopened this ticket because there have been at least two recent sanity test 60b failures since the patch for this ticket landed: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/e51fa37e-d6db-11e8-ad90-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/e51fa37e-d6db-11e8-ad90-52540065bddc&lt;/a&gt; and &lt;a href=&quot;https://testing.whamcloud.com/test_sets/45b22ad4-d6d4-11e8-ad90-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/45b22ad4-d6d4-11e8-ad90-52540065bddc&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="235448" author="adilger" created="Wed, 24 Oct 2018 22:45:59 +0000"  >&lt;p&gt;It looks like the test failed because it saw 103 debug messages vs. the expected 100. I would expect DNE to be more verbose (it looks like the other MDTs are still mounted and complaining), so bumping this limit up a bit is fine.&lt;/p&gt;</comment>
                            <comment id="235456" author="jamesanunez" created="Thu, 25 Oct 2018 00:35:32 +0000"  >&lt;p&gt;Based on the failures so far, ZFS appears to be the common factor. Across all sanity test 60b runs from the past 3 days, none of the ldiskfs test sessions, including DNE test sessions, had any failures.&lt;/p&gt;

&lt;p&gt;All failures are for ZFS test sessions. &lt;/p&gt;

&lt;p&gt;All of the failures report 102 or 103 debug lines. Although each of them counts two extra lines from test_60aa (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11566&quot; title=&quot;sanity test_60aa: llog_print_cb()) not enough space for print log records&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11566&quot;&gt;&lt;del&gt;LU-11566&lt;/del&gt;&lt;/a&gt;), I&apos;ll submit a patch to increase the check from 100 to 103.&lt;/p&gt;</comment>
                            <comment id="235457" author="gerrit" created="Thu, 25 Oct 2018 00:49:50 +0000"  >&lt;p&gt;James Nunez (jnunez@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33474&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33474&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11373&quot; title=&quot;CERROR/CWARN messages are not throttled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11373&quot;&gt;&lt;del&gt;LU-11373&lt;/del&gt;&lt;/a&gt; tests: increase debug limit sanity 60b&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8a6f5af4b66815f97e7437a1d4d98b49f88a43e2&lt;/p&gt;</comment>
                            <comment id="235858" author="gerrit" created="Tue, 30 Oct 2018 01:04:14 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33474/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33474/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11373&quot; title=&quot;CERROR/CWARN messages are not throttled&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11373&quot;&gt;&lt;del&gt;LU-11373&lt;/del&gt;&lt;/a&gt; tests: increase debug limit sanity 60b&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4afee32188d38d2969ea160ddec64cd8359e12ca&lt;/p&gt;</comment>
                            <comment id="235870" author="pjones" created="Tue, 30 Oct 2018 03:54:37 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="236018" author="adilger" created="Wed, 31 Oct 2018 06:15:35 +0000"  >&lt;p&gt;Any patches failing due to this issue need to rebase to get the fix.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="53332">LU-11384</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53329">LU-11383</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="53809">LU-11566</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i002bz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>