<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:11 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4711] Console rate limiting logic sets cdls_delay smaller than the minimum in libcfs_debug_vmsg2</title>
                <link>https://jira.whamcloud.com/browse/LU-4711</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The console message rate limiting logic which is used by CDEBUG_LIMIT, and therefore by CWARN, CERROR, CNETERR, and CEMERG, has a bug which causes some messages to be printed when they should be skipped.&lt;/p&gt;

&lt;p&gt;For example, here are three messages from the same function on one node:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-05-24T22:37:30.674942-05:00 c2-0c0s7n3 Lustre: 7197:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1369453043/real 1369453043]  req@ffff880fef070c00 x1435963081072428/t0(0) o400-&amp;gt;snx11014-OST0006-osc-ffff881039030c00@10.10.100.6@o2ib1003:28/4 lens 224/224 e 0 to 1 dl 1369453050 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2013-05-24T22:37:30.683792-05:00 c2-0c0s7n3 Lustre: 7197:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
2013-05-24T22:37:30.800884-05:00 c2-0c0s7n3 Lustre: 7197:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1369453043/real 1369453043]  req@ffff880fef070800 x1435963081072424/t0(0) o400-&amp;gt;snx11014-OST0005-osc-ffff881039030c00@10.10.100.6@o2ib1003:28/4 lens 224/224 e 0 to 1 dl 1369453050 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These messages are all from the same function and the same thread.  The first message is printed at 22:37:30.674.  Along with it, the logic prints &quot;Skipped %d previous similar messages&quot; (here, 11).  That is correct.  What is not correct is that the next ptlrpc_expire_one_request message is printed at 22:37:30.800, only about 126 ms later.  This message should have been skipped by the rate limiting logic.&lt;/p&gt;

&lt;p&gt;I have identified the problem in the code, and I have tested a fix.  The problem is with this code:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (cfs_time_after(cfs_time_current(), cdls-&amp;gt;cdls_next +
                                                       libcfs_console_max_delay
                                                       + cfs_time_seconds(10))) {
                        &lt;span class=&quot;code-comment&quot;&gt;/* last timeout was a &lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt; time ago */&lt;/span&gt;
                        cdls-&amp;gt;cdls_delay /= libcfs_console_backoff * 4;
                } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
                        cdls-&amp;gt;cdls_delay *= libcfs_console_backoff;

                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (cdls-&amp;gt;cdls_delay &amp;lt; libcfs_console_min_delay)
                                cdls-&amp;gt;cdls_delay = libcfs_console_min_delay;
                        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (cdls-&amp;gt;cdls_delay &amp;gt; libcfs_console_max_delay)
                                cdls-&amp;gt;cdls_delay = libcfs_console_max_delay;
                }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The code which clamps cdls_delay between libcfs_console_min_delay and libcfs_console_max_delay should be moved outside of the else block so that it runs after both branches.  Otherwise, the division in the if branch can set cdls_delay lower than libcfs_console_min_delay, causing the next message to be printed sooner than it should.&lt;/p&gt;
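
&lt;p&gt;As a rough illustration of that change, here is a minimal standalone sketch with the clamp applied after both branches.  The struct, the update_delay function, the long_idle flag, and the tunable values are simplified stand-ins chosen for illustration; they are assumptions, not the actual libcfs definitions or the patch itself.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/*
 * Sketch only: clamp cdls_delay after both branches.
 * The struct and the three tunables below are simplified stand-ins
 * for the libcfs definitions; their values are assumptions.
 */
struct delay_state {
        unsigned long cdls_delay;       /* current delay, in jiffies */
};

static unsigned long console_min_delay = 1250;  /* assumed minimum */
static unsigned long console_max_delay = 5000;  /* assumed maximum */
static unsigned long console_backoff   = 2;     /* assumed backoff factor */

/* long_idle is nonzero when the last timeout was a long time ago. */
static void update_delay(struct delay_state *cdls, int long_idle)
{
        if (long_idle)
                cdls->cdls_delay /= console_backoff * 4;
        else
                cdls->cdls_delay *= console_backoff;

        /* Clamp unconditionally so neither branch can leave the range. */
        if (console_min_delay > cdls->cdls_delay)
                cdls->cdls_delay = console_min_delay;
        else if (cdls->cdls_delay > console_max_delay)
                cdls->cdls_delay = console_max_delay;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With these assumed values, a delay of 2500 that takes the long-idle branch is divided down to 312 and then raised back to the minimum of 1250 instead of staying below it.&lt;/p&gt;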

&lt;p&gt;I have also noticed that this bug causes the first two messages through libcfs_debug_vmsg2 for a given cdls struct to be printed, no matter how close they are in time.&lt;/p&gt;

&lt;p&gt;I will upload my patch for this issue.&lt;/p&gt;</description>
                <environment></environment>
        <key id="23461">LU-4711</key>
            <summary>Console rate limiting logic sets cdls_delay smaller than the minimum in libcfs_debug_vmsg2</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cliffw">Cliff White</assignee>
                                    <reporter username="haasken">Ryan Haasken</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 4 Mar 2014 21:58:29 +0000</created>
                <updated>Tue, 20 May 2014 13:46:48 +0000</updated>
                            <resolved>Mon, 12 May 2014 15:56:50 +0000</resolved>
                                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.2</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="78467" author="haasken" created="Wed, 5 Mar 2014 15:36:32 +0000"  >&lt;p&gt;The patch has been uploaded: &lt;a href=&quot;http://review.whamcloud.com/#/c/9503/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9503/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="78470" author="haasken" created="Wed, 5 Mar 2014 16:00:12 +0000"  >&lt;p&gt;In order to test this patch, I added a readable /proc interface at /proc/fs/lustre/test-console-limit that just did a CERROR and then returned a string.  (I could have added this CERROR to an existing /proc interface or to some other part of the code, but I wanted it to be quick and easy to execute that code.)&lt;/p&gt;

&lt;p&gt;I also changed the module parameters libcfs_console_min/max_delay to be set in seconds and converted to jiffies.  Then I set them to 5 and 20 seconds, respectively.  For debugging purposes, I also added a printk to libcfs_debug_vmsg2 that printed the value of cdls_delay if it was less than libcfs_console_min_delay.&lt;/p&gt;
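
&lt;p&gt;For reference, the seconds-to-jiffies conversion can be sketched as below.  The helper name and the ticks_per_sec parameter are hypothetical; the value 250 is only inferred from the log output in this comment, where a 5 second minimum appears as 1250 jiffies.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/*
 * Hypothetical helper: convert a delay given in seconds to jiffies.
 * ticks_per_sec plays the role of the kernel tick rate; 250 is an
 * inference from the log (5 s shown as 1250 jiffies), not a stated fact.
 */
static unsigned long delay_secs_to_jiffies(unsigned long secs,
                                           unsigned long ticks_per_sec)
{
        return secs * ticks_per_sec;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Under that assumption, the 20 second maximum corresponds to 5000 jiffies.&lt;/p&gt;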

&lt;p&gt;I executed a test script that read the proc interface in a way that would demonstrate the bug.&lt;/p&gt;

&lt;p&gt;Here are the relevant console messages from that test script:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2014-03-04T13:25:36.828018-06:00 c0-0c0s2n2 cdls_delay=0 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:25:36.865375-06:00 c0-0c0s2n2 LustreError: 8624:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:25:36.865405-06:00 c0-0c0s2n2 LustreError: 8625:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:25:41.930256-06:00 c0-0c0s2n2 LustreError: 8635:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:25:41.930305-06:00 c0-0c0s2n2 LustreError: 8635:0:(linux-module.c:255:obd_proc_read_console_limit()) Skipped 8 previous similar messages
2014-03-04T13:26:21.963698-06:00 c0-0c0s2n2 cdls_delay=312 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:26:21.963749-06:00 c0-0c0s2n2 LustreError: 8638:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:26:56.977699-06:00 c0-0c0s2n2 cdls_delay=39 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:26:57.018967-06:00 c0-0c0s2n2 LustreError: 8641:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:27:32.035007-06:00 c0-0c0s2n2 cdls_delay=4 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:27:32.035057-06:00 c0-0c0s2n2 LustreError: 8644:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:28:07.043974-06:00 c0-0c0s2n2 cdls_delay=0 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:28:07.084192-06:00 c0-0c0s2n2 LustreError: 8646:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:28:42.100546-06:00 c0-0c0s2n2 cdls_delay=0 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:28:42.100595-06:00 c0-0c0s2n2 LustreError: 8649:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:29:17.134376-06:00 c0-0c0s2n2 cdls_delay=0 is less thanlibcfs_console_min_delay=1250
2014-03-04T13:29:17.134425-06:00 c0-0c0s2n2 LustreError: 8653:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T13:29:17.248881-06:00 c0-0c0s2n2 LustreError: 8655:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the first two messages occur almost at the same time (about 30 microseconds apart), so the second one should have been skipped.  Then we can see that if we wait a while between messages, cdls_delay keeps decreasing below libcfs_console_min_delay: 312, 39, 4, and then 0 jiffies against a minimum of 1250.  This allows the last message, at 13:29:17.248, to be printed only about 114 ms after the previous one when it should have been skipped.&lt;/p&gt;
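
&lt;p&gt;The decay seen in the log above can be reproduced with a small standalone sketch of the unclamped division.  The function names are hypothetical, and both the starting delay of 2500 jiffies and the backoff factor of 2 are assumptions chosen because they match the printed values; neither is stated in the log.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```c
/*
 * Sketch only: replay the unclamped "long time ago" branch.
 * Mirrors: cdls->cdls_delay /= libcfs_console_backoff * 4;
 */
static unsigned long decay_once(unsigned long delay, unsigned long backoff)
{
        return delay / (backoff * 4);
}

/* Store n successive delays in out[], starting from the value start. */
static void decay_sequence(unsigned long start, unsigned long backoff,
                           unsigned long *out, int n)
{
        unsigned long d = start;
        int i;

        for (i = 0; i != n; i++) {
                d = decay_once(d, backoff);
                out[i] = d;
        }
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Starting from 2500 with a backoff of 2, four steps of integer division by 8 give 312, 39, 4, and 0, which matches the cdls_delay values printed above.&lt;/p&gt;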

&lt;p&gt;I then applied the fix from 9503 and re-ran the test script (and executed one extra &quot;cat /proc/fs/lustre/test-console-limit&quot; after ~6 minutes).  Here are the relevant console messages from that run:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2014-03-04T14:18:03.556798-06:00 c0-0c0s2n2 LustreError: 9119:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:18:08.601399-06:00 c0-0c0s2n2 LustreError: 9130:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:18:08.601442-06:00 c0-0c0s2n2 LustreError: 9130:0:(linux-module.c:255:obd_proc_read_console_limit()) Skipped 9 previous similar messages
2014-03-04T14:18:48.639191-06:00 c0-0c0s2n2 LustreError: 9134:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:19:23.667557-06:00 c0-0c0s2n2 LustreError: 9138:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:19:58.690327-06:00 c0-0c0s2n2 LustreError: 9140:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:20:33.717131-06:00 c0-0c0s2n2 LustreError: 9142:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:21:08.738812-06:00 c0-0c0s2n2 LustreError: 9146:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:21:43.764265-06:00 c0-0c0s2n2 LustreError: 9148:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:27:06.534620-06:00 c0-0c0s2n2 LustreError: 9156:0:(linux-module.c:255:obd_proc_read_console_limit()) haasken is testing the console rate limiting logic.
2014-03-04T14:27:06.534666-06:00 c0-0c0s2n2 LustreError: 9156:0:(linux-module.c:255:obd_proc_read_console_limit()) Skipped 1 previous similar message
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, none of my CERROR messages were printed within libcfs_console_min_delay (1250 jiffies, set here to 5 seconds) of each other.&lt;/p&gt;

&lt;p&gt;I have attached my patch which adds the proc interface, allows you to set libcfs_console_min/max_delay in seconds, and adds the extra printk.  I have also attached my test script.&lt;/p&gt;</comment>
                            <comment id="78473" author="haasken" created="Wed, 5 Mar 2014 16:04:41 +0000"  >&lt;p&gt;This is the bash test script which reads the new /proc interface.&lt;/p&gt;</comment>
                            <comment id="78475" author="haasken" created="Wed, 5 Mar 2014 16:08:52 +0000"  >&lt;p&gt;This is the patch that adds the test-console-limit proc interface, changes the libcfs_console_min/max_delay to seconds, and adds a printk statement to libcfs_debug_vmsg2.&lt;/p&gt;</comment>
                            <comment id="82053" author="haasken" created="Mon, 21 Apr 2014 15:00:15 +0000"  >&lt;p&gt;The patch has landed, and this ticket can be resolved.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="14225" name="test-console-limit.diff" size="2957" author="haasken" created="Wed, 5 Mar 2014 16:08:52 +0000"/>
                            <attachment id="14224" name="test-limit.sh" size="1353" author="haasken" created="Wed, 5 Mar 2014 16:04:41 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwgr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12950</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>