<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:31:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3110] Disable osd declaration tracking for 2.4 release</title>
                <link>https://jira.whamcloud.com/browse/LU-3110</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Currently the OSD code is very strict about op declarations matching the actual number of operations, and crashes if they don&apos;t match (e.g. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt;). This was pretty important early on to highlight such issues, but now that we are transitioning to an actual release, we need to relax these checks.&lt;br/&gt;
Sure, we should still print a loud warning, but since the condition is normally not fatal (we vastly over-reserve transaction credits anyway), there is no point in crashing in this case on real customer systems.&lt;/p&gt;

&lt;p&gt;It seems there is a compile-time switch, OSD_TRACK_DECLARES, but I think it would be even better to make it a proc variable, disabled by default, that we can then turn on at runtime (e.g. during testing) to better highlight these issues, while actual customers don&apos;t get any crashes.&lt;/p&gt;</description>
                <environment></environment>
        <key id="18252">LU-3110</key>
            <summary>Disable osd declaration tracking for 2.4 release</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Thu, 4 Apr 2013 23:29:16 +0000</created>
                <updated>Thu, 12 Sep 2013 23:56:01 +0000</updated>
                            <resolved>Thu, 13 Jun 2013 23:35:51 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.1</fixVersion>
                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                <comments>
                            <comment id="55564" author="pjones" created="Thu, 4 Apr 2013 23:47:41 +0000"  >&lt;p&gt;Bruno&lt;/p&gt;

&lt;p&gt;Could you please take care of this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="55583" author="bfaccini" created="Fri, 5 Apr 2013 07:10:55 +0000"  >&lt;p&gt;Sure, I have already been fighting against OSD_TRACK_DECLARES as part of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt;!&lt;/p&gt;</comment>
                            <comment id="55600" author="bfaccini" created="Fri, 5 Apr 2013 13:13:43 +0000"  >&lt;p&gt;Oleg,&lt;br/&gt;
do you know of an equivalent that already exists in the code? It would help me implement the new one.&lt;br/&gt;
Thanks.&lt;/p&gt;</comment>
                            <comment id="55819" author="green" created="Tue, 9 Apr 2013 03:16:52 +0000"  >&lt;p&gt;Well, basically check any code that implements a binary switch, like the sync_journal variable.&lt;/p&gt;

&lt;p&gt;Then, in the place that asserts on an op credit mismatch, convert the assertion to a CERROR(), followed by if (crash_on_op_mismatch) LBUG();&lt;/p&gt;</comment>
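<!--
A minimal userspace sketch of the pattern suggested above, under stated assumptions: CERROR() and LBUG() are stubbed with fprintf() and abort(), crash_on_op_mismatch is the switch name used in the comment, and check_declared_ops() is a hypothetical helper, not code from the Lustre tree. The mismatch is always reported, but the crash only happens when the switch is set.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins for the Lustre kernel macros (assumption). */
#define CERROR(fmt, ...) fprintf(stderr, "LustreError: " fmt, __VA_ARGS__)
#define LBUG() abort()

/* Runtime switch, off (zero) by default; a test run would turn it on. */
static int crash_on_op_mismatch;

/*
 * Hypothetical check run when an operation is executed: executing more
 * operations than were declared is the condition that used to assert.
 * Returns 0 if the declaration covers the execution, or 1 on a mismatch
 * (after logging it); does not return on a mismatch if the switch is set.
 */
static int check_declared_ops(const char *op, int declared, int executed)
{
	assert(op != NULL && declared >= 0 && executed >= 0);

	if (executed <= declared)
		return 0;

	CERROR("%s: %d ops executed but only %d declared\n",
	       op, executed, declared);
	if (crash_on_op_mismatch)
		LBUG();
	return 1;
}
```

Setting crash_on_op_mismatch to 1 (for example from a test harness) restores the old crash-on-mismatch behaviour, while production systems only get the error message.
-->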
                            <comment id="55915" author="adilger" created="Tue, 9 Apr 2013 18:26:42 +0000"  >&lt;p&gt;There is already a check in &lt;tt&gt;lustre/osd-ldiskfs/osd_internal.h&lt;/tt&gt; to disable this code after 2.3.90:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;#&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; LUSTRE_VERSION_CODE &amp;lt; OBD_OCD_VERSION(2, 3, 90, 0)
# define OSD_TRACK_DECLARES
#endif
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It would still be nice to be able to enable this at runtime, but it also has some extra memory overhead per transaction, so we might consider moving the tracking structure into a separate allocation that is only made when tracking is enabled.&lt;/p&gt;

&lt;p&gt;Bruno, in the short term, can you verify that changing the build version to 2.3.90 in lustre/autoconf/lustre_version.ac causes this code to actually be disabled and does not cause other problems?&lt;/p&gt;
</comment>
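<!--
To see what the gate above does, here is a small sketch that evaluates the same #if at compile time and the same comparison at runtime. It assumes OBD_OCD_VERSION() packs major, minor, patch and fix into one byte each, and hardcodes LUSTRE_VERSION_CODE as a hypothetical 2.3.64 build; in a real tree the value is generated from lustre/autoconf/lustre_version.ac. Both helper functions are illustrative only.

```c
#include <assert.h>

/* Assumed packing of a version into one integer, one byte per field. */
#define OBD_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical build version; normally generated by autoconf. */
#ifndef LUSTRE_VERSION_CODE
# define LUSTRE_VERSION_CODE OBD_OCD_VERSION(2, 3, 64, 0)
#endif

/* The same compile-time gate as in osd_internal.h. */
#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(2, 3, 90, 0)
# define OSD_TRACK_DECLARES
#endif

/* Reports whether the gate above compiled tracking in. */
static int track_declares_compiled_in(void)
{
#ifdef OSD_TRACK_DECLARES
	return 1;
#else
	return 0;
#endif
}

/* Runtime form of the same comparison, for any given version. */
static int tracking_gate_open(int major, int minor, int patch, int fix)
{
	/* Each field must fit in one byte for the packing to be valid. */
	assert(minor >= 0 && minor <= 255 && patch >= 0 && patch <= 255);
	return OBD_OCD_VERSION(major, minor, patch, fix) <
	       OBD_OCD_VERSION(2, 3, 90, 0);
}
```

Bumping the build version to 2.3.90 makes the #if false, so OSD_TRACK_DECLARES is never defined and the tracking code compiles out.
-->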
                            <comment id="55983" author="bfaccini" created="Wed, 10 Apr 2013 09:50:11 +0000"  >&lt;p&gt;Andreas, during &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt; testing/debugging we already ran with OSD_TRACK_DECLARES undefined. The only problems encountered were compile-time errors, which are now fixed by the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt; changes that have already landed. No regressions were encountered.&lt;/p&gt;

&lt;p&gt;I am in the process of double-checking this, as you requested.&lt;/p&gt;

&lt;p&gt;In parallel, I am working on a patch to allow transaction op tracking to be enabled/disabled, and will try to integrate the dynamic allocation you suggested.&lt;/p&gt;

</comment>
                            <comment id="55996" author="bfaccini" created="Wed, 10 Apr 2013 13:04:37 +0000"  >&lt;p&gt;Hmm, no luck: trying to undefine OSD_TRACK_DECLARES by setting the build version to 2.3.90 makes the build fail, because it then triggers the following check in lustre/obdclass/local_storage.c:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#if LUSTRE_VERSION_CODE &amp;gt;= OBD_OCD_VERSION(2, 3, 90, 0)
#error &quot;fix this before release&quot;
#endif
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Thus, if we want to build with OSD_TRACK_DECLARES undefined, the best way seems to be to delete the following lines in lustre/osd-ldiskfs/osd_internal.h:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#if LUSTRE_VERSION_CODE &amp;lt; OBD_OCD_VERSION(2, 3, 90, 0)
# define OSD_TRACK_DECLARES
#endif
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;as we did with Minh during our debugging work for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="56012" author="bfaccini" created="Wed, 10 Apr 2013 16:08:57 +0000"  >&lt;p&gt;BTW, the patch is now under local testing; I hope to push it soon.&lt;/p&gt;</comment>
                            <comment id="56038" author="adilger" created="Wed, 10 Apr 2013 18:17:50 +0000"  >&lt;p&gt;I&apos;m going to file a separate bug for the LUSTRE_VERSION_CODE check.&lt;/p&gt;</comment>
                            <comment id="56042" author="adilger" created="Wed, 10 Apr 2013 18:32:14 +0000"  >&lt;p&gt;Filed &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3149&quot; title=&quot;LUSTRE_VERSION_CODE checks break 2.3.90&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3149&quot;&gt;&lt;del&gt;LU-3149&lt;/del&gt;&lt;/a&gt; for LUSTRE_VERSION_CODE breakage in local_storage.c.&lt;/p&gt;</comment>
                            <comment id="56102" author="bfaccini" created="Thu, 11 Apr 2013 15:58:20 +0000"  >&lt;p&gt;Patch for master pushed at &lt;a href=&quot;http://review.whamcloud.com/6032&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6032&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I tried to implement the dynamic approach you wanted, Andreas.&lt;/p&gt;

&lt;p&gt;Basically, here is what I did:&lt;/p&gt;

&lt;p&gt;     _ removed the compile-time OSD_TRACK_DECLARES mechanism.&lt;br/&gt;
     _ created a new per-device lprocfs &quot;track_declares&quot; boolean (tracking is enabled by default).&lt;br/&gt;
     _ created a new struct oti_track_declares, to be malloc&apos;ed and pointed to by a new oti_declares field in struct osd_thread_info.&lt;br/&gt;
     _ tracking now only occurs when it is enabled and oti_declares has been malloc&apos;ed, each time a transaction is created.&lt;/p&gt;

&lt;p&gt;I did intensive local testing of the patch, running racer while concurrently enabling/disabling tracking, and also verified that no memory leak was introduced.&lt;/p&gt;

&lt;p&gt;Thanks to Johann for discussing possible implementations with me and refreshing my memory on the transaction life-cycle!&lt;/p&gt;

&lt;p&gt;This ticket and patch are a direct follow-on to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2991&quot; title=&quot;(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti-&amp;gt;oti_declare_ops_rb[rb] &amp;gt; 0 ) failed: rb = 2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2991&quot;&gt;&lt;del&gt;LU-2991&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2640&quot; title=&quot;deactivate OSD_EXEC_OP() operation accounting if operation is being undone&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2640&quot;&gt;&lt;del&gt;LU-2640&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
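<!--
A rough userspace sketch of the design described above, to make the life-cycle easier to follow. struct osd_thread_info, struct oti_track_declares, oti_declares and track_declares are the names from the comment; the counter fields, OSD_OT_MAX and the osd_trans_track_*() helpers are hypothetical, and malloc()/free() stand in for the kernel allocators. The point is that the per-thread structure only carries a pointer, so a thread pays for the counters only once tracking is enabled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OSD_OT_MAX 16 /* hypothetical number of tracked op types */

/* Per-thread tracking counters, allocated only when tracking is on. */
struct oti_track_declares {
	int declared[OSD_OT_MAX];
	int executed[OSD_OT_MAX];
};

/* Reduced view of the per-thread info; only the new pointer is shown. */
struct osd_thread_info {
	struct oti_track_declares *oti_declares;
};

/* The tunable from the comment (enabled by default), as a plain global. */
static int track_declares = 1;

/*
 * Called when a transaction is created: allocate the tracking block on
 * first use and reset it, or do nothing if tracking is disabled.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int osd_trans_track_start(struct osd_thread_info *oti)
{
	if (!track_declares)
		return 0;
	if (oti->oti_declares == NULL) {
		oti->oti_declares = malloc(sizeof(*oti->oti_declares));
		if (oti->oti_declares == NULL)
			return -1;
	}
	memset(oti->oti_declares, 0, sizeof(*oti->oti_declares));
	return 0;
}

/* Record one declared op; a no-op when tracking is off or not set up. */
static void osd_trans_track_declare(struct osd_thread_info *oti, int op)
{
	assert(op >= 0 && op < OSD_OT_MAX);
	if (track_declares && oti->oti_declares != NULL)
		oti->oti_declares->declared[op]++;
}

/* Release the tracking block, e.g. when the thread info is freed. */
static void osd_trans_track_fini(struct osd_thread_info *oti)
{
	free(oti->oti_declares);
	oti->oti_declares = NULL;
}
```

In this sketch, disabling track_declares leaves an already allocated block in place until osd_trans_track_fini() runs; freeing it eagerly would be another reasonable choice.
-->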
                            <comment id="56745" author="bfaccini" created="Mon, 22 Apr 2013 22:06:52 +0000"  >&lt;p&gt;Patch-set #3 implements more strictly what you requested, and also addresses the reviewers&apos; comments:&lt;/p&gt;

&lt;p&gt;     _ again, the old compile-time OSD_TRACK_DECLARES define is fully removed.&lt;br/&gt;
     _ a global/single &quot;track_declares_assert&quot; parameter is available via lprocfs and as a module parameter,&lt;br/&gt;
     _ which selects between LBUGs/assertions and CWARNs.&lt;/p&gt;

&lt;p&gt;Again, I did intensive local testing of the patch under normal conditions. I will now try to inject some code to induce declaration-tracking overflows/malfunctions, and see what happens.&lt;/p&gt;
</comment>
                            <comment id="56924" author="bfaccini" created="Wed, 24 Apr 2013 13:32:27 +0000"  >&lt;p&gt;Since I am not proud of my main mistake in patch-set #3, I am now checking/testing the new patch-set #4 more carefully; in it I try to address all of my reviewers&apos; very constructive and helpful comments.&lt;/p&gt;</comment>
                            <comment id="57029" author="bfaccini" created="Thu, 25 Apr 2013 12:37:48 +0000"  >&lt;p&gt;Here is patch-set #4. I rewrote all the conditional statements in osd_internal.h following the reviewers&apos; advice, additionally assuming only that LASSERTs do not return (i.e. they either panic() or schedule() forever, depending on panic_on_lbug).&lt;/p&gt;

&lt;p&gt;The environment variable added in test-framework.sh/check_and_setup_lustre() may also need some filtering (MDS/OSS devices of ldiskfs type only...), but we can assume that people using it know what they are doing.&lt;/p&gt;

&lt;p&gt;I also did some intensive error-injection testing, triggering either CWARNs or LBUGs.&lt;/p&gt;</comment>
                            <comment id="57131" author="bfaccini" created="Fri, 26 Apr 2013 15:40:32 +0000"  >&lt;p&gt;I am stuck because checkpatch.pl complains with &quot;ERROR: do not initialise globals to 0 or NULL&quot; when I initialize my global ldiskfs_track_declares_assert to 0/OFF in its declaration, and initializing it later would conflict with it also being settable as a module parameter.&lt;/p&gt;

&lt;p&gt;This comes from the changes to checkpatch.pl in commit 0b554ff2/I26fc5a5c for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1347&quot; title=&quot;Lustre coding style change&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1347&quot;&gt;&lt;del&gt;LU-1347&lt;/del&gt;&lt;/a&gt;, which introduced the kernel rules.&lt;/p&gt;

&lt;p&gt;I may try to invert the meaning of my flag to something like ldiskfs_track_declares_noassert, so that I can initialize it to 1, and then change all related tests, but is this really a rule we need to follow?&lt;/p&gt;</comment>
                            <comment id="57138" author="bfaccini" created="Fri, 26 Apr 2013 16:26:06 +0000"  >&lt;p&gt;OK, I finally did it: patch-set #5 has just been submitted with an inverted flag (&amp;#91;ldiskfs_&amp;#93;track_declares_noassert) to keep checkpatch.pl and the kernel rules happy!&lt;/p&gt;

&lt;p&gt;BTW, I submitted it without running it through my own local testing, and am relying on reviewers to check the inverted logic ;-)&lt;/p&gt;</comment>
                            <comment id="57170" author="adilger" created="Fri, 26 Apr 2013 23:54:51 +0000"  >&lt;p&gt;Bruno, just for future reference, the reason that checkpatch complains about initializing the parameter to zero is that all global variables are automatically initialized to zero already.  If they are explicitly initialized to zero, they can make the binary larger (they may be stored in the initialized .data segment), while automatically zero-initialized variables go in .bss and do not need this.&lt;/p&gt;</comment>
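<!--
For reference, the guarantee being relied on: in C, an object with static storage duration and no initializer is zero-initialized, so the flag below starts out as 0 (assertions off) without an explicit initializer. The flag name comes from the discussion; the setter and getter are hypothetical stand-ins for the module parameter and /proc handlers.

```c
#include <assert.h>

/*
 * Left without an initializer, as checkpatch.pl prefers: the C standard
 * guarantees that static-storage objects like this one start out as 0.
 */
static int ldiskfs_track_declares_assert;

/* Hypothetical setter, standing in for a module parameter or /proc write. */
static void set_track_declares_assert(int val)
{
	ldiskfs_track_declares_assert = !!val;
}

/* Hypothetical getter, standing in for a /proc read. */
static int get_track_declares_assert(void)
{
	return ldiskfs_track_declares_assert;
}
```
-->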
                            <comment id="57183" author="bfaccini" created="Sat, 27 Apr 2013 07:52:19 +0000"  >&lt;p&gt;Andreas, thanks for the explanation. In fact, while trying to understand this, I found some contradictory information about it (it seems to be an old debate), and thus was unsure whether I could rely on automatic zero initialization.&lt;/p&gt;

&lt;p&gt;So what should I do: keep the current inverted logic, or go back to the original and rely on zero initialization?&lt;/p&gt;</comment>
                            <comment id="57184" author="adilger" created="Sat, 27 Apr 2013 08:38:08 +0000"  >&lt;p&gt;I found the new inverted logic more confusing to read, and hadn&apos;t read the comments here about why it was being done this way. My preference would be to go back to the original &quot;default zero&quot; logic.  It is definitely good to keep this checking enabled in the test framework, but default to off for regular users.&lt;/p&gt;

&lt;p&gt;Please also fix the /proc save/restore problem I mentioned. It would be fine to just use &quot;0&quot; and &quot;1&quot; for the output if you don&apos;t have time to add support to the write handler to accept &quot;on&quot; and &quot;off&quot; in addition to &quot;1&quot; and &quot;0&quot;.&lt;/p&gt;</comment>
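<!--
The save/restore concern is that whatever the read handler prints must be accepted by the write handler, so a script can read the value and later write it back unchanged. A userspace sketch under that assumption: the read side prints "0" or "1", and the write side accepts "0"/"1" as well as "on"/"off". Both function names are hypothetical; the real handlers go through the lprocfs interfaces, which are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical read handler: print a form the write handler accepts. */
static int track_declares_show(char *buf, size_t len, int val)
{
	return snprintf(buf, len, "%d\n", val ? 1 : 0);
}

/*
 * Hypothetical write handler: accept "0"/"1" and "off"/"on", with an
 * optional trailing newline as written by echo. Stores the value and
 * returns 0 on success, or returns -1 for unrecognized input.
 */
static int track_declares_store(const char *buf, int *val)
{
	char word[8];
	size_t n;

	assert(buf != NULL && val != NULL);
	n = strcspn(buf, "\n");
	if (n == 0 || n >= sizeof(word))
		return -1;
	memcpy(word, buf, n);
	word[n] = '\0';

	if (strcmp(word, "1") == 0 || strcmp(word, "on") == 0)
		*val = 1;
	else if (strcmp(word, "0") == 0 || strcmp(word, "off") == 0)
		*val = 0;
	else
		return -1;
	return 0;
}
```

Because the read side only ever prints "0" or "1", any string saved from it is accepted by the write side, so save/restore round-trips.
-->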
                            <comment id="57210" author="bfaccini" created="Sun, 28 Apr 2013 19:32:05 +0000"  >&lt;p&gt;Patch-set #6 has just been submitted: it goes back to the original logic (relying on zero initialization of globals), with cosmetic fixes addressing Andreas&apos; latest comments.&lt;/p&gt;</comment>
                            <comment id="57225" author="bfaccini" created="Mon, 29 Apr 2013 10:24:54 +0000"  >&lt;p&gt;Patch-set #7 has been submitted; it prints a stack trace upon warning. I did not add any rate-limiting code, since these declaration-tracking errors are assumed to be very unlikely.&lt;/p&gt;</comment>
                            <comment id="57257" author="adilger" created="Mon, 29 Apr 2013 19:42:50 +0000"  >&lt;p&gt;Patch set #7 is going to be landed, so this bug is no longer a blocker.  However, another patch should be submitted so the checking is enabled by default during testing, and only for osd-ldiskfs.&lt;/p&gt;</comment>
                            <comment id="57509" author="bfaccini" created="Thu, 2 May 2013 13:15:53 +0000"  >&lt;p&gt;A follow-on patch will implement Andreas&apos; latest comments, with the following additions:&lt;/p&gt;

&lt;p&gt;       _ set OSD_TRACK_DECLARES_LBUG by default in test-framework.sh init_test_env(), so that this is always tested.&lt;/p&gt;

&lt;p&gt;       _ in test-framework.sh/check_and_setup_lustre(), set the track_declares_assert tunable only for osd-ldiskfs, and not osd-*, since this functionality is not present in osd-zfs. This will need some filtering (MDS/OSS devices of ldiskfs type only...) to be implemented, to prevent warnings.&lt;/p&gt;
</comment>
                            <comment id="57811" author="bfaccini" created="Tue, 7 May 2013 08:52:28 +0000"  >&lt;p&gt;The follow-on patch has just been pushed at &lt;a href=&quot;http://review.whamcloud.com/6280&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6280&lt;/a&gt;.&lt;br/&gt;
Since it is my first real patch in the test scripts area, comments are welcome!&lt;/p&gt;</comment>
                            <comment id="57822" author="bfaccini" created="Tue, 7 May 2013 15:07:44 +0000"  >&lt;p&gt;Oops, sorry: I did not test the first version of the &lt;a href=&quot;http://review.whamcloud.com/6280&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6280&lt;/a&gt; change on a multi-node configuration, and thus missed the fact that the node list I passed to do_nodes had no commas.&lt;br/&gt;
The new (second) version now takes care of that.&lt;/p&gt;</comment>
                            <comment id="60607" author="adilger" created="Thu, 13 Jun 2013 23:35:51 +0000"  >&lt;p&gt;The test-framework change was landed for 2.5.0 to enable track_declares_assert.  However, this caused &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3449&quot; title=&quot;Interop failure on many testsuites: error: set_param: track_declares_assert: Found no match&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3449&quot;&gt;&lt;del&gt;LU-3449&lt;/del&gt;&lt;/a&gt; to be hit during interop testing.  Closing this bug, and the test-framework interop issue will be fixed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3449&quot; title=&quot;Interop failure on many testsuites: error: set_param: track_declares_assert: Found no match&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3449&quot;&gt;&lt;del&gt;LU-3449&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="19359">LU-3449</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvn5b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7560</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>