<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9714] Changelog consumer test reports &apos;Local llog found corrupted&apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-9714</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;&amp;#91;admin@snx11253n002 ~&amp;#93;$ lctl get_param mdd.snx11253-MDT0000.changelog_users;&lt;br/&gt;
 mdd.snx11253-MDT0000.changelog_users=&lt;br/&gt;
 current index: 8936329&lt;br/&gt;
 ID index&lt;br/&gt;
 cl2 7049068&lt;br/&gt;
 cl31347 3348651&lt;br/&gt;
 cl31349 3348651&lt;br/&gt;
 cl31351 3348651&lt;br/&gt;
 cl33628 6330946&lt;br/&gt;
 cl33632 6335474&lt;br/&gt;
 cl2 7049068&lt;br/&gt;
 cl33962 7379382&lt;br/&gt;
 cl33963 7379382&lt;br/&gt;
 cl33964 7379382&lt;/p&gt;

&lt;p&gt;TEST PROCEDURE:&lt;br/&gt;
 For each rank:&lt;br/&gt;
 (I) CONSUME_LOGS (STARTREC=&amp;lt;arg&amp;gt; or 1)&lt;br/&gt;
 &amp;#91;1&amp;#93; Set endrec = startrec + params.chunksize&lt;br/&gt;
 &amp;#91;2&amp;#93; Read CL in parallel with N threads over the range (startrec, endrec), and extract the actual last record read (lastrec).&lt;br/&gt;
 a] If no new files are found ...&lt;br/&gt;
 i] Retry until allowed (by refresh timeout, refresh retries). &lt;br/&gt;
 Otherwise quit.&lt;br/&gt;
 b] If new files are found ...&lt;br/&gt;
 i] If clearing is enabled, clear logs up to lastrec.&lt;br/&gt;
 ii] Increment record ranges &lt;br/&gt;
 (startrec = lastrec, endrec = startrec + chunksize)&lt;br/&gt;
 d] repeat step &amp;#91;2&amp;#93;&lt;br/&gt;
TEST EXECUTION:&lt;br/&gt;
 The test was executed with the following arguments:&lt;br/&gt;
$ changelog-consumer.py -v --read-clear --clusers cl2 cl2 cl2 cl2 --mdt snx11253-MDT0000 --chunksize 9999 --read-threads 8 --update-retries 6 --update-after 30&lt;br/&gt;
Basically, this reads 9999 changelog records in 8 threads, then clears 9999 records in 1 thread (implied, not shown). These actions are duplicated across 4 separate nodes, all against the &quot;cl2&quot; user.&lt;/p&gt;
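&lt;p&gt;The procedure above can be sketched roughly as follows. This is a minimal, hypothetical model and not the code of changelog-consumer.py: consume_logs, read_range and clear_upto are names invented for illustration, the changelog is an in-memory list of record indices, the N-thread read is collapsed into a single call, and the wait between retries is omitted.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from typing import Callable, List


def consume_logs(read_range: Callable[[int, int], List[int]],
                 clear_upto: Callable[[int], None],
                 startrec: int = 1,
                 chunksize: int = 9999,
                 retries: int = 6,
                 read_clear: bool = True) -> int:
    """Consume records chunk by chunk; return the last record index read."""
    lastrec = startrec - 1
    attempts = 0
    while True:
        endrec = startrec + chunksize              # step [1]
        records = read_range(startrec, endrec)     # step [2]
        new = [r for r in records if r > lastrec]
        if not new:                                # case a]: nothing new
            attempts += 1
            if attempts > retries:
                return lastrec                     # out of retries: quit
            continue                               # a real consumer would wait here
        attempts = 0
        lastrec = max(new)                         # case b]: new records found
        if read_clear:
            clear_upto(lastrec)                    # b] i]
        startrec = lastrec                         # b] ii]


# Toy in-memory changelog holding record indices 1..25 (an assumption).
log = list(range(1, 26))


def read_range(start: int, end: int) -> List[int]:
    """Return the pending record indices inside the inclusive range."""
    return [r for r in log if r >= start and end >= r]


def clear_upto(idx: int) -> None:
    """Drop every record up to and including idx."""
    log[:] = [r for r in log if r > idx]


last = consume_logs(read_range, clear_upto, chunksize=10, retries=2)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this toy run all 25 records are read and cleared in three chunks, after which the loop exhausts its retries and returns with last equal to 25. The real test runs this loop on 4 nodes at once against the same changelog user, which is what exposes the race.&lt;/p&gt;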


&lt;p&gt;The root cause is parallel processing of an llog while it is being modified.&lt;br/&gt;
 The llog is accessed through an llog_handle, which includes these fields:&lt;/p&gt;

&lt;p&gt;lgh_cur_idx; /* used during llog_process */&lt;br/&gt;
 lgh_cur_offset; /* used during llog_process */&lt;/p&gt;

&lt;p&gt;Both are modified by llog_process_thread every time it processes a record from the llog.&lt;/p&gt;

&lt;p&gt;llog_osd_write_rec uses those fields when it modifies or rewrites an llog record. The race exists when two or more threads process the same llog and at least one of them modifies it.&lt;br/&gt;
 1) llog_process_thread lgh_cur_idx=10 lgh_cur_offset=2000,&lt;br/&gt;
 2) llog_process_thread lgh_cur_idx=5 lgh_cur_offset=1000,&lt;/p&gt;

&lt;p&gt;(1) llog_process_thread-&amp;gt;lpi_cb-&amp;gt;mdd_changelog_user_purge_cb-&amp;gt;llog_write ... llog_osd_write_rec and then&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;if (idx != loghandle-&amp;gt;lgh_cur_idx) {
        CERROR(&quot;%s: modify index mismatch %d %d\n&quot;,
               o-&amp;gt;do_lu.lo_dev-&amp;gt;ld_obd-&amp;gt;obd_name, idx,
               loghandle-&amp;gt;lgh_cur_idx);
        RETURN(-EFAULT);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From logs&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;May 4 19:55:29 snx11253n002 kernel: LustreError: 29737:0:(llog_osd.c:441:llog_osd_write_rec()) snx11253-MDT0000-osd: modify index mismatch 2 34132&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This was the first case.&lt;/p&gt;
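&lt;p&gt;The first case can be replayed in miniature. The sketch below is a toy model only, not Lustre code: LlogHandle, process_record and write_rec are hypothetical stand-ins, and the two threads are replayed sequentially in the problematic order rather than run concurrently. The last lines show that the same interleaving passes the check when each processor keeps a private cursor; that is for illustration and is not necessarily how the landed patch resolves the race.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import errno


class LlogHandle:
    """Toy stand-in for llog_handle: holds the processing cursor."""

    def __init__(self):
        self.lgh_cur_idx = 0
        self.lgh_cur_offset = 0


def process_record(handle, idx, offset):
    """Mimic llog_process_thread publishing its position on the handle."""
    handle.lgh_cur_idx = idx
    handle.lgh_cur_offset = offset


def write_rec(handle, idx):
    """Mimic the index check: return the write offset, or -EFAULT."""
    if idx != handle.lgh_cur_idx:
        return -errno.EFAULT
    return handle.lgh_cur_offset


# Shared handle: thread 2 overwrites the cursor before thread 1 writes.
shared = LlogHandle()
process_record(shared, 10, 2000)   # thread 1 reaches record 10
process_record(shared, 5, 1000)    # thread 2 reaches record 5
result = write_rec(shared, 10)     # thread 1 tries to rewrite record 10

# Private cursors (illustration only): the same order no longer interferes.
own = LlogHandle()
other = LlogHandle()
process_record(own, 10, 2000)
process_record(other, 5, 1000)
private_result = write_rec(own, 10)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the shared handle, result is -EFAULT, which corresponds to the &quot;modify index mismatch&quot; error in the log; with private cursors, private_result is the expected offset 2000.&lt;/p&gt;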

&lt;p&gt;(2) Let&apos;s imagine that a second thread modifies lgh_cur_idx/lgh_cur_offset right after this check&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;if (idx != loghandle-&amp;gt;lgh_cur_idx)&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;then&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;lgi-&amp;gt;lgi_off = loghandle-&amp;gt;lgh_cur_offset;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;lgi-&amp;gt;lgi_off becomes 1000 instead of 2000, so the modification is written at the wrong offset.&lt;/p&gt;</description>
                <environment></environment>
        <key id="46877">LU-9714</key>
            <summary>Changelog consumer test reports &apos;Local llog found corrupted&apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="aboyko">Alexander Boyko</assignee>
                                    <reporter username="aboyko">Alexander Boyko</reporter>
                        <labels>
                    </labels>
                <created>Tue, 27 Jun 2017 09:50:05 +0000</created>
                <updated>Sat, 29 Jan 2022 10:10:03 +0000</updated>
                            <resolved>Sat, 29 Jan 2022 10:10:03 +0000</resolved>
                                                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="200332" author="gerrit" created="Tue, 27 Jun 2017 09:51:24 +0000"  >&lt;p&gt;Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/27838&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27838&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9714&quot; title=&quot;Changelog consumer test reports &amp;#39;Local llog found corrupted&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9714&quot;&gt;&lt;del&gt;LU-9714&lt;/del&gt;&lt;/a&gt; llog: fix llog_process_thread race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8c8ffad1a2e01b371351bec82037958fe8b282b3&lt;/p&gt;</comment>
                            <comment id="201823" author="tappro" created="Wed, 12 Jul 2017 13:15:07 +0000"  >&lt;p&gt;What kind of modification is used in that test? You&apos;ve mentioned only &apos;clearing&apos; of the changelog; is it done with llog_cancel, or are you using llog_write() to wipe these records entirely?&lt;/p&gt;</comment>
                            <comment id="201826" author="aboyko" created="Wed, 12 Jul 2017 13:30:00 +0000"  >&lt;p&gt;user mode test, so it was&lt;/p&gt;

&lt;p&gt;lfs changelog_clear&lt;/p&gt;</comment>
                            <comment id="208614" author="gerrit" created="Mon, 18 Sep 2017 12:08:45 +0000"  >&lt;p&gt;Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/29035&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29035&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9714&quot; title=&quot;Changelog consumer test reports &amp;#39;Local llog found corrupted&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9714&quot;&gt;&lt;del&gt;LU-9714&lt;/del&gt;&lt;/a&gt; test: checking llog_process_thread race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: cd7ad26a3208ea69e393a632b3496ed77d767d52&lt;/p&gt;</comment>
                            <comment id="217038" author="gerrit" created="Fri, 22 Dec 2017 06:48:26 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/27838/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27838/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9714&quot; title=&quot;Changelog consumer test reports &amp;#39;Local llog found corrupted&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9714&quot;&gt;&lt;del&gt;LU-9714&lt;/del&gt;&lt;/a&gt; llog: fix llog_process_thread race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 52b693c588555c55dd44fe3a27a1bf8c8cccac31&lt;/p&gt;</comment>
                            <comment id="217106" author="pjones" created="Fri, 22 Dec 2017 12:37:38 +0000"  >&lt;p&gt;Alex&lt;/p&gt;

&lt;p&gt;The functional change tracked here has landed to master. Do you still intend to land the accompanying test? If so, please could you rebase it so that it can complete testing and reviews?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>patch</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzfqf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>