<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:29:26 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16720] large-scale test_3a osp_precreate_rollover_new_seq()) ASSERTION( fid_seq(fid) != fid_seq(last_fid) ) failed: fid [0x240000bd0:0x1:0x0], last_fid [0x240000bd0:0x3fff:0x0]</title>
                <link>https://jira.whamcloud.com/browse/LU-16720</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Starting on March 30, right after the landings on that day, a new assertion crash appeared in large-scale test 3a (it only gets run in full testing, I guess, so it flew under the radar).&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 676976:0:(osp_precreate.c:488:osp_precreate_rollover_new_seq()) ASSERTION( fid_seq(fid) != fid_seq(last_fid) ) failed: fid [0x240000bd0:0x1:0x0], last_fid [0x240000bd0:0x3fff:0x0]
LustreError: 676976:0:(osp_precreate.c:488:osp_precreate_rollover_new_seq()) LBUG
Pid: 676976, comm: osp-pre-0-0 4.18.0-425.10.1.el8_lustre.x86_64 #1 SMP Thu Mar 2 00:54:22 UTC 2023
Call Trace TBD:
[&amp;lt;0&amp;gt;] libcfs_call_trace+0x6f/0xa0 [libcfs]
[&amp;lt;0&amp;gt;] lbug_with_loc+0x3f/0x70 [libcfs]
[&amp;lt;0&amp;gt;] osp_precreate_thread+0x121d/0x1230 [osp]
[&amp;lt;0&amp;gt;] kthread+0x10b/0x130
[&amp;lt;0&amp;gt;] ret_from_fork+0x35/0x40 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Example crashes:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/test_sets/5173c0c5-ff80-4f5b-aec2-d6e1419cbd85&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/5173c0c5-ff80-4f5b-aec2-d6e1419cbd85&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/test_sets/68c90481-1450-4526-a659-b6d5d6b97f0a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/68c90481-1450-4526-a659-b6d5d6b97f0a&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/test_sets/20a4a76a-e1bf-4f46-985c-b8cbed94e51b&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/20a4a76a-e1bf-4f46-985c-b8cbed94e51b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I suspect this is due to the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11912&quot; title=&quot;reduce number of OST objects created per MDS Sequence&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11912&quot;&gt;&lt;del&gt;LU-11912&lt;/del&gt;&lt;/a&gt; patch landing; the timing checks out.&lt;/p&gt;</description>
                <environment></environment>
        <key id="75479">LU-16720</key>
            <summary>large-scale test_3a osp_precreate_rollover_new_seq()) ASSERTION( fid_seq(fid) != fid_seq(last_fid) ) failed: fid [0x240000bd0:0x1:0x0], last_fid [0x240000bd0:0x3fff:0x0]</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Fri, 7 Apr 2023 00:48:09 +0000</created>
                <updated>Wed, 12 Apr 2023 00:06:41 +0000</updated>
                                            <version>Lustre 2.16.0</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="368741" author="adilger" created="Fri, 7 Apr 2023 02:50:59 +0000"  >&lt;p&gt;Dongyang, this shouldn&apos;t be a case with replay_barrier, just creating a lot of files. It isn&apos;t exactly the same as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16692&quot; title=&quot;replay-single: test_70c osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16692&quot;&gt;LU-16692&lt;/a&gt;, since this is an LASSERT that the sequences are different, while that ticket is an LASSERT that they are the same.&lt;/p&gt;

&lt;p&gt;It seems like there is an off-by-one in the rollover? Also, we may need to replace these LASSERTs with error handling, since they seem too easily hit.&lt;/p&gt;</comment>
                            <comment id="368968" author="dongyang" created="Mon, 10 Apr 2023 23:50:53 +0000"  >&lt;p&gt;This is a different issue from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16692&quot; title=&quot;replay-single: test_70c osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16692&quot;&gt;LU-16692&lt;/a&gt;. It looks like the LASSERT happened in osp_precreate_rollover_new_seq().&lt;br/&gt;
During SEQ rollover we get a new SEQ id, which has to be different from the previously used SEQ saved in last_used_fid. Note that the object id in last_used_fid is 0x3fff (the reduced SEQ width), which means the SEQ is used up and due to be changed.&lt;br/&gt;
I feel like this is actually a bug exposed by changing the SEQ more frequently, maybe a race when changing the SEQ?&lt;/p&gt;</comment>
                            <comment id="369003" author="dongyang" created="Tue, 11 Apr 2023 04:56:08 +0000"  >&lt;p&gt;I think I know what&apos;s going on.&lt;br/&gt;
The suite that ran before large-scale was replay-ost-single, which does a replay_barrier on ost1, and from the logs the MDT0 osp got a new SEQ after the replay_barrier on ost1.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[ 9541.509199] Lustre: DEBUG MARKER: == replay-ost-single test 12b: write after OST failover to a missing object ========================================================== 03:08:10 (1680059290)
[ 9545.683083] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug
[ 9546.092712] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
[ 9546.795469] Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x240000401 to 0x240000bd0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The replay_barrier on ost1 drops writes, so we lost the seq range update. After that, when we progress to large-scale and need to allocate a new SEQ from ofd, we still get the old one because the seq range update was lost.&lt;br/&gt;
Forcing a new seq on all MDTs in replay-ost-single should fix this; I&apos;ve updated &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50478&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50478&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="54735">LU-11912</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03icv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>