<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:34:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3524] Lustre 2.1.3: lov_io.c:212:lov_sub_get()) ASSERTION( stripe &lt; lio-&gt;lis_stripe_count ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-3524</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;At the TGCC site, which is currently running Lustre 2.1.3, the customer gets crashes from time to time with the following assertion:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 23580:0:(lov_io.c:212:lov_sub_get()) ASSERTION( stripe &amp;lt; lio-&amp;gt;lis_stripe_count ) failed:
LustreError: 23580:0:(lov_io.c:212:lov_sub_get()) LBUG
Pid: 23580, comm: IMB-IO

Call Trace:
 [&amp;lt;ffffffffa034d7f5&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa034de07&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa0917a8f&amp;gt;] lov_sub_get+0x47f/0x6f0 [lov]
 [&amp;lt;ffffffffa0913ca2&amp;gt;] lov_sublock_env_get+0xd2/0x140 [lov]
 [&amp;lt;ffffffffa0914e61&amp;gt;] lov_sublock_alloc+0xf1/0x470 [lov]
 [&amp;lt;ffffffffa09162fc&amp;gt;] lov_lock_init_raid0+0x3dc/0xe30 [lov]
 [&amp;lt;ffffffffa090eab4&amp;gt;] lov_lock_init+0x54/0xe0 [lov]
 [&amp;lt;ffffffffa049215c&amp;gt;] cl_lock_hold_mutex+0x37c/0x6b0 [obdclass]
 [&amp;lt;ffffffffa04925ee&amp;gt;] cl_lock_request+0x5e/0x1c0 [obdclass]
 [&amp;lt;ffffffffa09ee9bf&amp;gt;] cl_glimpse_lock+0x16f/0x410 [lustre]
 [&amp;lt;ffffffffa09f2f0a&amp;gt;] ccc_prep_size+0x10a/0x290 [lustre]
 [&amp;lt;ffffffffa09f8425&amp;gt;] vvp_io_read_start+0xb5/0x3e0 [lustre]
 [&amp;lt;ffffffffa04938da&amp;gt;] cl_io_start+0x6a/0x140 [obdclass]
 [&amp;lt;ffffffffa0497bbc&amp;gt;] cl_io_loop+0xcc/0x190 [obdclass]
 [&amp;lt;ffffffffa09a7f07&amp;gt;] ll_file_io_generic+0x3a7/0x560 [lustre]
 [&amp;lt;ffffffffa09a81f9&amp;gt;] ll_file_aio_read+0x139/0x2c0 [lustre]
 [&amp;lt;ffffffffa09a86b9&amp;gt;] ll_file_read+0x169/0x2a0 [lustre]
 [&amp;lt;ffffffff81163a15&amp;gt;] vfs_read+0xb5/0x1a0
 [&amp;lt;ffffffff81163b51&amp;gt;] sys_read+0x51/0x90
 [&amp;lt;ffffffff81487d7e&amp;gt;] ? do_device_not_available+0xe/0x10
 [&amp;lt;ffffffff810030f2&amp;gt;] system_call_fastpath+0x16/0x1b
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After some investigation, it seems to be &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2652&quot; title=&quot;lov_io.c:222:lov_sub_get()) ASSERTION( stripe &amp;lt; lio-&amp;gt;lis_stripe_count ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2652&quot;&gt;&lt;del&gt;LU-2652&lt;/del&gt;&lt;/a&gt;, and we tried a backport of &lt;a href=&quot;http://review.whamcloud.com/5157&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5157&lt;/a&gt;, &lt;a href=&quot;http://review.whamcloud.com/5158&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5158&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/5159&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5159&lt;/a&gt;.&lt;br/&gt;
But there have been a lot of changes in the corresponding files since Lustre 2.1 (layout lock), and 33 of the 45 chunks fail to apply.&lt;br/&gt;
Moreover, it seems that these 3 patches are to fix deadlocks introduced by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1876&quot; title=&quot;Layout Lock Server Patch Landings to Master&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1876&quot;&gt;&lt;del&gt;LU-1876&lt;/del&gt;&lt;/a&gt; (Layout Lock Server Patch Landings to Master).&lt;/p&gt;</description>
                <environment></environment>
        <key id="19601">LU-3524</key>
            <summary>Lustre 2.1.3: lov_io.c:212:lov_sub_get()) ASSERTION( stripe &lt; lio-&gt;lis_stripe_count ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="patrick.valentin">Patrick Valentin</reporter>
                        <labels>
                    </labels>
                <created>Fri, 28 Jun 2013 09:24:04 +0000</created>
                <updated>Thu, 18 Sep 2014 16:21:42 +0000</updated>
                            <resolved>Thu, 18 Sep 2014 16:21:41 +0000</resolved>
                                    <version>Lustre 2.1.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="61472" author="pjones" created="Fri, 28 Jun 2013 13:06:11 +0000"  >&lt;p&gt;Bruno is looking into this one&lt;/p&gt;</comment>
                            <comment id="61475" author="bfaccini" created="Fri, 28 Jun 2013 13:27:11 +0000"  >&lt;p&gt;Patrick,&lt;br/&gt;
Do you know if the different crashes occurred when running with the same application/workload?&lt;br/&gt;
Do we have any details on how the &quot;IMB-IO&quot; process/application works, and particularly whether it uses any specific striping?&lt;br/&gt;
Moreover, do you know if this crash can be reproduced on demand?&lt;/p&gt;
</comment>
                            <comment id="61478" author="lustre-bull" created="Fri, 28 Jun 2013 13:56:08 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;I don&apos;t have any more information about this LBUG. I am forwarding your questions to the Bull support team to get more details.&lt;/p&gt;</comment>
                            <comment id="61558" author="louveta" created="Mon, 1 Jul 2013 08:14:38 +0000"  >&lt;p&gt;I guess it is a standard IMB-IO but with a Lustre-aware MPI-IO library. I have asked the end user to provide the details and will keep you updated.&lt;/p&gt;

&lt;p&gt;Alex.&lt;/p&gt;</comment>
                            <comment id="61829" author="bfaccini" created="Thu, 4 Jul 2013 19:53:27 +0000"  >&lt;p&gt;In the meantime, on my side, I am investigating the patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2652&quot; title=&quot;lov_io.c:222:lov_sub_get()) ASSERTION( stripe &amp;lt; lio-&amp;gt;lis_stripe_count ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2652&quot;&gt;&lt;del&gt;LU-2652&lt;/del&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2766&quot; title=&quot;lov_object.c:635:lov_layout_change()) ASSERTION( atomic_read(&amp;amp;lov-&amp;gt;lo_active_ios) == 0 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2766&quot;&gt;&lt;del&gt;LU-2766&lt;/del&gt;&lt;/a&gt; to see if they are really related.&lt;/p&gt;</comment>
                            <comment id="62181" author="bfaccini" created="Fri, 12 Jul 2013 13:26:36 +0000"  >&lt;p&gt;To help me dig deeper into this issue, would it be possible to get the full stacks from the crash dump? And possibly more, such as the relevant data structures, if I ask you later?&lt;/p&gt;</comment>
                            <comment id="94377" author="sebastien.buisson" created="Thu, 18 Sep 2014 13:46:02 +0000"  >&lt;p&gt;As we are unable to provide the requested information, this ticket can be closed.&lt;/p&gt;

&lt;p&gt;Thank you,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="94414" author="pjones" created="Thu, 18 Sep 2014 16:21:42 +0000"  >&lt;p&gt;ok thanks Sebastien&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvu7j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8869</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>