<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:28:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2811] LBUG: stripe_count &gt; LOV_MAX_STRIPE_COUNT</title>
                <link>https://jira.whamcloud.com/browse/LU-2811</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After running fine for about 8 weeks:&lt;/p&gt;

&lt;p&gt;Message from syslogd@pfs1n1 at Thu Feb 14 08:30:05 2013 ...&lt;br/&gt;
pfs1n1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;1452648.918541&amp;#93;&lt;/span&gt; LustreError: &lt;br/&gt;
15442:0:(mdt_lib.c:541:mdt_dump_lmm()) ASSERTION( stripe_count &amp;lt;= &lt;br/&gt;
(__s16)160 ) failed:&lt;/p&gt;

&lt;p&gt;Message from syslogd@pfs1n1 at Thu Feb 14 08:30:05 2013 ...&lt;br/&gt;
pfs1n1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;1452648.928990&amp;#93;&lt;/span&gt; LustreError: &lt;br/&gt;
15442:0:(mdt_lib.c:541:mdt_dump_lmm()) LBUG&lt;/p&gt;

&lt;p&gt;Message from syslogd@pfs1n1 at Thu Feb 14 08:30:05 2013 ...&lt;br/&gt;
pfs1n1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;1452649.070201&amp;#93;&lt;/span&gt; Kernel panic - not syncing: LBUG&lt;/p&gt;</description>
                <environment>Vanilla Kernel 2.6.32.59&lt;br/&gt;
12 OSTs&lt;br/&gt;
approx. 980 clients running a mix of 1.8.x and 2.3</environment>
        <key id="17573">LU-2811</key>
            <summary>LBUG: stripe_count &gt; LOV_MAX_STRIPE_COUNT</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="yujian">Jian Yu</assignee>
                                    <reporter username="rfehren">Roland Fehrenbacher</reporter>
                        <labels>
                    </labels>
                <created>Thu, 14 Feb 2013 06:17:22 +0000</created>
                <updated>Thu, 27 Aug 2015 14:32:28 +0000</updated>
                            <resolved>Thu, 27 Aug 2015 14:32:28 +0000</resolved>
                                    <version>Lustre 2.1.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="52360" author="bfaccini" created="Thu, 14 Feb 2013 07:38:03 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
Was there a crash-dump taken at the time of the LBUG ?&lt;/p&gt;</comment>
                            <comment id="52364" author="rfehren" created="Thu, 14 Feb 2013 08:06:40 +0000"  >&lt;p&gt;I can&apos;t tell, since even if it was, it would have been wiped by the /tmp cleaning script upon reboot &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="52366" author="bfaccini" created="Thu, 14 Feb 2013 08:12:04 +0000"  >&lt;p&gt;Roland,&lt;br/&gt;
I was not speaking about a Lustre debug log dumped into /tmp, but a full system crash-dump, meaning that Kdump or such kind of tool is installed+configured on your system. Are you aware of that ?&lt;/p&gt;
</comment>
                            <comment id="52367" author="rfehren" created="Thu, 14 Feb 2013 08:14:48 +0000"  >&lt;p&gt;Unfortunately not.&lt;/p&gt;</comment>
                            <comment id="52368" author="rfehren" created="Thu, 14 Feb 2013 08:15:57 +0000"  >&lt;p&gt;The system is part of an HA pair and got reset by the peer. So we couldn&apos;t even copy anything from the console.&lt;/p&gt;</comment>
                            <comment id="52457" author="adilger" created="Fri, 15 Feb 2013 13:26:06 +0000"  >&lt;p&gt;Without at least the stack trace, there isn&apos;t really enough information available about what triggered this problem.  Is it corruption of an on-disk LOV EA?  Was it a corrupted setstripe request from a client?  Is there some sanity checking missing in some normal code path?  Was this a problem with a 1.8 or 2.1 client?&lt;/p&gt;

&lt;p&gt;Roland, some things that could help debugging in the future:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;connecting a serial cable on each of the MDS nodes and log to a third system (or each other), perhaps in conjunction with &quot;conman&quot; which can manage ethernet-attached serial console servers&lt;/li&gt;
	&lt;li&gt;using &quot;netconsole&quot; to send console logs to a remote node via UDP&lt;/li&gt;
	&lt;li&gt;configuring &quot;kdump&quot; and/or &quot;netdump&quot; to capture crash dumps (this gives by far the most debugging information)&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="52458" author="adilger" created="Fri, 15 Feb 2013 13:27:12 +0000"  >&lt;p&gt;I&apos;m going to close this bug for now, since there isn&apos;t enough information to figure out what went wrong.  Please re-open if it happens again.&lt;/p&gt;</comment>
                            <comment id="52616" author="rfehren" created="Mon, 18 Feb 2013 08:33:06 +0000"  >&lt;p&gt;OK. I already suspected that it would be hard to find the cause with only this much info. I&apos;ll see how easy it will be to get kdump onto the cluster. Thanks for looking at this anyway.&lt;/p&gt;</comment>
                            <comment id="52617" author="pjones" created="Mon, 18 Feb 2013 08:36:15 +0000"  >&lt;p&gt;Thanks Roland. The other thing to ask is - have any changes been made to the vanilla Lustre code on either of the releases?&lt;/p&gt;</comment>
                            <comment id="52622" author="rfehren" created="Mon, 18 Feb 2013 08:54:51 +0000"  >&lt;p&gt;Not as far as I know (the SuSE clients are not managed by myself, they are running 2.6.32/1.8.x, 3.0/2.3 respectively). &lt;/p&gt;</comment>
                            <comment id="80107" author="minyard" created="Mon, 24 Mar 2014 22:58:51 +0000"  >&lt;p&gt;We have a reproducer for this error: if you run a newer client such as 2.4.2 against an older 2.1.5 Lustre filesystem, users can set the stripe count above the 2.1 maximum of 160, and this crash triggers as soon as they query or access a directory with a stripe count greater than 160.&lt;/p&gt;</comment>
                            <comment id="80191" author="pjones" created="Tue, 25 Mar 2014 04:35:30 +0000"  >&lt;p&gt;Jian Yu&lt;/p&gt;

&lt;p&gt;Could you please take care of this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="80377" author="yujian" created="Thu, 27 Mar 2014 15:20:24 +0000"  >&lt;p&gt;I&apos;ll reproduce and investigate the failure.&lt;/p&gt;</comment>
                            <comment id="80385" author="simmonsja" created="Thu, 27 Mar 2014 16:37:37 +0000"  >&lt;p&gt;Can you try patch &lt;a href=&quot;http://review.whamcloud.com/#/c/9734&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9734&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="80449" author="yujian" created="Fri, 28 Mar 2014 10:26:15 +0000"  >&lt;blockquote&gt;&lt;p&gt;Can you try patch &lt;a href=&quot;http://review.whamcloud.com/#/c/9734&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9734&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Sure, thanks!&lt;/p&gt;

&lt;p&gt;Before validating the patch, I can reproduce the failure on Lustre 2.1.6 server with Lustre 2.4.3 client according to the steps from Tommy.&lt;/p&gt;

&lt;p&gt;On Lustre 2.4.3 client node:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# mkdir /mnt/lustre/dir
# lfs getstripe -d /mnt/lustre/dir
stripe_count:   1 stripe_size:    1048576 stripe_offset:  -1
# lfs setstripe -c 200 /mnt/lustre/dir
# lfs getstripe -d /mnt/lustre/dir            &amp;lt;------------- hung here
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Console log on Lustre 2.1.6 MDS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:0
LustreError: 8510:0:(mdt_lib.c:543:mdt_dump_lmm()) ASSERTION( stripe_count &amp;lt;= (__s16)160 ) failed: 
LustreError: 8510:0:(mdt_lib.c:543:mdt_dump_lmm()) LBUG
Pid: 8510, comm: mdt_00

Call Trace:
 [&amp;lt;ffffffffa0425785&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa0425d97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa0d4fb72&amp;gt;] mdt_dump_lmm+0x272/0x280 [mdt]
 [&amp;lt;ffffffffa0d495f2&amp;gt;] mdt_getattr_internal+0x672/0xe90 [mdt]
 [&amp;lt;ffffffffa06bb6c0&amp;gt;] ? lustre_swab_mdt_body+0x0/0x150 [ptlrpc]
 [&amp;lt;ffffffffa0d4a035&amp;gt;] mdt_getattr+0x225/0x920 [mdt]
 [&amp;lt;ffffffffa0d40772&amp;gt;] mdt_handle_common+0x932/0x1750 [mdt]
 [&amp;lt;ffffffffa0d41665&amp;gt;] mdt_regular_handle+0x15/0x20 [mdt]
 [&amp;lt;ffffffffa06c8b9e&amp;gt;] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
 [&amp;lt;ffffffffa06c7f50&amp;gt;] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffffa06c7f50&amp;gt;] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [&amp;lt;ffffffffa06c7f50&amp;gt;] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
 [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ll back-port the above patch to Lustre b2_1 branch and try again.&lt;/p&gt;</comment>
                            <comment id="81045" author="yujian" created="Fri, 4 Apr 2014 15:41:18 +0000"  >&lt;p&gt;Here is the back-ported patch for Lustre b2_1 branch &lt;a href=&quot;http://review.whamcloud.com/9884&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9884&lt;/a&gt;. I&apos;ll validate it.&lt;/p&gt;</comment>
                            <comment id="81110" author="yujian" created="Mon, 7 Apr 2014 14:37:30 +0000"  >&lt;p&gt;With the above patch on Lustre b2_1 branch, the same test passed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# mkdir /mnt/lustre/dir
# lfs getstripe -d /mnt/lustre/dir
stripe_count:   1 stripe_size:    1048576 stripe_offset:  -1 
# lfs setstripe -c 200 /mnt/lustre/dir
# lfs getstripe -d /mnt/lustre/dir
stripe_count:   200 stripe_size:    1048576 stripe_offset:  -1 
# touch /mnt/lustre/dir/file
# lfs getstripe -i -c -s /mnt/lustre/dir/file
lmm_stripe_count:   160
lmm_stripe_size:    1048576
lmm_stripe_offset:  93
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="125359" author="simmonsja" created="Thu, 27 Aug 2015 14:17:46 +0000"  >&lt;p&gt;Since Lustre 2.1 is no longer supported, we should close this ticket. The solution, if someone needs it, is in this ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="26291">LU-5578</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvj4f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6812</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>