<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:31:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16981] LBUG in lod_striped_create, fewer stripes allocated than requested</title>
                <link>https://jira.whamcloud.com/browse/LU-16981</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This relates to these crashes in sanity test 270a: &lt;a href=&quot;https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=68931&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://knox.linuxhacker.ru/crashdb_ui_external.py.cgi?newid=68931&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A file is created that should have 4 stripes, but one OST is deactivated during allocation and only 3 stripes are created. This is a race condition: if the OST is deactivated at the wrong moment, the stripe count is not updated to match the stripes actually allocated, and a later code path crashes.&lt;/p&gt;


&lt;p&gt;Here are relevant lines from the debug log prior to this crash:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
00020000:00000001:4.0:1689017245.043107:0:7535:0:(lod_qos.c:2686:lod_qos_prep_create()) Process entered
...
00020000:00000001:4.0:1689017245.043110:0:7535:0:(lod_qos.c:2088:lod_get_stripe_count()) Process leaving (rc=4 : 4 : 4)
00020000:00000010:4.0:1689017245.043118:0:7535:0:(lod_qos.c:2723:lod_qos_prep_create()) kmalloced &apos;(stripe)&apos;: 32 at ffff880295611e38.
00020000:00000010:4.0:1689017245.043124:0:7535:0:(lod_qos.c:2726:lod_qos_prep_create()) kmalloced &apos;(ost_indices)&apos;: 16 at ffff8802d8267868.
00020000:00001000:4.0:1689017245.043125:0:7535:0:(lod_qos.c:2734:lod_qos_prep_create()) tgt_count 4 stripe_count 4
...
00020000:00000001:4.0:1689017245.043136:0:7535:0:(lod_qos.c:1533:lod_ost_alloc_qos()) Process entered
...
00020000:00000001:4.0:1689017245.043147:0:7535:0:(lod_qos.c:109:lod_statfs_and_check()) Process entered
00000004:00000001:4.0:1689017245.043149:0:7535:0:(osp_dev.c:795:osp_statfs()) Process entered
00000004:00001000:4.0:1689017245.043150:0:7535:0:(osp_dev.c:815:osp_statfs()) lustre-OST0000-osc-MDT0000: blocks=61184, bfree=1024, bavail=0, bsize=4096, reserved_mb_low=1, reserved_mb_high=3, files=35818, ffree=128, state=20
00000004:00000001:4.0:1689017245.043153:0:7535:0:(osp_dev.c:833:osp_statfs()) Process leaving (rc=0 : 0 : 0)
00020000:01000000:4.0:1689017245.043154:0:7535:0:(lod_qos.c:141:lod_statfs_and_check()) lustre-OST0000-osc-MDT0000: turns inactive
00020000:00000001:4.0:1689017245.043155:0:7535:0:(lod_qos.c:168:lod_statfs_and_check()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
...
00020000:00001000:4.0:1689017245.043173:0:7535:0:(lod_qos.c:1639:lod_ost_alloc_qos()) found 3 good osts
...
# there are only 3 of these lines that actually allocated stripes...
00000004:00000010:4.0:1689017245.043188:0:7535:0:(osp_dev.c:118:osp_object_alloc()) slab-alloced &apos;o&apos;: 456 at ffff8801a38115b0.
...
00020000:00000001:4.0:1689017245.043460:0:7535:0:(lod_qos.c:1771:lod_ost_alloc_qos()) Process leaving (rc=0 : 0 : 0)
00020000:00000001:4.0:1689017245.043462:0:7535:0:(lod_qos.c:2820:lod_qos_prep_create()) Process leaving (rc=0 : 0 : 0)&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In most cases, lod_ost_alloc_qos() returns -EAGAIN when it cannot allocate enough stripes, and lod_ost_alloc_rr() is then called instead.&lt;br/&gt;
lod_ost_alloc_rr() reduces lod_comp-&amp;gt;llc_stripe_count when fewer stripes are allocated than requested.&lt;br/&gt;
In this case, however, an OST is deactivated after the call to ltd_qos_is_usable() on line 1592 but before the call to lod_statfs_and_check() on line 1615. lod_ost_alloc_qos() then allocates fewer stripes than requested but still returns 0 rather than -EAGAIN, so llc_stripe_count is never reduced to match.&lt;/p&gt;
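
&lt;p&gt;The window described above can be modelled outside the kernel. The following is a minimal Python sketch, not Lustre code: Component, alloc_qos(), the deactivate_between parameter and the fix_count flag are all hypothetical names introduced for illustration, and the two passes only stand in for the roles that ltd_qos_is_usable() and lod_statfs_and_check() play. The fix_count branch is an assumption about the general shape of a fix (shrink the count to what was allocated), not the merged patch.&lt;/p&gt;

```python
EAGAIN = 11  # errno value on Linux; used here only as a symbolic return code


class Component:
    """Hypothetical stand-in for a layout component (compare lod_comp)."""

    def __init__(self, stripe_count, stripe_count_min):
        self.stripe_count = stripe_count
        self.stripe_count_min = stripe_count_min
        self.stripes = []


def alloc_qos(comp, active, deactivate_between=None, fix_count=False):
    """Simplified two-pass model of the allocation race; not Lustre code.

    active: dict mapping OST index to True/False.
    deactivate_between: OST index that goes inactive between the two passes.
    fix_count: when True, shrink stripe_count to what was allocated.
    """
    # Pass 1: count usable OSTs (the role ltd_qos_is_usable() plays).
    good = [idx for idx in sorted(active) if active[idx]]
    if comp.stripe_count_min > len(good):
        return -EAGAIN

    # The racing event: an OST turns inactive after it was counted.
    if deactivate_between is not None:
        active[deactivate_between] = False

    # Pass 2: re-check each OST before use (the role lod_statfs_and_check() plays).
    comp.stripes = [idx for idx in good if active[idx]][:comp.stripe_count]

    if comp.stripe_count_min > len(comp.stripes):
        return -EAGAIN
    if fix_count:
        # Assumed fix shape: keep the recorded count in step with reality.
        comp.stripe_count = len(comp.stripes)
    return 0


def demo(fix_count):
    """Run one allocation with OST 0 deactivated inside the race window."""
    comp = Component(stripe_count=4, stripe_count_min=1)
    active = {0: True, 1: True, 2: True, 3: True}
    rc = alloc_qos(comp, active, deactivate_between=0, fix_count=fix_count)
    return rc, comp.stripe_count, len(comp.stripes)


print(demo(fix_count=False))
print(demo(fix_count=True))
```

&lt;p&gt;Without the adjustment the model returns success with a recorded count of 4 and only 3 stripes, printing (0, 4, 3); with it, the count follows the allocation and it prints (0, 3, 3).&lt;/p&gt;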

&lt;p&gt;This can happen as long as the number of available OSTs is greater than stripe_count_min (but less than stripe_count).&lt;/p&gt;
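
&lt;p&gt;To illustrate why a stale count is fatal later, here is a small Python sketch of a consumer that trusts the recorded stripe count. It is only an analogy for the object != NULL assertion reported in lod_striped_create(); the function create_stripes() and the placeholder object names are hypothetical.&lt;/p&gt;

```python
def create_stripes(stripe_count, allocated):
    """Walk stripe_count slots the way a consumer trusting the count would.

    allocated: the objects that were really allocated, in slot order.
    Slots beyond len(allocated) are empty (None), like unset pointers.
    """
    slots = list(allocated) + [None] * (stripe_count - len(allocated))
    for i, obj in enumerate(slots[:stripe_count]):
        # Analogous in spirit to ASSERTION( object != NULL ); not Lustre code.
        assert obj is not None, f"stripe {i} has no object"
    return len(slots)


# A count that matches the allocation passes cleanly.
print(create_stripes(3, ["obj1", "obj2", "obj3"]))

# A stale count of 4 with only 3 objects trips the assertion.
try:
    create_stripes(4, ["obj1", "obj2", "obj3"])
except AssertionError as err:
    print("assertion failed:", err)
```

&lt;p&gt;The second call fails on slot 3, the one stripe that was requested but never allocated.&lt;/p&gt;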

&lt;p&gt;Here is how to reproduce the crash. There may be a more elegant way, but this works for me:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
# this is just to make one OST usage higher so that the QOS algorithm is used instead of RR
lfs setstripe -i0 -c1 /mnt/lustre/bigfile &amp;amp;&amp;amp; head --bytes=$((1024 * 100000)) /dev/zero &amp;gt; /mnt/lustre/bigfile
# on my setup (single VM, 2 MDT, 4 OST), this triggers the LBUG pretty reliably within a few hundred loops
# each set_param is backgrounded (&amp;amp;) so that it races with the setstripe on the same line
for i in {0..500}; do
        lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=0 &amp;amp; lfs setstripe -c -1 /mnt/lustre/f$i
        lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=1000 &amp;amp; lfs setstripe -c -1 /mnt/lustre/g$i
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</description>
                <environment></environment>
        <key id="77140">LU-16981</key>
            <summary>LBUG in lod_striped_create, fewer stripes allocated than requested</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bertschinger">Thomas Bertschinger</assignee>
                                    <reporter username="bertschinger">Thomas Bertschinger</reporter>
                        <labels>
                    </labels>
                <created>Tue, 25 Jul 2023 15:52:17 +0000</created>
                <updated>Sat, 23 Sep 2023 14:10:56 +0000</updated>
                            <resolved>Sat, 23 Sep 2023 14:10:56 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="380043" author="gerrit" created="Tue, 25 Jul 2023 16:30:24 +0000"  >&lt;p&gt;&quot;Thomas Bertschinger &amp;lt;bertschinger@lanl.gov&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51759&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51759&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16981&quot; title=&quot;LBUG in lod_striped_create, fewer stripes allocated than requested&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16981&quot;&gt;&lt;del&gt;LU-16981&lt;/del&gt;&lt;/a&gt; lod: update llc_stripe_count after ost inactive&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: fe98f7f0aad369c6062c302aa4ebc14185cad80c&lt;/p&gt;</comment>
                            <comment id="385598" author="sergey" created="Tue, 12 Sep 2023 12:30:32 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[17543.927775] Lustre: DEBUG MARKER: == sanity test 270a: DoM: basic functionality tests ====== 15:27:22 (1689017242)
[17545.833056] LustreError: 7535:0:(lod_object.c:6357:lod_striped_create()) ASSERTION( object != ((void *)0) ) failed: 
[17545.835206] LustreError: 7535:0:(lod_object.c:6357:lod_striped_create()) LBUG
[17545.835883] Pid: 7535, comm: mdt02_001 3.10.0-7.9-debug #2 SMP Tue Feb 1 18:17:58 EST 2022
[17545.837081] Call Trace:
[17545.837617] [&amp;lt;0&amp;gt;] libcfs_call_trace+0x90/0xf0 [libcfs]
[17545.838108] [&amp;lt;0&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
[17545.838720] [&amp;lt;0&amp;gt;] lod_striped_create+0x50b/0x5d0 [lod]
[17545.839298] [&amp;lt;0&amp;gt;] lod_layout_change+0x1ab/0x2a0 [lod]
[17545.839913] [&amp;lt;0&amp;gt;] mdd_layout_change+0x1491/0x1bc0 [mdd]
[17545.840533] [&amp;lt;0&amp;gt;] mdt_layout_change+0x2c7/0x460 [mdt]
[17545.841151] [&amp;lt;0&amp;gt;] mdt_intent_layout+0x910/0xeb0 [mdt]
[17545.843660] [&amp;lt;0&amp;gt;] mdt_intent_opc+0x1dc/0xc40 [mdt]
[17545.844270] [&amp;lt;0&amp;gt;] mdt_intent_policy+0xfa/0x460 [mdt]
[17545.844904] [&amp;lt;0&amp;gt;] ldlm_lock_enqueue+0x3e1/0xbe0 [ptlrpc]
[17545.845620] [&amp;lt;0&amp;gt;] ldlm_handle_enqueue+0x373/0x17d0 [ptlrpc]
[17545.846243] [&amp;lt;0&amp;gt;] tgt_enqueue+0x68/0x240 [ptlrpc]
[17545.846876] [&amp;lt;0&amp;gt;] tgt_request_handle+0x88e/0x19b0 [ptlrpc]
[17545.848492] [&amp;lt;0&amp;gt;] ptlrpc_server_handle_request+0x251/0xc00 [ptlrpc]
[17545.851948] [&amp;lt;0&amp;gt;] ptlrpc_main+0xc41/0x1610 [ptlrpc]
[17545.852574] [&amp;lt;0&amp;gt;] kthread+0xe4/0xf0
[17545.853129] [&amp;lt;0&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
[17545.853749] [&amp;lt;0&amp;gt;] 0xfffffffffffffffe
[17545.854332] Kernel panic - not syncing: LBUG &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adding the backtrace from the crash dump to make it easier to find.&lt;/p&gt;</comment>
                            <comment id="387017" author="gerrit" created="Sat, 23 Sep 2023 06:02:56 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51759/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51759/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16981&quot; title=&quot;LBUG in lod_striped_create, fewer stripes allocated than requested&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16981&quot;&gt;&lt;del&gt;LU-16981&lt;/del&gt;&lt;/a&gt; lod: update llc_stripe_count after ost inactive&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 78336aa166f4a7a0128a5891c747eecf26ff9565&lt;/p&gt;</comment>
                            <comment id="387036" author="pjones" created="Sat, 23 Sep 2023 14:10:56 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="74967">LU-16623</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03rdr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>