<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:32:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17037] Tests should run with high and sparse index numbers for OSTs and MDTs</title>
                <link>https://jira.whamcloud.com/browse/LU-17037</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;As a long-term effort to improve the overall stability of Lustre, the test suite should be evaluated and modified to support testing against OST and MDT sets whose index numbers are both high and sparse.&lt;/p&gt;

&lt;p&gt;By this I mean that we&apos;re seeing more sites choosing to deploy flash OSTs in the first 100 index slots and to place all HDD OSTs at indices &amp;gt; 100, or vice versa.&lt;/p&gt;

&lt;p&gt;This has been shown to introduce issues with some features (most recently a memory corruption issue within pool quotas: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17034&quot; title=&quot;memory corruption caused by bug in qmt_seed_glbe_all&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17034&quot;&gt;&lt;del&gt;LU-17034&lt;/del&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Right now the test suite assumes that OSTs exist at certain low index numbers, which are hard-coded into the tests.&lt;/p&gt;

&lt;p&gt;This makes the suite poorly suited to catching cases such as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17034&quot; title=&quot;memory corruption caused by bug in qmt_seed_glbe_all&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17034&quot;&gt;&lt;del&gt;LU-17034&lt;/del&gt;&lt;/a&gt;; as a result, we will need to modify the test suite to allow testing these cases for all features.&lt;/p&gt;</description>
                <environment></environment>
        <key id="77492">LU-17037</key>
            <summary>Tests should run with high and sparse index numbers for OSTs and MDTs</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="3" iconUrl="https://jira.whamcloud.com/images/icons/statuses/inprogress.png" description="This issue is being actively worked on at the moment by the assignee.">In Progress</status>
                    <statusCategory id="4" key="indeterminate" colorName="inprogress"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="yujian">Jian Yu</assignee>
                                    <reporter username="cfaber">Colin Faber</reporter>
                        <labels>
                            <label>tests</label>
                    </labels>
                <created>Thu, 17 Aug 2023 15:04:34 +0000</created>
                <updated>Thu, 26 Oct 2023 23:43:07 +0000</updated>
                                            <version>Lustre 2.15.3</version>
                                    <fixVersion>Lustre 2.17.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="382868" author="adilger" created="Thu, 17 Aug 2023 19:18:39 +0000"  >&lt;p&gt;Note that there are some test cases which are already testing sparse OST indexes, for example &lt;tt&gt;conf-sanity.sh&lt;/tt&gt; &lt;tt&gt;test_81&lt;/tt&gt;, &lt;tt&gt;test_82a&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;There is already &lt;tt&gt;test-framework.sh&lt;/tt&gt; support for non-sequential OST numbers by using &lt;tt&gt;OST_INDEX_LIST&lt;/tt&gt; to specify the index values, but it might make sense to improve this support as needed.  This is &quot;documented&quot; in &lt;tt&gt;lustre/tests/cfg/local.sh&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# OST indices can be specified as follows:
# OSTINDEX1=&quot;1&quot;
# OSTINDEX2=&quot;2&quot;
# OSTINDEX3=&quot;4&quot;
# ......
# or            
# OST_INDEX_LIST=&quot;[1,2,4-6,8]&quot;  # [n-m,l-k,...], where n &amp;lt; m and l &amp;lt; k, etc.
#               
# The default index value of an individual OST is its facet number minus 1.
# More specific ones override more general ones. See facet_index().
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
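&lt;p&gt;A minimal sketch of how the bracketed &lt;tt&gt;OST_INDEX_LIST&lt;/tt&gt; syntax documented above expands into individual index values. The function name &lt;tt&gt;expand_index_list&lt;/tt&gt; is hypothetical; this is not the &lt;tt&gt;test-framework.sh&lt;/tt&gt; parser, and it assumes well-formed input where each range is written low-high.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper (not part of test-framework.sh): expand an
# OST_INDEX_LIST value such as "[1,2,4-6,8]" into "1 2 4 5 6 8".
expand_index_list() {
    local list=$1
    local item
    local -a out=()

    list=${list#\[}             # strip the leading '['
    list=${list%\]}             # strip the trailing ']'
    for item in ${list//,/ }; do
        if [[ $item == *-* ]]; then
            # "n-m" range: let seq emit every index from n to m
            out+=($(seq "${item%-*}" "${item#*-}"))
        else
            out+=("$item")
        fi
    done
    echo "${out[@]}"
}

expand_index_list "[1,2,4-6,8]"    # prints: 1 2 4 5 6 8
```

&lt;p&gt;Using &lt;tt&gt;seq&lt;/tt&gt; for the ranges keeps the sketch short; a reversed range such as &lt;tt&gt;6-4&lt;/tt&gt; would simply expand to nothing for that item.&lt;/p&gt;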

&lt;p&gt;What needs to be done here is to fix the many, many subtests that assume &lt;tt&gt;ost1 == OST0000&lt;/tt&gt;, &lt;tt&gt;ost2 == OST0001&lt;/tt&gt;, etc. (often using the facet number minus 1 as the index, or the index number plus 1 as the facet name), and instead use helpers that map the facet name/number to the OST number in &lt;tt&gt;OST_INDEX_LIST&lt;/tt&gt; (which is mapped internally to an associative array &lt;tt&gt;$OST_INDICES&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;There are some helper functions that exist, but might need to be updated, and definitely need to be used more widely:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;tt&gt;facet_number()&lt;/tt&gt; converts a facet name like &lt;tt&gt;ostN&lt;/tt&gt; to the facet number &lt;tt&gt;N&lt;/tt&gt;, looks OK&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;facet_type()&lt;/tt&gt; converts a facet name like &lt;tt&gt;ostN&lt;/tt&gt; to the type &lt;tt&gt;OST&lt;/tt&gt;, looks OK&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;facet_svc()&lt;/tt&gt; converts a facet name like &lt;tt&gt;ostN&lt;/tt&gt; to the service name via &lt;tt&gt;${facet}_svc&lt;/tt&gt; variables, likely &lt;tt&gt;$fsname-$typeXXXX&lt;/tt&gt; (not sure)&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;facet_index()&lt;/tt&gt; converts a facet name like &lt;tt&gt;ostN&lt;/tt&gt; to the OST index number &lt;tt&gt;n&lt;/tt&gt; (whatever it is) via the &lt;tt&gt;OSTINDEXN&lt;/tt&gt; or &lt;tt&gt;OST_INDICES[N]&lt;/tt&gt; variables, though I&apos;m not sure why we have both.  This &lt;em&gt;should&lt;/em&gt; be used in &lt;b&gt;lots&lt;/b&gt; of places but is not.&lt;/li&gt;
&lt;/ul&gt;
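&lt;p&gt;The lookup order described for &lt;tt&gt;facet_index()&lt;/tt&gt; and in &lt;tt&gt;cfg/local.sh&lt;/tt&gt; (a per-facet &lt;tt&gt;OSTINDEXN&lt;/tt&gt; overrides the list, and the default is the facet number minus 1) could look roughly like this. It is a hypothetical sketch, not the real helper: the name &lt;tt&gt;sketch_facet_index&lt;/tt&gt;, the assumption that &lt;tt&gt;OST_INDICES&lt;/tt&gt; is keyed by facet number, and the example values are all assumptions.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical sketch of a facet name to OST index lookup (not the real
# facet_index() from test-framework.sh). Assumed precedence, most
# specific first:
#   1. OSTINDEXN for facet ostN
#   2. OST_INDICES[N], assumed to be keyed by facet number
#   3. default: facet number minus 1
declare -A OST_INDICES=([1]=0 [2]=10 [3]=20)   # example values only
OSTINDEX3=25                                    # example per-facet override

sketch_facet_index() {
    local facet=$1
    local num=${facet#ost}          # "ost3" becomes "3"
    local var="OSTINDEX$num"

    if [[ -n ${!var} ]]; then       # indirect expansion of OSTINDEXN
        echo "${!var}"
    elif [[ -n ${OST_INDICES[$num]} ]]; then
        echo "${OST_INDICES[$num]}"
    else
        echo $((num - 1))
    fi
}

for f in ost1 ost2 ost3 ost4; do
    echo "$f index $(sketch_facet_index $f)"
done
# prints: ost1 index 0, ost2 index 10, ost3 index 25, ost4 index 3
```

&lt;p&gt;Subtests that currently compute the index as the facet number minus 1 would call a helper like this instead, so they keep working when the indices are sparse.&lt;/p&gt;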


&lt;p&gt;It might be useful to add some more helper functions to simplify the remapping, like:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;tt&gt;facet_ost_name()&lt;/tt&gt; converts an index number like &quot;&lt;tt&gt;n&lt;/tt&gt;&quot; to the OST facet name &lt;tt&gt;ostN&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;facet_mdt_name()&lt;/tt&gt; converts an index number like &quot;&lt;tt&gt;n&lt;/tt&gt;&quot; to the MDT facet name &lt;tt&gt;mdsN&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;
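&lt;p&gt;A possible shape for the proposed &lt;tt&gt;facet_ost_name()&lt;/tt&gt;: scan facets &lt;tt&gt;ost1&lt;/tt&gt; through &lt;tt&gt;ost$OSTCOUNT&lt;/tt&gt; and return the one whose index matches. This is a hypothetical sketch; the name &lt;tt&gt;sketch_facet_ost_name&lt;/tt&gt;, the example &lt;tt&gt;OST_INDICES&lt;/tt&gt; contents, and its keying by facet number are assumptions. An MDT variant returning &lt;tt&gt;mdsN&lt;/tt&gt; would follow the same pattern.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical sketch of the proposed facet_ost_name(): map an OST index
# back to its facet name by scanning facets ost1..ost$OSTCOUNT.
# Assumes OST_INDICES is keyed by facet number, with facet number minus 1
# as the default when a facet has no entry.
OSTCOUNT=4
declare -A OST_INDICES=([1]=0 [2]=10 [3]=20 [4]=40)

sketch_facet_ost_name() {
    local index=$1
    local num
    local idx

    for num in $(seq $OSTCOUNT); do
        idx=${OST_INDICES[$num]:-$((num - 1))}
        if [[ $idx -eq $index ]]; then
            echo "ost$num"
            return 0
        fi
    done
    return 1                        # no facet has this index
}

sketch_facet_ost_name 20            # prints: ost3
```

&lt;p&gt;A linear scan keeps the sketch simple; the non-zero return status lets callers detect an index that is not configured.&lt;/p&gt;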


&lt;p&gt;There is currently no support for non-contiguous MDT index numbers in &lt;tt&gt;test-framework.sh&lt;/tt&gt;, and I don&apos;t think this has been tested anywhere.  Until we get MDT pools, there may not be much motivation to configure discontiguous MDT index numbers, but I&apos;m sure it will happen somewhere eventually.  However, I don&apos;t think implementing support for testing this and fixing the many resulting bugs is a priority compared to fixing discontiguous OST support.&lt;/p&gt;</comment>
                            <comment id="382869" author="adilger" created="Thu, 17 Aug 2023 19:27:34 +0000"  >&lt;p&gt;Because of the many subtests that need to be fixed, it would make sense to split the patches by test script (or into multiple patches for large scripts like &lt;tt&gt;sanity.sh&lt;/tt&gt;) so that they can land independently, unless there are only a few changes in a single file.&lt;/p&gt;

&lt;p&gt;It might be possible to find which subtests have obvious problems by running with &quot;&lt;tt&gt;env=OST_INDEX_LIST=[0,10,20,40,55,60,80]&lt;/tt&gt;&quot; (for &lt;tt&gt;OSTCOUNT=7&lt;/tt&gt;) or similar in autotest (or just setting &lt;tt&gt;OST_INDEX_LIST&lt;/tt&gt; in your local test environment), and running through the test scripts multiple times to fix failures as they are hit.  A huge number of test failures would probably be hit if there is no &lt;tt&gt;OST0000&lt;/tt&gt;, so that case should probably be tested last, after the other subtests are fixed.&lt;/p&gt;</comment>
                            <comment id="384031" author="yujian" created="Tue, 29 Aug 2023 08:12:04 +0000"  >&lt;p&gt;In conf-sanity test_82a(), the random sparse indices for OSTs are generated as follows:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        # Format OSTs with random sparse indices.
        local i
        local index
        local ost_indices
        local LOV_V1_INSANE_STRIPE_COUNT=65532
        &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; i in $(seq $OSTCOUNT); &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;
                index=$(((RANDOM * 2) % LOV_V1_INSANE_STRIPE_COUNT))
                ost_indices+=&lt;span class=&quot;code-quote&quot;&gt;&quot; $index&quot;&lt;/span&gt;
        done
        ost_indices=$(comma_list $ost_indices)

        stack_trap &lt;span class=&quot;code-quote&quot;&gt;&quot;restore_ostindex&quot;&lt;/span&gt; EXIT
        echo -e &lt;span class=&quot;code-quote&quot;&gt;&quot;\nFormat $OSTCOUNT OSTs with sparse indices $ost_indices&quot;&lt;/span&gt;
        OST_INDEX_LIST=[$ost_indices] formatall
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
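&lt;p&gt;Because the loop above draws each index independently, two OSTs can occasionally receive the same value (for example, &lt;tt&gt;RANDOM&lt;/tt&gt; values 0 and 32766 both map to index 0). Below is a sketch that rejects duplicates, assuming bash 4 or later for associative arrays. It joins the list with &lt;tt&gt;tr&lt;/tt&gt; instead of calling &lt;tt&gt;comma_list&lt;/tt&gt; only so that it runs standalone, and the default &lt;tt&gt;OSTCOUNT=4&lt;/tt&gt; is just for illustration.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch (not from conf-sanity.sh): same index generation as the loop
# above, but every OST is guaranteed a distinct index.
OSTCOUNT=${OSTCOUNT:-4}             # illustration default only
LOV_V1_INSANE_STRIPE_COUNT=65532

declare -A seen=()                  # indices already handed out
ost_indices=""
while (( ${#seen[@]} != OSTCOUNT )); do
    index=$(((RANDOM * 2) % LOV_V1_INSANE_STRIPE_COUNT))
    if [[ -z ${seen[$index]} ]]; then
        seen[$index]=1
        ost_indices+=" $index"
    fi
done

# Join with commas (stand-in for comma_list so the sketch is standalone)
ost_indices=$(echo $ost_indices | tr ' ' ',')
echo "OST_INDEX_LIST=[$ost_indices]"
```

&lt;p&gt;Retrying on a collision rather than failing keeps the generated configuration valid without changing the distribution of the indices that are accepted.&lt;/p&gt;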
&lt;p&gt;As a quick experiment, I used the same approach in &lt;tt&gt;cfg/local.sh&lt;/tt&gt; to set &lt;tt&gt;OST_INDEX_LIST&lt;/tt&gt; with random sparse indices, and then ran runtests. It passed:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/b4443be0-b163-4ec9-9271-f36464ac41c4&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/b4443be0-b163-4ec9-9271-f36464ac41c4&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID        95248        4340       82252   6% /mnt/lustre[MDT:0]
lustre-OST090c_UUID       142216        7288      120928   6% /mnt/lustre[OST:2316]
lustre-OST4b24_UUID       142216        9088      119128   8% /mnt/lustre[OST:19236]
lustre-OST9234_UUID       142216        9416      118800   8% /mnt/lustre[OST:37428]
lustre-OST986c_UUID       142216       14568      113648  12% /mnt/lustre[OST:39020]

filesystem_summary:       568864       40360      472504   8% /mnt/lustre
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
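&lt;p&gt;The target names in this output encode the index as four hex digits: &lt;tt&gt;OST090c&lt;/tt&gt; is index 0x090c = 2316 and &lt;tt&gt;OST4b24&lt;/tt&gt; is 19236. A small sketch of the conversion in both directions; the function names are hypothetical and are not &lt;tt&gt;test-framework.sh&lt;/tt&gt; helpers.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: convert between a target name like "lustre-OST090c" and its
# decimal index, matching the lfs df output above (OST090c is OST:2316).
# Function names are hypothetical, not test-framework.sh helpers.
ost_name_from_index() {
    printf "%s-OST%04x\n" "$1" "$2"     # args: fsname, decimal index
}

ost_index_from_name() {
    local hex=${1##*-OST}               # "lustre-OST090c" becomes "090c"
    echo $((16#$hex))                   # base-16 arithmetic expansion
}

ost_name_from_index lustre 2316         # prints: lustre-OST090c
ost_index_from_name lustre-OST4b24      # prints: 19236
```

&lt;p&gt;Tests that match on names like &lt;tt&gt;OST0001&lt;/tt&gt; would need to build the name from the real index this way rather than from the facet number.&lt;/p&gt;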
&lt;p&gt;I&apos;m going to push a fortestonly patch to run the full test group in autotest with the above change to see which subtests fail.&lt;/p&gt;</comment>
                            <comment id="384085" author="gerrit" created="Tue, 29 Aug 2023 15:59:45 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/52158&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/52158&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17037&quot; title=&quot;Tests should run with high and sparse index numbers for OSTs and MDTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17037&quot;&gt;LU-17037&lt;/a&gt; tests: full group testing with sparse OST indexes&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a5a1a2346e87afe5914b8ab098583ae8920e4598&lt;/p&gt;</comment>
                            <comment id="384176" author="yujian" created="Wed, 30 Aug 2023 04:55:32 +0000"  >&lt;p&gt;I&apos;m vetting the test results in &lt;a href=&quot;https://review.whamcloud.com/52158&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/52158&lt;/a&gt; on master branch.&lt;br/&gt;
With sparse OST indexes &quot;OST_INDEX_LIST=[0,10,20,40,55,60,80]&quot; (for OSTCOUNT=7) and &quot;ENABLE_QUOTA=yes&quot; specified, at least performance-sanity test 2 and sanity-benchmark test dbench crashed on the master branch. I just updated &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17034&quot; title=&quot;memory corruption caused by bug in qmt_seed_glbe_all&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17034&quot;&gt;&lt;del&gt;LU-17034&lt;/del&gt;&lt;/a&gt; with the detailed LBUG info.&lt;/p&gt;</comment>
                            <comment id="384182" author="adilger" created="Wed, 30 Aug 2023 07:32:44 +0000"  >&lt;p&gt;I think sanity-quota and ost-pools are also of particular interest, since the combination of pools, quota, and sparse OST indexes was the source of the problem.&lt;/p&gt;</comment>
                            <comment id="384243" author="yujian" created="Wed, 30 Aug 2023 16:02:00 +0000"  >&lt;p&gt;Here are the full-dne-part-{1,2,3} test results with &quot;OST_INDEX_LIST=[0,10,20,40,55,60,80]&quot; and &quot;ENABLE_QUOTA=yes&quot;:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sessions/551b2e1f-2493-411f-9cbd-bb28dd0b1607&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/551b2e1f-2493-411f-9cbd-bb28dd0b1607&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sessions/f7c3d574-e349-42ee-99f3-35c761205148&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/f7c3d574-e349-42ee-99f3-35c761205148&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sessions/a1fffb9a-56b2-439a-ab48-20f9c5fcd5eb&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/a1fffb9a-56b2-439a-ab48-20f9c5fcd5eb&lt;/a&gt;&lt;br/&gt;
sanity-quota hit the LBUG in test 0. ost-pools did not crash, and it passed with 38 of its 56 subtests.&lt;/p&gt;</comment>
                            <comment id="384244" author="yujian" created="Wed, 30 Aug 2023 16:08:51 +0000"  >&lt;p&gt;I just removed the &quot;ENABLE_QUOTA=yes&quot; test parameter and triggered the full group testing again so that the LBUG does not block the other test suites. After &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17034&quot; title=&quot;memory corruption caused by bug in qmt_seed_glbe_all&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17034&quot;&gt;&lt;del&gt;LU-17034&lt;/del&gt;&lt;/a&gt; is fixed, I&apos;ll add the parameter back and test again.&lt;br/&gt;
Now I&apos;m looking into the non-LBUG failures and trying to fix them.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="77477">LU-17034</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="73349">LU-16331</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03t87:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>