<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:19:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1792] conf-sanity.sh test_53a/test_53b take too long to run</title>
                <link>https://jira.whamcloud.com/browse/LU-1792</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Looking at recent review test runs to see where the time is being spent, I see that conf-sanity.sh test_53a and test_53b are sometimes taking far too long to run - over 800s and 1000s respectively, sometimes twice that.  They should be able to complete in a few seconds, but the remount in the middle of the test seems to take the longest time.  Given that these tests run in our test environment about 20x per day, this could be wasting 10h or more of testing time each day.&lt;/p&gt;

&lt;p&gt;Some investigation needs to be done to see why these tests are taking so long to run.  Is it that mount and/or unmount is very slow?  If so, why?  Simply skipping these tests for SLOW=no is not a valid solution, since slow mounting/unmounting affects all of our users and wastes even more time for every test that is run, but it is less visible when done at the start of a test run instead of in the middle.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/c14c1bec-ef9e-11e1-bdf7-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/c14c1bec-ef9e-11e1-bdf7-52540035b04c&lt;/a&gt; and &lt;a href=&quot;https://maloo.whamcloud.com/sub_tests/c1559e7e-ef9e-11e1-bdf7-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/sub_tests/c1559e7e-ef9e-11e1-bdf7-52540035b04c&lt;/a&gt; for logs.&lt;/p&gt;</description>
                <environment></environment>
        <key id="15604">LU-1792</key>
            <summary>conf-sanity.sh test_53a/test_53b take too long to run</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Mon, 27 Aug 2012 18:31:54 +0000</created>
                <updated>Sat, 10 Mar 2018 08:16:30 +0000</updated>
                            <resolved>Sat, 10 Mar 2018 08:16:30 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="43840" author="brian" created="Mon, 27 Aug 2012 21:00:53 +0000"  >&lt;p&gt;As much as I hate to beat a (way) dead (and buried) horse, ltest used to time every test and keep a history of each test&apos;s run time and use those gathered timings to apply a timeout to future test runs.  The result was that any individual test&apos;s run time was bounded by its historical run-times to prevent excessive wasting of time.  Of course, any test that over-ran its historically calculated limit was considered a failure.&lt;/p&gt;

&lt;p&gt;Just food for thought.&lt;/p&gt;</comment>
                            <comment id="43851" author="adilger" created="Tue, 28 Aug 2012 01:05:28 +0000"  >&lt;p&gt;Sure, I remember.  I recall there was a 10-15% margin for variability in the test.  However, looking at the test results, the variability is huge.  This may be related to running in a VM, potentially contending for CPU or disk bandwidth.&lt;/p&gt;

&lt;p&gt;The shortest passing test took 115s and the longest took 1541s, so applying this to the current virtual test environment wouldn&apos;t be possible.  I haven&apos;t made any attempt to correlate this to branch/arch/cluster.&lt;/p&gt;
</comment>
                            <comment id="43862" author="brian" created="Tue, 28 Aug 2012 09:01:27 +0000"  >&lt;p&gt;Hrm.  Yeah.  I suppose VMing all of these test clusters could indeed introduce a lot of variability.  Perhaps too much to apply any sort of &quot;expected run time&quot; watchdog.  Pity.&lt;/p&gt;

&lt;p&gt;I do remember that feature being a boon in preventing test clusters from spinning for many, many hours on a test that had failed in an unpredictable manner.&lt;/p&gt;

&lt;p&gt;I wonder if it would be worth the effort for somebody to actually do the branch/arch/cluster correlation of the wildly swinging test times to see just how unpredictable it really is.&lt;/p&gt;</comment>
                            <comment id="44003" author="chris" created="Thu, 30 Aug 2012 14:24:35 +0000"  >&lt;p&gt;This issue is about why Lustre is taking so long to mount/unmount/remount the filesystem, and that is where the focus for this topic should be.&lt;/p&gt;

&lt;p&gt;We do have statistics for the duration of every test ever run on autotest, so people who want to mine the data are welcome to. The problem of &apos;correct time&apos; is not a VM vs. non-VM issue, because physical hardware can vary just as much between systems, and network bandwidth in particular is very dependent on what else is happening on the system.&lt;/p&gt;

&lt;p&gt;The VMs (client #24 and below) probably provide very consistent times because they are a completely closed system during testing, with no outside influence.&lt;/p&gt;</comment>
                            <comment id="223031" author="adilger" created="Sat, 10 Mar 2018 08:16:30 +0000"  >&lt;p&gt;Current subtest runs are in the 125s or 250s range.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw27j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10440</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>