<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:34:17 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10353] parallel-scale* tests fail with &#8216;No space left on device&#8217;</title>
                <link>https://jira.whamcloud.com/browse/LU-10353</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Several of the tests in parallel-scale are failing with some variant of &#8216;No space left on device&#8217;. One failed parallel-scale test suite is at&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/fde9d7ba-dae4-11e7-8027-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/fde9d7ba-dae4-11e7-8027-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tests that fail are compilebench, simul, connectathon, iorssf, iorfpp, ior_mdtest_parallel_ssf, ior_mdtest_parallel_fpp, and fio.&lt;/p&gt;

&lt;p&gt;For test_compilebench, the test checks whether there is enough free space on the file system and skips the test if there is not; compilebench needs ~1 GB of space to run. In this case, we can see that we have 12482080 KB free, yet the test still fails. The following is from the client test log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== parallel-scale test compilebench: compilebench ==================================================== 16:17:36 (1512577056)
OPTIONS:
cbench_DIR=/usr/bin
cbench_IDIRS=2
cbench_RUNS=2
trevis-3vm1.trevis.hpdd.intel.com
trevis-3vm2
free space = 12482080 KB
./compilebench -D /mnt/lustre/d0.compilebench.5990 -i 2         -r 2 --makej
using working directory /mnt/lustre/d0.compilebench.5990, 2 intial dirs 2 runs
native unpatched native-0 222MB in 285.09 seconds (0.78 MB/s)
native patched native-0 109MB in 48.51 seconds (2.26 MB/s)
native patched compiled native-0 691MB in 169.31 seconds (4.08 MB/s)
create dir kernel-0 222MB in 132.84 seconds (1.67 MB/s)
create dir kernel-1 222MB in 150.75 seconds (1.48 MB/s)
compile dir kernel-1 680MB in 172.82 seconds (3.94 MB/s)
Traceback (most recent call last):
  File &quot;./compilebench&quot;, line 594, in &amp;lt;module&amp;gt;
    if not compile_one_dir(dset, rnd):
  File &quot;./compilebench&quot;, line 368, in compile_one_dir
    mbs = run_directory(ch[0], dir, &quot;compile dir&quot;)
  File &quot;./compilebench&quot;, line 243, in run_directory
    fp.write(buf)
IOError: [Errno 28] No space left on device
 parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
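The free-space guard the tests apply can be sketched as follows (a simplified illustration, not the actual test-framework code; MOUNT defaults to /tmp here so it runs anywhere, whereas the real tests use the Lustre mount point):

```shell
# Sketch of the free-space check: skip the test when the filesystem
# lacks the ~1 GB that compilebench needs (thresholds/paths illustrative).
MOUNT=${MOUNT:-/tmp}        # stands in for the Lustre mount point
NEED_KB=1048576             # compilebench needs roughly 1 GB
FREE_KB=$(df -k "$MOUNT" | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt "$NEED_KB" ]; then
    echo "SKIP compilebench: ${FREE_KB} KB free, need ${NEED_KB} KB"
else
    echo "free space = ${FREE_KB} KB"
fi
```

The point of the bug is that this guard passes (12482080 KB free is well above 1 GB), yet the writes still hit ENOSPC, so the shortage is not visible at the aggregate df level.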

&lt;p&gt;metabench runs with no problems, but simul fails with:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;16:46:21: Process 0(trevis-3vm1.trevis.hpdd.intel.com): FAILED in create_files, write in file /mnt/lustre/d0.simul/simul_read.0: No space left on device
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Similar to compilebench, connectathon checks how much space is available on the file system and will skip the test if there is not enough; it needs about 40 MB. The client test log shows free space = 10654792 KB available on the Lustre file system, yet the test still fails:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;./test5: read and write
	./test5: (/mnt/lustre/d0.connectathon) &apos;bigfile&apos; write failed : No space left on device
basic tests failed
 parallel-scale test_connectathon: @@@@@@ FAIL: connectathon failed: 1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &#8220;bigfile&#8221; is 30 MB.&lt;/p&gt;

&lt;p&gt;Looking in the MDS (vm4) console for this test session (&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/6c155f47-820d-447d-893f-15b24418827f&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/6c155f47-820d-447d-893f-15b24418827f&lt;/a&gt;), even though metabench passed, we see&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[29644.867500] Lustre: DEBUG MARKER: == parallel-scale test metabench: metabench ========================================================== 16:38:42 (1512578322)
[29723.246493] LustreError: 19423:0:(osp_precreate.c:657:osp_precreate_send()) lustre-OST0000-osc-MDT0000: precreate fid [0x100000000:0xc7e76:0x0] &amp;lt; local used fid [0x100000000:0xc7e76:0x0]: rc = -116
[29723.252559] LustreError: 19423:0:(osp_precreate.c:1282:osp_precreate_thread()) lustre-OST0000-osc-MDT0000: cannot precreate objects: rc = -116
[30044.166827] sched: RT throttling activated
[30098.760027] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2&amp;gt;/dev/null
[30099.740367] Lustre: DEBUG MARKER: rc=0;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;Similarly, for another test that passed, mdtestfpp, the same MDS console log shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[30552.987700] Lustre: DEBUG MARKER: == parallel-scale test mdtestfpp: mdtestfpp ========================================================== 16:53:49 (1512579229)
[30792.302159] LustreError: 19423:0:(osp_precreate.c:657:osp_precreate_send()) lustre-OST0000-osc-MDT0000: precreate fid [0x100000000:0xf05cd:0x0] &amp;lt; local used fid [0x100000000:0xf05cd:0x0]: rc = -116
[30792.307280] LustreError: 19423:0:(osp_precreate.c:1282:osp_precreate_thread()) lustre-OST0000-osc-MDT0000: cannot precreate objects: rc = -116
[30792.307283] LustreError: 19401:0:(osp_precreate.c:1334:osp_precreate_ready_condition()) lustre-OST0000-osc-MDT0000: precreate failed opd_pre_status -116
[30792.307290] LustreError: 19401:0:(osp_precreate.c:1334:osp_precreate_ready_condition()) Skipped 1 previous similar message
[30917.997538] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2&amp;gt;/dev/null
[30919.134892] Lustre: DEBUG MARKER: rc=0;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This failure looks similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7834&quot; title=&quot;parallel-scale-nfsv4 test_compilebench: IOError: [Errno 28] No space left on device&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7834&quot;&gt;LU-7834&lt;/a&gt;, except that there only test_compilebench fails.&lt;/p&gt;

&lt;p&gt;Note: These failures are only seen on full test sessions.&lt;/p&gt;

&lt;p&gt;Logs for parallel-scale &#8216;No space left on device&#8217; failures with several tests failing are at&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/4fdaa576-daa0-11e7-9c63-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/4fdaa576-daa0-11e7-9c63-52540065bddc&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/0e389fce-da73-11e7-8027-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/0e389fce-da73-11e7-8027-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first time parallel-scale test_compilebench failed with this error and with osp_precreate_thread errors in the MDS console log was on 2017-11-22 with master build # 3672.&lt;/p&gt;</description>
                <environment></environment>
        <key id="49654">LU-10353</key>
            <summary>parallel-scale* tests fail with &#8216;No space left on device&#8217;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Fri, 8 Dec 2017 00:28:47 +0000</created>
                <updated>Wed, 21 Dec 2022 22:18:45 +0000</updated>
                                            <version>Lustre 2.11.0</version>
                    <version>Lustre 2.12.0</version>
                    <version>Lustre 2.10.3</version>
                    <version>Lustre 2.10.4</version>
                    <version>Lustre 2.10.5</version>
                    <version>Lustre 2.12.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="215994" author="adilger" created="Mon, 11 Dec 2017 22:44:06 +0000"  >&lt;p&gt;It seems that the filesystem default striping has been changed by one of the tests, possibly &lt;tt&gt;sanity-pfl.sh&lt;/tt&gt;, which is causing some of these tests to fail.&lt;/p&gt;

&lt;p&gt;In cases where the test itself has an assumption on the number of stripes, it should explicitly request that number of stripes.  &lt;/p&gt;

&lt;p&gt;However, in cases where the test itself doesn&apos;t have a requirement for a specific stripe count, I don&apos;t want to make a blanket change to all tests &quot;just to make them pass&quot; when the stripe count is different.  Rather, I&apos;d prefer to understand why the test is failing and make it pass with different stripe counts (1, N, -1) so that we can get better test coverage.&lt;/p&gt;
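As a concrete sketch of explicitly requesting a stripe count (the directory path and count are illustrative, not taken from this ticket; this requires a mounted Lustre client, so it is an environment-specific command fragment):

```shell
# Hypothetical example: a test that assumes two stripes requests them
# explicitly instead of inheriting the filesystem default layout.
mkdir -p /mnt/lustre/d0.mytest
lfs setstripe -c 2 /mnt/lustre/d0.mytest   # pin the stripe count on the test dir
lfs getstripe -c /mnt/lustre/d0.mytest     # prints the effective stripe count
```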

&lt;p&gt;Ideally, we could run the whole test suite with random (but valid) stripe counts, stripe sizes, PFL or FLR layouts, and have it pass, but I think we are still a ways away from that.&lt;/p&gt;</comment>
                            <comment id="216230" author="adilger" created="Wed, 13 Dec 2017 23:19:16 +0000"  >&lt;p&gt;Separate from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10350&quot; title=&quot;ost-pools test 1n fails with &amp;#39;failed to write to /mnt/lustre/d1n.ost-pools/file: 1&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10350&quot;&gt;&lt;del&gt;LU-10350&lt;/del&gt;&lt;/a&gt;, one of the issues is that &lt;tt&gt;sanity-pfl test_10&lt;/tt&gt; is setting the default filesystem layout to use &quot;&lt;tt&gt;stripe_index: 0&lt;/tt&gt;&quot;, which may cause space imbalance for later tests.&lt;/p&gt;</comment>
                            <comment id="223328" author="mdiep" created="Mon, 12 Mar 2018 14:58:17 +0000"  >&lt;p&gt;+1 on b2_10&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/2f9e9f34-23e7-11e8-9852-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/2f9e9f34-23e7-11e8-9852-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="49643">LU-10350</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="45439">LU-9324</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="49731">LU-10382</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzp07:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>