<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:35:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3640] Test failure on test suite sanity, test_116a, when OST is full</title>
                <link>https://jira.whamcloud.com/browse/LU-3640</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for James Nunez &amp;lt;james.a.nunez@intel.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;http://maloo.whamcloud.com/test_sets/92e7707e-f4b3-11e2-b8a2-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://maloo.whamcloud.com/test_sets/92e7707e-f4b3-11e2-b8a2-52540035b04c&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sub-test test_116a failed with the following error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;test_116a returned 1&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;When an OST fills, test 116a looks for the OST with the minimum available space and tries to fill 25% of the remaining space, but it never checks whether any space is available. With a full OST, the minimum is zero and the test later divides by it.&lt;/p&gt;

&lt;p&gt;This test should check if the minimum size available on an OST is zero and, if so, exit/skip. &lt;/p&gt;
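
&lt;p&gt;A minimal sketch of such a guard, not the landed patch. The helper name min_avail_kb is hypothetical; the real test would read the per-OST values from the file system and use the test framework&apos;s own skip mechanism rather than echo.&lt;/p&gt;

```shell
# Hypothetical helper: print the smallest value in a list of per-OST
# available-space figures (in KB).
min_avail_kb() {
    local min=$1
    local kb
    for kb in "$@"; do
        if [ "$kb" -lt "$min" ]; then
            min=$kb
        fi
    done
    echo "$min"
}

# Values copied from the first "OST kbytes available" line in the client test_log.
MINV=$(min_avail_kb 164288 172508 172284 0 164312 172300 162240)

if [ "$MINV" -eq 0 ]; then
    # Bail out before any later arithmetic divides by MINV.
    echo "SKIP: no free space on the fullest OST"
else
    echo "min free space: ${MINV} KB"
fi
```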

&lt;p&gt;The client test_log looks like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity test 116a: stripe QOS: free space balance ===================== 22:17:35 (1374729455)
Free space priority error: get_param: /proc/{fs,sys}/{lnet,lustre}/lov/*-clilov-*/qos_prio_free: Found no match
CMD: client-26vm3 lctl set_param -n osd*.*MD*.force_sync 1
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
CMD: client-26vm3 lctl get_param -n osc.*MDT*.sync_*
Waiting for local destroys to complete
OST kbytes available: 164288 172508 172284 0 164312 172300 162240
Min free space: OST 3: 0
Max free space: OST 1: 172508
Filling 25% remaining space in OST3 with 0Kb
CMD: client-26vm3 lctl get_param -n lov.*.qos_maxage
Waiting for local destroys to complete
OST kbytes available: 164068 172508 172504 0 164312 172520 162016
Min free space: OST 3: 0
Max free space: OST 5: 172520
/usr/lib64/lustre/tests/sanity.sh: line 6604: 172520 * 100 / 0: division by 0 (error token is &quot;0&quot;)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</description>
                <environment></environment>
        <key id="20008">LU-3640</key>
            <summary>Test failure on test suite sanity, test_116a, when OST is full</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jamesanunez">James Nunez</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Thu, 25 Jul 2013 20:31:18 +0000</created>
                <updated>Wed, 16 Oct 2013 02:05:33 +0000</updated>
                            <resolved>Fri, 27 Sep 2013 20:55:19 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="63048" author="jamesanunez" created="Fri, 26 Jul 2013 16:00:16 +0000"  >&lt;p&gt;Proposed patch at:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/7132&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7132&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="63419" author="adilger" created="Wed, 31 Jul 2013 20:33:21 +0000"  >&lt;p&gt;I don&apos;t think there is anything wrong with this patch, but I think it is fixing the symptom and not the cause of the problem.&lt;/p&gt;

&lt;p&gt;Why is one OST full in the first place?  Is some test creating huge files and not cleaning them up?  Is there a missing &quot;wait_delete_completed&quot; after some large file is deleted? Are deleted files not causing the objects to be destroyed (which would be a serious bug that should be fixed)?&lt;/p&gt;</comment>
                            <comment id="63423" author="jamesanunez" created="Wed, 31 Jul 2013 20:43:13 +0000"  >&lt;p&gt;I agree. I&apos;m looking into which test(s) fill an OST. In the cases of a single OST filling that I&apos;ve looked at, sanity test 101d is the first test that fails due to a full OST, so I&apos;m working back from there.&lt;/p&gt;</comment>
                            <comment id="63670" author="jamesanunez" created="Mon, 5 Aug 2013 18:29:49 +0000"  >&lt;p&gt;So far, I can say that I&apos;ve run into this problem twice, both times on Toro where the OSTs are ~9.77 GB:&lt;br/&gt;
7/31/13 &lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/951e6f9c-f985-11e2-8917-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/951e6f9c-f985-11e2-8917-52540035b04c&lt;/a&gt;&lt;br/&gt;
7/29/13 &lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/496f53b4-f8da-11e2-977b-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/496f53b4-f8da-11e2-977b-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I submitted a test run that prints the available space on the OSTs throughout the sanity suite to try to trigger the full OST problem again, but I haven&apos;t had any luck yet: &lt;a href=&quot;http://review.whamcloud.com/#/c/7207/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7207/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Currently, some of the sanity test 27 subtests look like possible culprits:&lt;br/&gt;
sanity test 27m: create file while OST0 was full &lt;span class=&quot;error&quot;&gt;&amp;#91;always skipped, so not the problem&amp;#93;&lt;/span&gt;&lt;br/&gt;
sanity test 27n: create file with some full OSTs&lt;br/&gt;
sanity test 27o: create file with all full OSTs (should error)&lt;br/&gt;
sanity test 27p: append to a truncated file with some full OSTs&lt;br/&gt;
sanity test 27q: append to truncated file with all OSTs full (should error)&lt;br/&gt;
sanity test 27r: stripe file with some full OSTs (shouldn&apos;t LBUG)&lt;/p&gt;</comment>
                            <comment id="63715" author="jamesanunez" created="Tue, 6 Aug 2013 13:57:11 +0000"  >&lt;p&gt;In the end, we didn&apos;t have to look very far from test 101d. It looks like test 101d is filling the OST on its own. From the test suite log at &lt;a href=&quot;http://review.whamcloud.com/#/c/7207/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7207/&lt;/a&gt;, we see that test 101c ends with space left on all OSTs, i.e. OST 0 is not full:&lt;/p&gt;

&lt;p&gt;18:35:11:== sanity test 101c: check stripe_size aligned read-ahead =================== 18:35:10 (1375752910) &lt;br/&gt;
... &lt;br/&gt;
18:35:34:OST kbytes available: 148656 148672 157676 147636 157892 157672 157892&lt;br/&gt;
18:35:34:Min free space: OST 3: 147636&lt;br/&gt;
18:35:34:Max free space: OST 4: 157892&lt;/p&gt;

&lt;p&gt;Then 101d starts and dd fails: &lt;br/&gt;
18:35:34:== sanity test 101d: file read with and without read-ahead enabled =================== 18:35:29 (1375752929) &lt;br/&gt;
18:35:34:Creating 500M test file /mnt/lustre/f.sanity.101d&lt;br/&gt;
18:35:45:dd: writing `/mnt/lustre/f.sanity.101d&apos;: No space left on device&lt;br/&gt;
18:35:45:165+0 records in&lt;br/&gt;
18:35:45:164+0 records out&lt;br/&gt;
18:35:46:172576768 bytes (173 MB) copied, 7.17172 s, 24.1 MB/s&lt;br/&gt;
18:35:46: sanity test_101d: @@@@@@ FAIL: dd failed&lt;/p&gt;

&lt;p&gt;From the code, we get the available space on the whole file system and then try to write a 500 MB file, but the striping of that file is never set. So the file system as a whole has more than 500 MB available, but a single OST does not, and the 500 MB file fills a single OST.&lt;/p&gt;

&lt;p&gt;I verified this locally by running df as in 101d:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client1 tests&amp;#93;&lt;/span&gt;# df -P /mnt/lscratch&lt;br/&gt;
Filesystem         1024-blocks      Used Available Capacity Mounted on&lt;br/&gt;
192.168.0.200@tcp0:/lscratch    749856    103072    601184      15% /mnt/lscratch&lt;/p&gt;

&lt;p&gt;and we see we have more than enough for a 500MB file, yet:&lt;br/&gt;
== sanity test 101d: file read with and without read-ahead enabled  =================== 07:32:58 (1375795978)&lt;br/&gt;
Waiting for local destroys to complete&lt;br/&gt;
OST kbytes available: 151696 150096 150296 149896&lt;br/&gt;
Min free space: OST 3: 149896&lt;br/&gt;
Max free space: OST 0: 151696&lt;br/&gt;
Found local space to be 601984 kb&lt;br/&gt;
Creating 500M test file /mnt/lscratch/f.sanity.101d&lt;br/&gt;
dd: writing `/mnt/lscratch/f.sanity.101d&apos;: No space left on device&lt;br/&gt;
156+0 records in&lt;br/&gt;
155+0 records out&lt;br/&gt;
163078144 bytes (163 MB) copied, 14.5852 s, 11.2 MB/s&lt;br/&gt;
 sanity test_101d: @@@@@@ FAIL: dd failed &lt;/p&gt;

&lt;p&gt;So, I guess the assumption is that the file should be striped across all OSTs since the check for available space is for the whole file system and not just individual OSTs.&lt;/p&gt;
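
&lt;p&gt;To illustrate the arithmetic with the numbers from the local run above, here is a rough sketch. The helper fits_on_osts is hypothetical and is not part of sanity.sh. It ignores metadata overhead, assumes the file is spread evenly over its stripes, and conservatively compares each stripe&apos;s share against the smallest per-OST free space.&lt;/p&gt;

```shell
# Hypothetical helper: succeed if a file of FILE_KB, striped evenly over
# STRIPES OSTs, fits when every OST has at least MIN_FREE_KB available.
fits_on_osts() {
    local file_kb=$1 stripes=$2 min_free_kb=$3
    # Round up so a partial KB per stripe still counts.
    local per_ost_kb=$(( (file_kb + stripes - 1) / stripes ))
    [ "$per_ost_kb" -le "$min_free_kb" ]
}

FILE_KB=$((500 * 1024))   # the 500M test file
MIN_FREE_KB=149896        # "Min free space" from the local run

for stripes in 1 4; do
    if fits_on_osts "$FILE_KB" "$stripes" "$MIN_FREE_KB"; then
        echo "stripe count $stripes: fits"
    else
        echo "stripe count $stripes: does not fit"
    fi
done
```

&lt;p&gt;With these numbers, a single-stripe file needs 512000 KB on one OST and does not fit, while a file striped over all four OSTs needs 128000 KB per OST and does.&lt;/p&gt;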

&lt;p&gt;The patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3633&quot; title=&quot;sanity.sh test_101d failed for &amp;#39;dd failed&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3633&quot;&gt;&lt;del&gt;LU-3633&lt;/del&gt;&lt;/a&gt; sets the file stripe and, thus, should &quot;fix&quot; the OST filling problem, although other tests may make a similar assumption about file striping.&lt;/p&gt;


</comment>
                            <comment id="67258" author="adilger" created="Mon, 23 Sep 2013 16:55:37 +0000"  >&lt;p&gt;Is there anything left in this bug that needs to be done, or can it be closed?&lt;/p&gt;</comment>
                            <comment id="67260" author="jamesanunez" created="Mon, 23 Sep 2013 17:03:25 +0000"  >&lt;p&gt;The patch hasn&apos;t landed yet. The last batch of autotest results never posted, or at least had not posted more than 10 days after autotest started, so I rebased and resubmitted the patch earlier today.&lt;/p&gt;</comment>
                            <comment id="67887" author="jamesanunez" created="Fri, 27 Sep 2013 20:55:19 +0000"  >&lt;p&gt;Patch landed for 2.5.0&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="19989">LU-3633</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvw8f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9374</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>