<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:36:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10613] replay-single tests 20c, 21, 23, 24, 25, 26, 30, 48, 53f, 53g, 62, 70b, 70c fail on open with &#8216;No space left on device&#8217;</title>
                <link>https://jira.whamcloud.com/browse/LU-10613</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A variety of replay-single tests are failing as if the file system were full; calls like open and touch fail with &#8220;No space left on device&#8221;. For each test session, a different set of replay-single tests fails. For example, the test session at &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&lt;/a&gt; has tests 23, 24, 25, 26, 48, 53g, 70b, 70c fail with no space left on device, but test session &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/b226317a-08a2-11e8-a10a-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/b226317a-08a2-11e8-a10a-52540065bddc&lt;/a&gt; has tests 20c, 21, 30, 32, 33a, 48, 53c, 53f, 53g, 70b, 70c fail with the same error message.&lt;/p&gt;

&lt;p&gt;For example, looking at a failover test session for lustre-master build #3703 for ldiskfs servers where replay-single fails (&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&lt;/a&gt;), we see test 26 fail with the error message&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&apos;multiop_bg_pause /mnt/lustre/f26.replay-single-1 failed&apos; 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Looking at the test_log for replay-single test_26 (and most of the other failed tests), we see that the file system is not close to being full:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== replay-single test 26: |X| open(O_CREAT), unlink two, close one, replay, close one (test mds_cleanup_orphans) ====================================================================================================== 08:40:42 (1517820042)
CMD: onyx-49vm11 sync; sync; sync
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      5825660       48852     5253976   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276       30916     1781120   2% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276       25792     1786244   1% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276       25788     1786248   1% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276       25788     1786248   1% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276       25788     1786248   1% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276       25788     1786248   1% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276       25840     1769716   1% /mnt/lustre[OST:6]

filesystem_summary:     13532932      185700    12482072   1% /mnt/lustre

CMD: onyx-49vm5.onyx.hpdd.intel.com,onyx-49vm7,onyx-49vm8 mcreate /mnt/lustre/fsa-\$(hostname); rm /mnt/lustre/fsa-\$(hostname)
CMD: onyx-49vm5.onyx.hpdd.intel.com,onyx-49vm7,onyx-49vm8 if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-\$(hostname); rm /mnt/lustre2/fsa-\$(hostname); fi
CMD: onyx-49vm11 /usr/sbin/lctl --device lustre-MDT0000 notransno
CMD: onyx-49vm11 /usr/sbin/lctl --device lustre-MDT0000 readonly
CMD: onyx-49vm11 /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
multiop /mnt/lustre/f26.replay-single-1 vO_tSc
TMPPIPE=/tmp/multiop_open_wait_pipe.976
open(O_RDWR|O_CREAT): No space left on device
replay-single test_26: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f26.replay-single-1 failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5336:error()
  = /usr/lib64/lustre/tests/replay-single.sh:613:test_26()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the MDS dmesg, the only hint of trouble is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  117.289138] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[  117.297983] LustreError: 1978:0:(lod_qos.c:1352:lod_alloc_specific()) can&apos;t lstripe objid [0x200097212:0xe:0x0]: have 0 want 1
[  117.511816] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_26: @@@@@@ FAIL: multiop_bg_pause \/mnt\/lustre\/f26.replay-single-1 failed 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This error, &#8220;can&apos;t lstripe objid&#8221;, makes the problem look like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10350&quot; title=&quot;ost-pools test 1n fails with &amp;#39;failed to write to /mnt/lustre/d1n.ost-pools/file: 1&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10350&quot;&gt;&lt;del&gt;LU-10350&lt;/del&gt;&lt;/a&gt;, but that patch landed and sanity-dom is not run in this test group.&lt;/p&gt;

&lt;p&gt;For each test that fails due to &#8216;no space&#8217;, here are the test name and the error from the test log. Several lines of test output have been removed to focus on the error/failure:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== replay-single test 20c: check that client eviction does not affect file content =================== 18:24:30 (1517624670)
multiop /mnt/lustre/f20c.replay-single vOw_c
TMPPIPE=/tmp/multiop_open_wait_pipe.31377
open(O_RDWR|O_CREAT): No space left on device
replay-single test_20c: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f20c.replay-single failed 

== replay-single test 21: |X| open(O_CREAT), unlink touch new, replay, close (test mds_cleanup_orphans) 
multiop /mnt/lustre/f21.replay-single vO_tSc
TMPPIPE=/tmp/multiop_open_wait_pipe.31377
open(O_RDWR|O_CREAT): No space left on device
replay-single test_21: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f21.replay-single failed 

== replay-single test 30: open(O_CREAT) two, unlink two, replay, close two (test mds_cleanup_orphans) ====================================================================================================== 18:34:12 (1517625252)
multiop /mnt/lustre/f30.replay-single-1 vO_tSc
TMPPIPE=/tmp/multiop_open_wait_pipe.31377
open(O_RDWR|O_CREAT): No space left on device
replay-single test_30: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f30.replay-single-1 failed 

== replay-single test 31: open(O_CREAT) two, unlink one, |X| unlink one, close two (test mds_cleanup_orphans) ====================================================================================================== 18:34:16 (1517625256)
multiop /mnt/lustre/f31.replay-single-1 vO_tSc
TMPPIPE=/tmp/multiop_open_wait_pipe.31377
open(O_RDWR|O_CREAT): No space left on device
replay-single test_31: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f31.replay-single-1 failed 

== replay-single test 32: close() notices client eviction; close() after client eviction ============= 18:34:18 (1517625258)
multiop /mnt/lustre/f32.replay-single vO_c
TMPPIPE=/tmp/multiop_open_wait_pipe.31377
open(O_RDWR|O_CREAT): No space left on device
replay-single test_32: @@@@@@ FAIL: multiop_bg_pause /mnt/lustre/f32.replay-single failed 

== replay-single test 33a: fid seq shouldn&apos;t be reused after abort recovery ========================== 18:34:20 (1517625260)
open(/mnt/lustre/f33a.replay-single-0) error: No space left on device
total: 0 open/close in 0.00 seconds: 0.00 ops/second
replay-single test_33a: @@@@@@ FAIL: createmany create /mnt/lustre/f33a.replay-single failed 

== replay-single test 48: MDS-&amp;gt;OSC failure during precreate cleanup (2824) =========================== 19:12:20 (1517627540)
Started lustre-MDT0000
CMD: onyx-48vm10 lctl set_param fail_loc=0x80000216
fail_loc=0x80000216
open(/mnt/lustre/f48.replay-single20) error: No space left on device
total: 0 open/close in 0.00 seconds: 0.00 ops/second
replay-single test_48: @@@@@@ FAIL: createmany recraete /mnt/lustre/f48.replay-single failed 

== replay-single test 53c: |X| open request and close request while two MDC requests in flight ======= 19:17:23 (1517627843)
CMD: onyx-48vm11 lctl set_param fail_loc=0x80000107
open(O_RDWR|O_CREAT): No space left on device
fail_loc=0x80000107
CMD: onyx-48vm11 lctl set_param fail_loc=0x80000115
fail_loc=0x80000115
/usr/lib64/lustre/tests/replay-single.sh: line 1293: kill: (3180) - No such process
replay-single test_53c: @@@@@@ FAIL: close_pid doesn&apos;t exist 

== replay-single test 53f: |X| open reply and close reply while two MDC requests in flight =========== 19:20:23 (1517628023)
CMD: onyx-48vm11 lctl set_param fail_loc=0x119
open(O_RDWR|O_CREAT): No space left on device
fail_loc=0x119
CMD: onyx-48vm11 lctl set_param fail_loc=0x8000013b
fail_loc=0x8000013b
/usr/lib64/lustre/tests/replay-single.sh: line 1398: kill: (6748) - No such process
replay-single test_53f: @@@@@@ FAIL: close_pid doesn&apos;t exist 

== replay-single test 53g: |X| drop open reply and close request while close and open are both in flight ====================================================================================================== 19:20:31 (1517628031)
CMD: onyx-48vm11 lctl set_param fail_loc=0x119
open(O_RDWR|O_CREAT): No space left on device
fail_loc=0x119
CMD: onyx-48vm11 lctl set_param fail_loc=0x80000115
fail_loc=0x80000115
/usr/lib64/lustre/tests/replay-single.sh: line 1436: kill: (7302) - No such process
CMD: onyx-48vm11 lctl set_param fail_loc=0
fail_loc=0
replay-single test_53g: @@@@@@ FAIL: close_pid doesn&apos;t exist 

== replay-single test 70b: dbench 1mdts recovery; 3 clients ========================================== 20:06:47 (1517630807)
onyx-48vm1: [6141] open ./clients/client0/~dmtmp/EXCEL/BEED0000 failed for handle 11169 (No space left on device)
onyx-48vm1: (6142) ERROR: handle 11169 was not found
onyx-48vm1: Child failed with status 1
onyx-48vm4: [6141] open ./clients/client0/~dmtmp/EXCEL/BEED0000 failed for handle 11169 (No space left on device)
onyx-48vm4: (6142) ERROR: handle 11169 was not found
onyx-48vm4: Child failed with status 1
onyx-48vm4: dbench: no process found
onyx-48vm1: dbench: no process found
onyx-48vm3: [6320] open ./clients/client0/~dmtmp/WORDPRO/LWPSAV0.TMP failed for handle 11188 (No space left on device)
onyx-48vm3: (6321) ERROR: handle 11188 was not found
onyx-48vm3: Child failed with status 1
onyx-48vm3: dbench: no process found
onyx-48vm3: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 10 sec
onyx-48vm4: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 10 sec
onyx-48vm1: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 10 sec
CMD: onyx-48vm1,onyx-48vm3,onyx-48vm4 killall -0 dbench
onyx-48vm4: dbench: no process found
onyx-48vm3: dbench: no process found
onyx-48vm1: dbench: no process found
replay-single test_70b: @@@@@@ FAIL: dbench stopped on some of onyx-48vm1,onyx-48vm3,onyx-48vm4! 

CMD: onyx-48vm1,onyx-48vm3,onyx-48vm4 killall  dbench
onyx-48vm3: dbench: no process found
onyx-48vm4: dbench: no process found
onyx-48vm1: dbench: no process found
replay-single test_70b: @@@@@@ FAIL: rundbench load on onyx-48vm1,onyx-48vm3,onyx-48vm4 failed! 

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Logs for this failure can be found at&lt;br/&gt;
lustre-master build # 3703 el7 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/91b58a56-0a6b-11e8-a6ad-52540065bddc&lt;/a&gt;&lt;br/&gt;
lustre-master build # 3703 el7 servers/sles12sp3 clients - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/b226317a-08a2-11e8-a10a-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/b226317a-08a2-11e8-a10a-52540065bddc&lt;/a&gt;&lt;br/&gt;
lustre-master build # 3703 sles12sp2 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/4d1259a8-06d6-11e8-a7cd-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/4d1259a8-06d6-11e8-a7cd-52540065bddc&lt;/a&gt;&lt;br/&gt;
lustre-master build # 3702 el7 servers/sles12sp3 clients - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/cee4eabe-04e1-11e8-bd00-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/cee4eabe-04e1-11e8-bd00-52540065bddc&lt;/a&gt;&lt;/p&gt;

</description>
                <environment></environment>
        <key id="50602">LU-10613</key>
            <summary>replay-single tests 20c, 21, 23, 24, 25, 26, 30, 48, 53f, 53g, 62, 70b, 70c fail on open with &#8216;No space left on device&#8217;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Tue, 6 Feb 2018 14:24:05 +0000</created>
                <updated>Sun, 7 Jun 2020 21:36:19 +0000</updated>
                                            <version>Lustre 2.11.0</version>
                    <version>Lustre 2.10.4</version>
                    <version>Lustre 2.10.5</version>
                    <version>Lustre 2.13.0</version>
                    <version>Lustre 2.10.7</version>
                    <version>Lustre 2.12.1</version>
                    <version>Lustre 2.12.3</version>
                    <version>Lustre 2.12.4</version>
                    <version>Lustre 2.12.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="243862" author="jamesanunez" created="Wed, 13 Mar 2019 18:23:08 +0000"  >&lt;p&gt;We are still seeing a few replay-single tests fail with &apos;No space left on device&apos;. For example, for 2.10.7 RC1 failover test session with logs at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/01b2e4f4-43fd-11e9-9720-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/01b2e4f4-43fd-11e9-9720-52540065bddc&lt;/a&gt; . &lt;/p&gt;</comment>
                            <comment id="255503" author="jamesanunez" created="Fri, 27 Sep 2019 19:47:55 +0000"  >&lt;p&gt;We are still seeing these issues for master and 2.12.3.&lt;/p&gt;

&lt;p&gt;Looking at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/56982af2-dfec-11e9-a0ba-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/56982af2-dfec-11e9-a0ba-52540065bddc&lt;/a&gt;, we can see that it is unlikely that the file system is full, since we print out the space usage earlier in test 70b:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      5825660       48764     5254064   1% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      1933276      195836     1611176  11% /mnt/lustre[OST:0]
lustre-OST0001_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:1]
lustre-OST0002_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:2]
lustre-OST0003_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:3]
lustre-OST0004_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:4]
lustre-OST0005_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:5]
lustre-OST0006_UUID      1933276       25796     1786240   2% /mnt/lustre[OST:6]

filesystem_summary:     13532932      350612    12328616   3% /mnt/lustre

CMD: trevis-40vm6.trevis.whamcloud.com,trevis-40vm8,trevis-40vm9 mcreate /mnt/lustre/fsa-\$(hostname); rm /mnt/lustre/fsa-\$(hostname)
CMD: trevis-40vm6.trevis.whamcloud.com,trevis-40vm8,trevis-40vm9 if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-\$(hostname); rm /mnt/lustre2/fsa-\$(hostname); fi
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, no other test fails with &apos;no space left on device&apos; except for test 70b. So, this may be a different issue?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24004">LU-4846</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50593">LU-10609</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="50593">LU-10609</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50601">LU-10612</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzs9j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>