<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:55:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12810] replay-single test 20b fails with &apos;after 180548 &gt; before N + 50&apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-12810</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;replay-single test_20b fails for ldiskfs with errors similar to &apos;after 180548 &amp;gt; before 25792 + 50&apos;. &lt;/p&gt;

&lt;p&gt;replay-single test_20b fails for ZFS with errors similar to &apos;after 21504 &amp;gt; before 3072 + 2048&apos;. &lt;/p&gt;

&lt;p&gt;We have seen examples of this error on master and b2_12 since at least June 2019, in the failover test group with DNE configured.&lt;/p&gt;

&lt;p&gt;In all of these cases, recovery completes, and the test then forces a sync and checks whether space has been freed, up to three times; if the space is still not freed, it exits with an error. Looking at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/a5f46eb0-d40e-11e9-97d5-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/a5f46eb0-d40e-11e9-97d5-52540065bddc&lt;/a&gt;, we see the sync/test in the client test_log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;trevis-40vm8: *.lustre-MDT0000.recovery_status status: COMPLETE
Waiting for local destroys to complete
CMD: trevis-40vm8 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-40vm6 lctl set_param -n osd*.*OS*.force_sync=1
before 25800, after 1784144
CMD: trevis-40vm8 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-40vm6 lctl set_param -n osd*.*OS*.force_sync=1
before 25800, after 1784144
CMD: trevis-40vm8 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-40vm6 lctl set_param -n osd*.*OS*.force_sync=1
before 25800, after 1784144
 replay-single test_20b: @@@@@@ FAIL: after 1784144 &amp;gt; before 25800 + 50 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5829:error()
  = /usr/lib64/lustre/tests/replay-single.sh:513:test_20b()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the client test logs for all of these failures, we see a &#8216;Transport endpoint is not connected&#8217; error at the beginning of the test when trying to set force_sync. For example, for &lt;a href=&quot;https://testing.whamcloud.com/test_sets/b73e377a-d349-11e9-9fc9-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/b73e377a-d349-11e9-9fc9-52540065bddc&lt;/a&gt;, we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== replay-single test 20b: write, unlink, eviction, replay (test mds_cleanup_orphans) ================ 19:21:45 (1568056905)
CMD: trevis-35vm11 lctl set_param -n os[cd]*.*MDT*.force_sync=1
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0000-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0001-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0002-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0003-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0004-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-35vm11: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0005-osc-MDT0000/force_sync=1: Transport endpoint is not connected
CMD: trevis-35vm10 lctl set_param -n osd*.*OS*.force_sync=1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In one case, we see the &#8216;Transport endpoint&#8217; error during the final syncs before calling error():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Waiting for local destroys to complete
CMD: trevis-23vm12 lctl set_param -n os[cd]*.*MDT*.force_sync=1
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0000-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0001-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0002-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0003-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0004-osc-MDT0000/force_sync=1: Transport endpoint is not connected
trevis-23vm12: error: set_param: setting /sys/fs/lustre/osc/lustre-OST0005-osc-MDT0000/force_sync=1: Transport endpoint is not connected
CMD: trevis-23vm10 lctl set_param -n osd*.*OS*.force_sync=1
before 25832, after 180548
CMD: trevis-23vm12 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-23vm10 lctl set_param -n osd*.*OS*.force_sync=1
before 25832, after 180548
CMD: trevis-23vm12 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-23vm10 lctl set_param -n osd*.*OS*.force_sync=1
before 25832, after 180548
 replay-single test_20b: @@@@@@ FAIL: after 180548 &amp;gt; before 25832 + 50 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems that the forced syncs may not be taking place, or that fewer of them take place than the test attempts.&lt;/p&gt;

&lt;p&gt;Logs for more recent failures are at&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/56982af2-dfec-11e9-a0ba-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/56982af2-dfec-11e9-a0ba-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment>DNE</environment>
        <key id="57009">LU-12810</key>
            <summary>replay-single test 20b fails with &apos;after 180548 &gt; before N + 50&apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 26 Sep 2019 21:04:26 +0000</created>
                <updated>Wed, 14 Apr 2021 16:14:13 +0000</updated>
                                            <version>Lustre 2.13.0</version>
                    <version>Lustre 2.12.3</version>
                    <version>Lustre 2.12.4</version>
                    <version>Lustre 2.12.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00ndb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>