<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:44:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11560] recovery-small test 134 fails with &#8216;rm failed&#8217;</title>
                <link>https://jira.whamcloud.com/browse/LU-11560</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;recovery-small test_134 is failing to remove and/or move files. Looking at the client test_log for &lt;a href=&quot;https://testing.whamcloud.com/test_sets/18761bbc-d05c-11e8-82f2-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/18761bbc-d05c-11e8-82f2-52540065bddc&lt;/a&gt;, we see errors on the remove and move operations&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Started lustre-MDT0000
rm: cannot remove &apos;/mnt/lustre/d134.recovery-small/1/f134.recovery-small&apos;: Input/output error
onyx-39vm5: error: invalid path &apos;/mnt/lustre&apos;: Input/output error
onyx-39vm7: error: invalid path &apos;/mnt/lustre&apos;: Input/output error
onyx-39vm7: mv: cannot stat &apos;/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2&apos;: Input/output error
onyx-39vm8: error: invalid path &apos;/mnt/lustre&apos;: Input/output error
CMD: onyx-39vm5,onyx-39vm7,onyx-39vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/mpi/gcc/openmpi/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state_mount \(FULL\|IDLE\) mdc.lustre-MDT0000-mdc-*.mds_server_uuid 
onyx-39vm7: onyx-39vm7: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
onyx-39vm8: onyx-39vm8: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
onyx-39vm5: onyx-39vm5: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
onyx-39vm8: CMD: onyx-39vm8 lctl get_param -n at_max
onyx-39vm8: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
onyx-39vm7: CMD: onyx-39vm7 lctl get_param -n at_max
onyx-39vm7: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
onyx-39vm5: CMD: onyx-39vm5 lctl get_param -n at_max
onyx-39vm5: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
 recovery-small test_134: @@@@@@ FAIL: rm failed 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the client1 (vm5) console log, we see the client trying to remove the file and getting an error&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[181353.732914] Lustre: DEBUG MARKER: rm /mnt/lustre/d134.recovery-small/1/f134.recovery-small
[181364.712997] Lustre: 3269:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1539556867/real 1539556867]  req@ffff88007862b3c0 x1614152568603904/t0(0) o400-&amp;gt;MGC10.2.8.116@tcp@10.2.8.117@tcp:26/25 lens 224/224 e 0 to 1 dl 1539556874 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[181364.713002] Lustre: 3269:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
[181364.713034] LustreError: 166-1: MGC10.2.8.116@tcp: Connection to MGS (at 10.2.8.117@tcp) was lost; in progress operations using this service will fail
[181364.713035] LustreError: Skipped 1 previous similar message
[181421.136794] LustreError: 26336:0:(file.c:4383:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
[181421.136798] LustreError: 26336:0:(file.c:4383:ll_inode_revalidate_fini()) Skipped 26 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In another client&#8217;s (vm7) console log, we see the client trying to move a file and getting the same error&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 6695.690885] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2
[ 6706.654708] Lustre: 1779:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1539556867/real 1539556867]  req@ffff88006e3946c0 x1614335408916880/t0(0) o400-&amp;gt;MGC10.2.8.116@tcp@10.2.8.117@tcp:26/25 lens 224/224 e 0 to 1 dl 1539556874 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 6706.654741] Lustre: 1779:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[ 6706.654775] LustreError: 166-1: MGC10.2.8.116@tcp: Connection to MGS (at 10.2.8.117@tcp) was lost; in progress operations using this service will fail
[ 6706.654776] LustreError: Skipped 1 previous similar message
[ 6709.002741] Lustre: lustre-MDT0000-mdc-ffff880070293800: Connection to lustre-MDT0000 (at 10.2.8.117@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 6709.002751] Lustre: Skipped 2 previous similar messages
[ 6774.109416] LustreError: 167-0: lustre-MDT0000-mdc-ffff880070293800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[ 6774.109423] LustreError: Skipped 1 previous similar message
[ 6774.109684] LustreError: 23597:0:(file.c:4383:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
[ 6774.115931] Lustre: Evicted from MGS (at 10.2.8.116@tcp) after server handle changed from 0xc36de71764adb58 to 0xc256e14d64d7ddc7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We see recovery-small test 134 fail frequently with this error message, but the client logs do not always contain the &#8216;revalidate FID&#8217; error. Thus, I&#8217;m not sure all of these failures have the same cause. Here&#8217;s an example of recovery-small test 134 failing without the &#8216;revalidate FID&#8217; error&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/1067f8b8-d6c0-11e8-b589-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/1067f8b8-d6c0-11e8-b589-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="53799">LU-11560</key>
            <summary>recovery-small test 134 fails with &#8216;rm failed&#8217;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Tue, 23 Oct 2018 16:48:25 +0000</created>
                <updated>Thu, 8 Oct 2020 03:03:28 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                    <version>Lustre 2.12.1</version>
                    <version>Lustre 2.12.3</version>
                    <version>Lustre 2.12.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                <comments>
                            <comment id="245496" author="jamesanunez" created="Tue, 9 Apr 2019 22:45:15 +0000"  >&lt;p&gt;We&apos;ve seen recovery-small test 134 fail with  &apos;rm failed&apos;  for DNE testing at a high rate since 8 April 2019. Some recent failures are at:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/29425f24-5a99-11e9-9720-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/29425f24-5a99-11e9-9720-52540065bddc&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/4e97765e-5a78-11e9-9646-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/4e97765e-5a78-11e9-9646-52540065bddc&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/56a401c4-5a4a-11e9-b98a-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/56a401c4-5a4a-11e9-b98a-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="245786" author="pfarrell" created="Mon, 15 Apr 2019 18:45:20 +0000"  >&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;21070.939051&amp;#93;&lt;/span&gt; LustreError: 18395:0:(lod_dev.c:434:lod_sub_recovery_thread()) lustre-MDT0002-osp-MDT0000 get update log failed: rc = -22&lt;/p&gt;

&lt;p&gt;That&apos;s the same error as in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12175&quot; title=&quot;sanity test 208 fails with &amp;#39;lease broken over recovery&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12175&quot;&gt;LU-12175&lt;/a&gt;, I suspect the same underlying cause...&lt;/p&gt;</comment>
                            <comment id="246048" author="pfarrell" created="Thu, 18 Apr 2019 22:38:37 +0000"  >&lt;p&gt;Recent spate of failures was related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12175&quot; title=&quot;sanity test 208 fails with &amp;#39;lease broken over recovery&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12175&quot;&gt;LU-12175&lt;/a&gt;, but the original report is still A) pretty recent, and B) predates the landing of the problematic patch (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11636&quot; title=&quot;t-f test_mkdir() does not support interop with non DNEII servers&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11636&quot;&gt;&lt;del&gt;LU-11636&lt;/del&gt;&lt;/a&gt;), so I think let&apos;s leave this one open...&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54312">LU-11789</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55455">LU-12210</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i004zz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>