<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:31:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9991] sanity-hsm test 31c fails with &apos;copytools failed to stop&apos; </title>
                <link>https://jira.whamcloud.com/browse/LU-9991</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;sanity-hsm test_31c fails to archive the newly created file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Update not seen after 200s: wanted &apos;SUCCEED&apos; got &apos;STARTED&apos;
 sanity-hsm test_31c: @@@@@@ FAIL: request on 0x200000402:0x38:0x0 is not SUCCEED on mds1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After that, the copytool cannot be shut down:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;copytools still running on trevis-51vm6
CMD: trevis-51vm6 pgrep -x lhsmtool_posix
trevis-51vm6: 6036
copytools still running on trevis-51vm6
CMD: trevis-51vm6 echo 1 &amp;gt;/proc/sys/kernel/sysrq ;  echo t &amp;gt;/proc/sysrq-trigger
copytools failed to stop in 200s
 sanity-hsm test_31c: @@@@@@ FAIL: copytools failed to stop 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
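&lt;p&gt;The stop check above amounts to polling for the copytool process until a timeout expires. Below is a minimal Python sketch of that pattern. It is hypothetical and is not the shell code sanity-hsm actually runs: the names copytool_running and wait_until_stopped are illustrative, and the 200-second default only mirrors the timeout in the message above.&lt;/p&gt;

```python
import subprocess
import time


def copytool_running(name="lhsmtool_posix"):
    """Return True if a process with exactly this name exists.

    pgrep -x exits with status 0 when at least one process matches.
    """
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_stopped(is_running, timeout_s=200.0, interval_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_running() until it returns False or timeout_s elapses.

    Returns True if the process went away, False on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return False  # still running when the deadline passed
        sleep(interval_s)
    return True
```

&lt;p&gt;A copytool thread blocked inside close() stays in the process table, so the predicate keeps returning True and the loop can only end in a timeout for as long as the close is stuck.&lt;/p&gt;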

&lt;p&gt;From the copytool logs, the archive was progressing normally: the 34603008-byte copy finished in about 33 seconds, so it does not look like the request should have exceeded 200 seconds unless the copytool hung:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1505294662.538461 lhsmtool_posix[6038]: &apos;[0x200000402:0x38:0x0]&apos; action ARCHIVE reclen 72, cookie=0x59b8f942
1505294662.539849 lhsmtool_posix[6038]: processing file &apos;d31c.sanity-hsm/f31c.sanity-hsm&apos;
1505294662.555950 lhsmtool_posix[6038]: archiving &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; to &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos;
1505294662.556901 lhsmtool_posix[6038]: saving stripe info of &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; in /tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp.lov
1505294662.558302 lhsmtool_posix[6038]: start copy of 34603008 bytes from &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; to &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos;
1505294692.836074 lhsmtool_posix[6038]: %90 
1505294692.838435 lhsmtool_posix[6038]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1505294695.842294 lhsmtool_posix[6038]: copied 34603008 bytes in 33.285317 seconds
1505294695.896192 lhsmtool_posix[6038]: data archiving for &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; to &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; done
1505294695.896379 lhsmtool_posix[6038]: attr file for &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; saved to archive &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos;
1505294695.897327 lhsmtool_posix[6038]: fsetxattr of &apos;trusted.hsm&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=0 (Success)
1505294695.897360 lhsmtool_posix[6038]: fsetxattr of &apos;trusted.version&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=0 (Success)
1505294695.897402 lhsmtool_posix[6038]: fsetxattr of &apos;trusted.link&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=0 (Success)
1505294695.897432 lhsmtool_posix[6038]: fsetxattr of &apos;trusted.lov&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=0 (Success)
1505294695.897476 lhsmtool_posix[6038]: fsetxattr of &apos;trusted.lma&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=0 (Success)
1505294695.897515 lhsmtool_posix[6038]: fsetxattr of &apos;lustre.lov&apos; on &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos; rc=-1 (Operation not supported)
1505294695.897531 lhsmtool_posix[6038]: xattr file for &apos;/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0&apos; saved to archive &apos;/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp&apos;
1505294695.898946 lhsmtool_posix[6038]: symlink &apos;/tmp/arc1/shsm/shadow/d31c.sanity-hsm/f31c.sanity-hsm&apos; to &apos;../../0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0&apos; done
exiting: Terminated
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
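&lt;p&gt;The timing argument can be checked by subtracting the log timestamps directly. Below is a minimal Python sketch, assuming the first field of each log line is a Unix epoch time in seconds. The log lines are abbreviated copies of the ones above, and the helper names are hypothetical.&lt;/p&gt;

```python
import re

# Abbreviated lines from the lhsmtool_posix log above. The first field is
# assumed to be a Unix epoch timestamp in seconds; message text is shortened.
LOG_LINES = [
    "1505294662.538461 lhsmtool_posix[6038]: action ARCHIVE reclen 72",
    "1505294662.558302 lhsmtool_posix[6038]: start copy of 34603008 bytes",
    "1505294695.842294 lhsmtool_posix[6038]: copied 34603008 bytes in 33.285317 seconds",
    "1505294695.898946 lhsmtool_posix[6038]: symlink done",
]

TIMEOUT_S = 200  # timeout reported in both failure messages above
MIB = 1048576


def elapsed_seconds(lines):
    """Seconds between the first and last timestamped log lines."""
    stamps = [float(line.split()[0]) for line in lines]
    return stamps[-1] - stamps[0]


def copy_rate(lines):
    """Bytes per second from the 'copied N bytes in T seconds' line, or None."""
    pattern = re.compile(r"copied (\d+) bytes in ([\d.]+) seconds")
    for line in lines:
        match = pattern.search(line)
        if match:
            return int(match.group(1)) / float(match.group(2))
    return None


total = elapsed_seconds(LOG_LINES)
print(f"archive handled in {total:.1f}s (timeout {TIMEOUT_S}s)")
print(f"average copy rate: {copy_rate(LOG_LINES) / MIB:.2f} MiB/s")
```

&lt;p&gt;With these values the whole archive takes roughly 33.4 seconds and the copy averages just under 1 MiB/s, which is consistent with the 1048576B/s bandwidth control shown in the log.&lt;/p&gt;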


&lt;p&gt;So far, we&#8217;ve only seen this &#8216;copytools failed to stop&#8217; error in one patch test session. Logs for this failure are at&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/3e2f0620-9872-11e7-b775-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/3e2f0620-9872-11e7-b775-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We&#8217;ve seen this test fail to archive a file several times, and at least once that failure has been attributed to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7988&quot; title=&quot;HSM: high lock contention for cdt_llog_lock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7988&quot;&gt;&lt;del&gt;LU-7988&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</description>
                <environment></environment>
        <key id="48311">LU-9991</key>
            <summary>sanity-hsm test 31c fails with &apos;copytools failed to stop&apos; </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 14 Sep 2017 15:20:11 +0000</created>
                <updated>Fri, 15 Dec 2017 18:32:19 +0000</updated>
                            <resolved>Fri, 15 Dec 2017 18:31:54 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="208446" author="bfaccini" created="Thu, 14 Sep 2017 23:02:14 +0000"  >&lt;p&gt;James,&lt;br/&gt;
In the auto-test failure logs you reported, we can see that the copytool&apos;s threads are stuck trying to close Lustre files:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;09:31:59:[12891.943987] lhsmtool_posix  S 000000000000f908     0  6036      1 0x00000080
09:31:59:[12891.945640]  ffff88006c5b3b30 0000000000000086 ffff88005e4d3f40 ffff88006c5b3fd8
09:31:59:[12891.947482]  ffff88006c5b3fd8 ffff88006c5b3fd8 ffff88005e4d3f40 ffffffff81e87180
09:31:59:[12891.949277]  ffff88006c5b3b60 0000000100c07565 ffffffff81e87180 000000000000f908
09:31:59:[12891.950973] Call Trace:
09:31:59:[12891.952275]  [&amp;lt;ffffffff816a94c9&amp;gt;] schedule+0x29/0x70
09:31:59:[12891.953752]  [&amp;lt;ffffffff816a6f14&amp;gt;] schedule_timeout+0x174/0x2c0
09:31:59:[12891.955315]  [&amp;lt;ffffffff81098b20&amp;gt;] ? internal_add_timer+0x70/0x70
09:31:59:[12891.956858]  [&amp;lt;ffffffffc09adec0&amp;gt;] ptlrpc_set_wait+0x4c0/0x910 [ptlrpc]
09:31:59:[12891.958477]  [&amp;lt;ffffffff810c4810&amp;gt;] ? wake_up_state+0x20/0x20
09:31:59:[12891.959988]  [&amp;lt;ffffffffc09ae38d&amp;gt;] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
09:31:59:[12891.961644]  [&amp;lt;ffffffffc0ac75cc&amp;gt;] mdc_close+0x1bc/0x8a0 [mdc]
09:31:59:[12891.963168]  [&amp;lt;ffffffffc0b6f426&amp;gt;] lmv_close+0x216/0x540 [lmv]
09:31:59:[12891.964760]  [&amp;lt;ffffffffc0bb4b48&amp;gt;] ll_close_inode_openhandle+0x2f8/0xe20 [lustre]
09:31:59:[12891.966402]  [&amp;lt;ffffffffc0bb8270&amp;gt;] ll_md_real_close+0xf0/0x1e0 [lustre]
09:31:59:[12891.968043]  [&amp;lt;ffffffffc0bb899d&amp;gt;] ll_file_release+0x63d/0xa70 [lustre]
09:31:59:[12891.969592]  [&amp;lt;ffffffffc0ba424f&amp;gt;] ll_dir_release+0x2f/0xd0 [lustre]
09:31:59:[12891.971179]  [&amp;lt;ffffffff81202fb9&amp;gt;] __fput+0xe9/0x260
09:31:59:[12891.972581]  [&amp;lt;ffffffff8120321e&amp;gt;] ____fput+0xe/0x10
09:31:59:[12891.974049]  [&amp;lt;ffffffff810ad247&amp;gt;] task_work_run+0xa7/0xf0
09:31:59:[12891.975494]  [&amp;lt;ffffffff8102ab62&amp;gt;] do_notify_resume+0x92/0xb0
09:31:59:[12891.977059]  [&amp;lt;ffffffff816b527d&amp;gt;] int_signal+0x12/0x17
09:31:59:[12891.978503] lhsmtool_posix  S 000000000000f908     0  6038      1 0x00000080
09:31:59:[12891.980173]  ffff88005d3b3b50 0000000000000086 ffff880063d1bf40 ffff88005d3b3fd8
09:31:59:[12891.981897]  ffff88005d3b3fd8 ffff88005d3b3fd8 ffff880063d1bf40 ffff88007c100000
09:31:59:[12891.983599]  ffff88005d3b3b80 0000000100c04b03 ffff88007c100000 000000000000f908
09:31:59:[12891.985316] Call Trace:
09:31:59:[12891.986518]  [&amp;lt;ffffffff816a94c9&amp;gt;] schedule+0x29/0x70
09:31:59:[12891.987982]  [&amp;lt;ffffffff816a6f14&amp;gt;] schedule_timeout+0x174/0x2c0
09:31:59:[12891.989450]  [&amp;lt;ffffffff81098b20&amp;gt;] ? internal_add_timer+0x70/0x70
09:31:59:[12891.991030]  [&amp;lt;ffffffffc09adec0&amp;gt;] ptlrpc_set_wait+0x4c0/0x910 [ptlrpc]
09:31:59:[12891.992610]  [&amp;lt;ffffffff810c4810&amp;gt;] ? wake_up_state+0x20/0x20
09:31:59:[12891.994084]  [&amp;lt;ffffffffc09ae38d&amp;gt;] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
09:31:59:[12891.995683]  [&amp;lt;ffffffffc0ac75cc&amp;gt;] mdc_close+0x1bc/0x8a0 [mdc]
09:31:59:[12891.997162]  [&amp;lt;ffffffffc0b6f426&amp;gt;] lmv_close+0x216/0x540 [lmv]
09:31:59:[12891.998691]  [&amp;lt;ffffffffc0bb4b48&amp;gt;] ll_close_inode_openhandle+0x2f8/0xe20 [lustre]
09:31:59:[12892.000295]  [&amp;lt;ffffffffc0bb8270&amp;gt;] ll_md_real_close+0xf0/0x1e0 [lustre]
09:31:59:[12892.001882]  [&amp;lt;ffffffffc0bb899d&amp;gt;] ll_file_release+0x63d/0xa70 [lustre]
09:31:59:[12892.003395]  [&amp;lt;ffffffff81202fb9&amp;gt;] __fput+0xe9/0x260
09:31:59:[12892.004846]  [&amp;lt;ffffffff8120321e&amp;gt;] ____fput+0xe/0x10
09:31:59:[12892.006206]  [&amp;lt;ffffffff810ad247&amp;gt;] task_work_run+0xa7/0xf0
09:31:59:[12892.007663]  [&amp;lt;ffffffff8102ab62&amp;gt;] do_notify_resume+0x92/0xb0
09:31:59:[12892.009093]  [&amp;lt;ffffffff816b527d&amp;gt;] int_signal+0x12/0x17
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This explains why the archive request is still in the STARTED state rather than SUCCEED: the copytool could not send the end-of-archive notification. It also explains why the copytool processes could not be killed/stopped during sub-test exit.&lt;/p&gt;
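&lt;p&gt;To see at a glance which tasks are stuck in the same place, a sysrq-t dump like the one above can be scanned task by task for a given kernel function. Below is a hypothetical Python helper (not part of any Lustre tool), run on an abbreviated excerpt of the traces with the timestamp and return-address columns and several frames removed.&lt;/p&gt;

```python
import re

# Abbreviated excerpt of the sysrq-t dump above: timestamps and the
# return-address column are dropped, and only some frames are kept.
TRACE = """\
lhsmtool_posix  S 000000000000f908     0  6036      1 0x00000080
 schedule+0x29/0x70
 ptlrpc_set_wait+0x4c0/0x910 [ptlrpc]
 mdc_close+0x1bc/0x8a0 [mdc]
 ll_file_release+0x63d/0xa70 [lustre]
lhsmtool_posix  S 000000000000f908     0  6038      1 0x00000080
 schedule+0x29/0x70
 ptlrpc_set_wait+0x4c0/0x910 [ptlrpc]
 mdc_close+0x1bc/0x8a0 [mdc]
 ll_file_release+0x63d/0xa70 [lustre]
"""

# Task header: comm, state letter, hex field, a number, then the PID.
TASK_RE = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9a-f]+\s+\d+\s+(\d+)\s+")
# Frame line: indented "name+0xoffset"; a leading "?" marks an unreliable frame.
FRAME_RE = re.compile(r"^\s+(?:\?\s+)?(\w+)\+0x")


def tasks_blocked_in(trace, func):
    """Return the PIDs whose stack contains the given kernel function."""
    blocked = []
    pid = None
    for line in trace.splitlines():
        task = TASK_RE.match(line)
        if task:
            pid = int(task.group(3))
            continue
        frame = FRAME_RE.match(line)
        if frame and frame.group(1) == func and pid is not None and pid not in blocked:
            blocked.append(pid)
    return blocked


print(tasks_blocked_in(TRACE, "mdc_close"))
```

&lt;p&gt;On this excerpt both lhsmtool_posix tasks, 6036 and 6038, are reported as waiting under mdc_close.&lt;/p&gt;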

&lt;p&gt;On the MDS/MDT side, we can also find the following error message after the archive operation started:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;09:25:01:[12442.007026] WARNING: Pool &apos;lustre-mdt1&apos; has encountered an uncorrectable I/O failure and has been suspended.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So my guess is that the problem is on the ZFS/storage media side.&lt;/p&gt;</comment>
                            <comment id="216419" author="jhammond" created="Fri, 15 Dec 2017 18:31:31 +0000"  >&lt;p&gt;Closing as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9845&quot; title=&quot;ost-pools test_22 hangs with &#8216;WARNING: Pool &amp;#39;lustre-mdt1&amp;#39; has encountered an uncorrectable I/O failure and has been suspended.&#8217;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9845&quot;&gt;&lt;del&gt;LU-9845&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="47700">LU-9845</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzk73:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>