<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:39:05 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4035] sanity-hsm test_58 failure:  &apos;truncate 3158 does not trig restore, state = &apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-4035</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;test results at: &lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/d9bd658c-2a2b-11e3-8527-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/d9bd658c-2a2b-11e3-8527-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the client test_log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity-hsm test 58: Truncate a released file will trigger restore == 14:49:23 (1380577763)
pdsh@c10: c07: ssh exited with exit code 1
Purging archive on c07
Starting copytool agt1 on c07
truncate up from 3158 to 6316
/lustre/scratch/d0.sanity-hsm/d58/f.sanity-hsm.58: (0x0000000b) exists dirty archived, archive_id:2
truncate down from 3158 to 1579
/lustre/scratch/d0.sanity-hsm/d58/f.sanity-hsm.58: (0x0000000b) exists dirty archived, archive_id:2
truncate to 0
/lustre/scratch/d0.sanity-hsm/d58/f.sanity-hsm.58: (0x0000000b) exists dirty archived, archive_id:2
 sanity-hsm test_58: @@@@@@ FAIL: truncate 3158 does not trig restore, state =  
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2158:truncate_released_file()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2181:test_58()
  = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2185:main()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the copytool log, it looks like there are three file restores, one for each of the calls to truncate_released_file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lhsmtool_posix[31629]: &apos;[0x200000401:0x1ba:0x0]&apos; action RESTORE reclen 72, cookie=0x5249f10d
lhsmtool_posix[31629]: processing file &apos;d0.sanity-hsm/d58/f.sanity-hsm.58&apos;
lhsmtool_posix[31629]: reading stripe rules from &apos;/archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0.lov&apos; for &apos;/archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0&apos;
lhsmtool_posix[31629]: restoring data from &apos;/archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0&apos; to &apos;{VOLATILE}=[0x200000402:0x18:0x0]&apos;
lhsmtool_posix[31629]: going to copy data from &apos;/archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0&apos; to &apos;{VOLATILE}=[0x200000402:0x18:0x0]&apos;
lhsmtool_posix[31629]: Going to copy 0 bytes /archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0 -&amp;gt; {VOLATILE}=[0x200000402:0x18:0x0]

lhsmtool_posix[31629]: data restore from &apos;/archive/scratch/01ba/0000/0401/0000/0002/0000/0x200000401:0x1ba:0x0&apos; to &apos;{VOLATILE}=[0x200000402:0x18:0x0]&apos; done
lhsmtool_posix[31629]: Action completed, notifying coordinator cookie=0x5249f10d, FID=[0x200000401:0x1ba:0x0], hp_flags=0 err=0
lhsmtool_posix[31629]: llapi_hsm_action_end() on &apos;/lustre/scratch/.lustre/fid/0x200000401:0x1ba:0x0&apos; ok (rc=0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It&apos;s not clear why the status for the file didn&apos;t return &quot;SUCCEED&quot;.&lt;/p&gt;</description>
                <environment>Lustre 2.4.93 build # 1687&lt;br/&gt;
OpenSFS cluster with combined MGS/MDS, single OSS with two OSTs, four clients; one agent + Lustre client (c07), one Lustre client with robinhood/db running (c08) and two Lustre clients (c09, c10) </environment>
        <key id="21213">LU-4035</key>
            <summary>sanity-hsm test_58 failure:  &apos;truncate 3158 does not trig restore, state = &apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>HSM</label>
                    </labels>
                <created>Tue, 1 Oct 2013 14:41:13 +0000</created>
                <updated>Wed, 19 Jan 2022 20:58:23 +0000</updated>
                            <resolved>Thu, 4 Dec 2014 18:30:22 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                    <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="68123" author="jay" created="Wed, 2 Oct 2013 05:17:29 +0000"  >&lt;p&gt;From the log, the restore process finished, but for an unknown reason it was delayed for a while, so the test script didn&apos;t see the restore state afterwards.&lt;/p&gt;

&lt;p&gt;From the MDT log, here is the only interesting thing:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 0-0: scratch-MDT0000: trigger OI scrub by RPC for [0x200000405:0x9:0x0], rc = 0 [1]
LustreError: 0-0: scratch-MDT0000: trigger OI scrub by RPC for [0x200000405:0x9:0x0], rc = 0 [1]
LustreError: 0-0: scratch-MDT0000: trigger OI scrub by RPC for [0x200000405:0x9:0x0], rc = 0 [1]
LustreError: 0-0: scratch-MDT0000: trigger OI scrub by RPC for [0x200000405:0x9:0x0], rc = 0 [1]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Not sure if this can cause the delay.&lt;/p&gt;</comment>
                            <comment id="72873" author="jamesanunez" created="Thu, 5 Dec 2013 03:59:42 +0000"  >&lt;p&gt;I&apos;ve run some more tests on the OpenSFS cluster and test 58 still fails some of the time. I can&apos;t consistently get it to fail and I don&apos;t see a pattern of failures when running the test alone nor in the full sanity-hsm test suite. I&apos;ve uploaded new logs at &lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/b223c634-5d51-11e3-ad71-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/b223c634-5d51-11e3-ad71-52540035b04c&lt;/a&gt; .&lt;/p&gt;</comment>
                            <comment id="72957" author="jay" created="Fri, 6 Dec 2013 02:04:38 +0000"  >&lt;p&gt;Hi James, do you know what made the test case fail?&lt;/p&gt;</comment>
                            <comment id="73025" author="jamesanunez" created="Sat, 7 Dec 2013 00:16:55 +0000"  >&lt;p&gt;Jinshan, unfortunately, I don&apos;t have any more information on why the test is failing. I&apos;ll try to get it to reproduce more reliably, which might provide more information.&lt;/p&gt;</comment>
                            <comment id="74293" author="sarah" created="Fri, 3 Jan 2014 04:07:54 +0000"  >&lt;p&gt;Another instance seen in review-dne:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/7e3e01a2-7388-11e3-8412-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/7e3e01a2-7388-11e3-8412-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="75619" author="sarah" created="Sat, 25 Jan 2014 01:13:25 +0000"  >&lt;p&gt;Hit this in interop testing between a 2.6 client and a 2.5 server:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/64654b22-8333-11e3-a5fa-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/64654b22-8333-11e3-a5fa-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="79117" author="adegremont" created="Wed, 12 Mar 2014 13:31:30 +0000"  >&lt;p&gt;I looked at it and this issue seems difficult to track. &quot;Usually&quot; this kind of issue appears when the request state list is badly parsed, or when there is a timing issue where the request list entry is removed before we try to read it.&lt;br/&gt;
Looking at sanity-hsm.sh, it is set to 10 seconds. According to the Maloo logs, only a few seconds seem to have elapsed between the truncate and the get_request_state() call. &lt;/p&gt;

&lt;p&gt;If we have only a few occurrences of this issue and do not know how to reproduce it, this becomes tricky. The only thing I can think of is adding a debugging patch to gather more information (such as printing the request state content).&lt;/p&gt;</comment>
                            <comment id="100709" author="adilger" created="Thu, 4 Dec 2014 18:30:22 +0000"  >&lt;p&gt;This test has not failed with this error in the past month; the failures seen in that time have a different message, &quot;request on 0x200000401:0x2d:0x0 is not SUCCEED on mds1&quot;, so that should be a different bug.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw4if:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10839</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>