<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:37:18 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3834] hsm_cdt_request_completed() may clear HS_RELEASED on failed restore</title>
                <link>https://jira.whamcloud.com/browse/LU-3834</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In the restore case of hsm_cdt_request_completed(), if the copytool returned success but the layout swap fails then we get an unreadable file with HS_RELEASED clear but LOV_PATTERN_F_RELEASED set.&lt;/p&gt;

&lt;p&gt;Perhaps the new HSM attributes should be applied to the volatile object before layout swap, and hsm_swap_layouts() should call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM set.&lt;/p&gt;</description>
                <environment></environment>
        <key id="20615">LU-3834</key>
            <summary>hsm_cdt_request_completed() may clear HS_RELEASED on failed restore</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="20020">LU-3647</parent>
                                    <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="jhammond">John Hammond</reporter>
                        <labels>
                            <label>HSM</label>
                    </labels>
                <created>Mon, 26 Aug 2013 19:48:32 +0000</created>
                <updated>Wed, 24 Feb 2016 17:25:14 +0000</updated>
                            <resolved>Mon, 20 Jan 2014 16:52:36 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="65830" author="bfaccini" created="Thu, 5 Sep 2013 12:37:36 +0000"  >&lt;p&gt;John,&lt;br/&gt;
Sorry, but HSM code is still new for me, so I need some clarifications here ...&lt;/p&gt;

&lt;p&gt;In this particular case of error handling, there is no other option than to return the file to its released state, is there? &lt;br/&gt;
So is this what you mean by having &quot;hsm_swap_layouts() call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM set&quot;, as during release ?&lt;/p&gt;
</comment>
                            <comment id="65866" author="bfaccini" created="Thu, 5 Sep 2013 18:07:42 +0000"  >&lt;p&gt;John,&lt;br/&gt;
Can you also detail the scenario/conditions under which you encountered this problem? As we discussed, I may then be able to reproduce it or do some kind of error injection.&lt;/p&gt;</comment>
                            <comment id="65871" author="jhammond" created="Thu, 5 Sep 2013 18:54:15 +0000"  >&lt;p&gt;Bruno,&lt;/p&gt;

&lt;p&gt;(I am just restating what I said on the call today.)&lt;/p&gt;

&lt;p&gt;In mdt_hsm_release() we call mdd_swap_layouts() with the SWAP_LAYOUTS_MDS_HSM flag, which causes the HSM xattrs to be handled along with the LOV xattrs, and rolls back both xattrs on failure.&lt;/p&gt;

&lt;p&gt;Contrast this with hsm_cdt_request_completed() which calls mdt_hsm_attr_set() and hsm_swap_layouts(). hsm_swap_layouts() then calls mdd_swap_layouts() with no flags. In this case if the layout swap fails then we do not restore the HSM xattr to its previous (released set) state.&lt;/p&gt;

&lt;p&gt;I have not checked but I suspect that it may be possible to remodel the handling of restore after that of release and thereby avoid this inconsistency when layout swap fails.&lt;/p&gt;</comment>
                            <comment id="65908" author="jay" created="Fri, 6 Sep 2013 05:17:44 +0000"  >&lt;p&gt;If this is a simple fix, then we can work out a patch for this. Otherwise I&apos;d like to put the resource on something else because it&apos;s unlikely for swap_layout to fail anyway.&lt;/p&gt;</comment>
                            <comment id="66337" author="bfaccini" created="Wed, 11 Sep 2013 13:45:53 +0000"  >&lt;p&gt;Ok, I see where this can be fixed now, thanks John. But to save time before submitting a patch, which of the following is the preferred way to do this?&lt;/p&gt;

&lt;p&gt;      _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op.&lt;/p&gt;

&lt;p&gt;      _ add a new flag to hsm_swap_layouts() to enable or disable use of the SWAP_LAYOUTS_MDS_HSM flag in the call to mo_swap_layouts().&lt;/p&gt;
</comment>
                            <comment id="66375" author="jhammond" created="Wed, 11 Sep 2013 18:06:26 +0000"  >&lt;p&gt;&amp;gt; _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op.&lt;/p&gt;

&lt;p&gt;This seems better to me.&lt;/p&gt;</comment>
                            <comment id="67532" author="bfaccini" created="Wed, 25 Sep 2013 13:45:25 +0000"  >&lt;p&gt;1st patch attempt is at &lt;a href=&quot;http://review.whamcloud.com/7631&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7631&lt;/a&gt;. &lt;br/&gt;
Unfortunately it failed in multiple auto-tests and needs at least to re-base. &lt;br/&gt;
But also failed in several sub-tests of sanity-hsm, which is highly suspect ... &lt;br/&gt;
Currently under investigations.&lt;/p&gt;</comment>
                            <comment id="71792" author="bfaccini" created="Mon, 18 Nov 2013 15:12:32 +0000"  >&lt;p&gt;The latest patch-set #6 auto-test failures appear unrelated to this patch; they stem from multiple issues already addressed in other tickets (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4093&quot; title=&quot;sanity-hsm test_24d: wanted &amp;#39;SUCCEED&amp;#39; got &amp;#39;WAITING&amp;#39;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4093&quot;&gt;&lt;del&gt;LU-4093&lt;/del&gt;&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4086&quot; title=&quot;Test failure on test suite sanity-hsm, subtest test_33&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4086&quot;&gt;&lt;del&gt;LU-4086&lt;/del&gt;&lt;/a&gt;, &#8230;). Some of those tickets have had their patches land since, so I will rebase again &#8230;&lt;/p&gt;</comment>
                            <comment id="72298" author="bfaccini" created="Tue, 26 Nov 2013 12:34:36 +0000"  >&lt;p&gt;I found a possible bug in my original patch version causing the layout-lock not to be released when a restore is canceled &#8230; Just submitted patch-set #8 to fix this, will see if it passes auto-tests (particularly sanity-hsm/test_33 which was timing-out due to an md5sum process never ending!!&#8230;).&lt;/p&gt;</comment>
                            <comment id="72796" author="bfaccini" created="Wed, 4 Dec 2013 14:19:12 +0000"  >&lt;p&gt;I am wondering if I should also add some error injection to simulate SWAP_LAYOUT failure during restore ??&lt;/p&gt;

&lt;p&gt;I will also push a new patch-set #8 to address John&apos;s last comment and convert to usual error handling style.&lt;/p&gt;</comment>
                            <comment id="73568" author="bfaccini" created="Mon, 16 Dec 2013 13:55:49 +0000"  >&lt;p&gt;Some update, after I added fault-injection (force -ENOENT in the middle of mdd_swap_layouts() to cause layouts swap back) and associated sub-test test_12o within patch-set #8.&lt;/p&gt;

&lt;p&gt;test_12o fails because the &quot;diff&quot; command, which triggered the implicit restore, succeeds when it is expected to fail because of the fault-injection. The strange thing is that the Restore operation has been marked as failed, the Copytool received the error, and the file still has the &quot;released&quot; flag set!!&lt;/p&gt;

&lt;p&gt;I wonder if there could be some issue in mdd_swap_layouts() causing this unexpected behavior ?&lt;/p&gt;
</comment>
                            <comment id="74342" author="bfaccini" created="Sat, 4 Jan 2014 15:51:16 +0000"  >&lt;p&gt;Hehe, finally I found that my fault-injection code itself introduced a problem, because it was added after the volatile/2nd file layout change and did not revert that change to mimic the error!! This made the restored data available as if the restore had succeeded &#8230;&lt;/p&gt;

&lt;p&gt;I changed this in patch-set #13, and now the new sub-test test_12o runs fine: with layout-swap fault-injection it returns errors on both the copytool (ENOTSUPP, injected!) and client (ENODATA) sides, and the next restore attempt, without fault-injection, succeeds.&lt;/p&gt;

&lt;p&gt;Will run with build+patch locally and see if I can still reproduce the Volatile object leak on MDT, seen as part of this ticket and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4293&quot; title=&quot;lfs_migrate is failing with a volatile file Operation not permitted error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4293&quot;&gt;&lt;del&gt;LU-4293&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="75284" author="bfaccini" created="Mon, 20 Jan 2014 16:52:36 +0000"  >&lt;p&gt;patch &lt;a href=&quot;http://review.whamcloud.com/7631&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7631&lt;/a&gt; has landed. Closing.&lt;/p&gt;</comment>
                            <comment id="76652" author="adilger" created="Mon, 10 Feb 2014 21:00:30 +0000"  >&lt;p&gt;Patch was only landed to master and not b2_5.  In the future, this type of patch should be cherry-picked to b2_5 so that it is fixed in the maintenance release.&lt;/p&gt;</comment>
                            <comment id="76706" author="bfaccini" created="Tue, 11 Feb 2014 08:55:25 +0000"  >&lt;p&gt;Hello Andreas,&lt;br/&gt;
I am sorry if I missed something here; to be honest, I mainly focus on getting the patch done for the branch where the problem was reported. But then should I create a new patch version for each affected version listed?&lt;/p&gt;</comment>
                            <comment id="76709" author="adilger" created="Tue, 11 Feb 2014 10:11:20 +0000"  >&lt;p&gt;Bruno, the patch was marked as affecting the 2.5.0 release.  I&apos;m just going through patches that have landed to master and trying to see which ones need to be landed for 2.5.1 that have not been landed there, since that is the long-term maintenance release.  If you are closing a bug then you should consider if it is fixing a problem that is serious and may affect earlier versions of Lustre and should land on the maintenance release.  In many cases, Oleg can cherry-pick the patch directly to b2_5 without putting it through Gerrit/Jenkins/autotest again, but he needs to know to do this.&lt;/p&gt;</comment>
                            <comment id="76710" author="bfaccini" created="Tue, 11 Feb 2014 11:40:21 +0000"  >&lt;p&gt;Ok thanks Andreas, I understand now that I need to take care of this because it is also under my responsibility, if a patch is required for earlier versions,  to either create+push a new patch for each other versions or ask Oleg to cherry-pick the original patch for each other versions.&lt;/p&gt;

&lt;p&gt;I don&apos;t know why but I thought that the patch integration/release decision was made by other people (you, Oleg, Peter, &#8230;); this may simply be because you are doing this verification work very frequently and do the job for lazy guys like me!!&lt;/p&gt;</comment>
                            <comment id="76813" author="bfaccini" created="Wed, 12 Feb 2014 10:05:39 +0000"  >&lt;p&gt;Andreas, your b2_5 patch for this ticket at &lt;a href=&quot;http://review.whamcloud.com/9212&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9212&lt;/a&gt; has found a flaw in sanity-hsm/test_12o (from original patch &lt;a href=&quot;http://review.whamcloud.com/7631&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7631&lt;/a&gt; from this ticket too !!) during the auto-tests session.&lt;/p&gt;

&lt;p&gt;This new problem is tracked within &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4613&quot; title=&quot;Failure on test suite sanity-hsm test_12o: request on 0x200000bd1:0xf:0x0 is not SUCCEED on mds1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4613&quot;&gt;&lt;del&gt;LU-4613&lt;/del&gt;&lt;/a&gt; where I already pushed a patch to master (&lt;a href=&quot;http://review.whamcloud.com/9235&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9235&lt;/a&gt;), since #7631 has already landed to master, but what should we do for the b2_5 version you just pushed ?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvz87:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9919</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>