<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:49:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5195] HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog</title>
                <link>https://jira.whamcloud.com/browse/LU-5195</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Several times while testing HSM in a virtual environment (Centos 6.5 + Lustre 2.5.1 on clients and servers), we&apos;ve observed what may be HSM_ACTIONS llog corruption.&lt;/p&gt;

&lt;p&gt;Here&apos;s our internal bug description:&lt;br/&gt;
A Lustre filesystem where HSM and changelogs were used started misbehaving. The system was rebooted, and started spewing a lot of these traces in the system log:&lt;/p&gt;

&lt;p&gt;&amp;lt;3&amp;gt;LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:556:llog_cat_process_cb()) tas01-MDD0000: cannot find handle for llog 0x1c:1: -2&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:556:llog_cat_process_cb()) Skipped 600 previous similar messages&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2&lt;br/&gt;
&amp;lt;3&amp;gt;LustreError: 2990:0:(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages&lt;/p&gt;

&lt;p&gt;At that point the MDS would not accept any HSM request, nor would it deliver any.&lt;/p&gt;

&lt;p&gt;The MGT/MDT were unmounted and remounted as ldiskfs, and the file hsm_actions was deleted. Lustre was then remounted, and HSM became usable again.&lt;/p&gt;

&lt;p&gt;We do not have a simple reproducer for this, but it has happened several times.&lt;/p&gt;</description>
                <environment></environment>
        <key id="25154">LU-5195</key>
            <summary>HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jamesanunez">James Nunez</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>hsm</label>
                            <label>patch</label>
                    </labels>
                <created>Fri, 13 Jun 2014 18:13:45 +0000</created>
                <updated>Mon, 20 Apr 2015 17:21:46 +0000</updated>
                            <resolved>Wed, 27 Aug 2014 17:15:03 +0000</resolved>
                                    <version>Lustre 2.5.1</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                    <fixVersion>Lustre 2.5.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="86598" author="paf" created="Fri, 13 Jun 2014 18:38:00 +0000"  >&lt;p&gt;Dump of the MDS is at:&lt;br/&gt;
ftp.whamcloud.com:/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5195&quot; title=&quot;HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5195&quot;&gt;&lt;del&gt;LU-5195&lt;/del&gt;&lt;/a&gt;/cdt_llog_process_HSM_ACTIONS_140613.tar.gz&lt;/p&gt;</comment>
                            <comment id="89952" author="haasken" created="Thu, 24 Jul 2014 16:05:28 +0000"  >&lt;p&gt;This issue occurred again on the same system.  Here is what led up to the incident, according to the person who was working on the system:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I was doing some archiving. The Lustre client stopped working, so I rebooted both the client and the MDS.&lt;/p&gt;

&lt;p&gt;Now archiving is not working. On the client:&lt;/p&gt;

&lt;p&gt;  &amp;#35; cd /mnt/tas01/&lt;br/&gt;
  &amp;#35; lfs hsm_archive fz&lt;br/&gt;
  Cannot send HSM request (use of fz): No such file or directory&lt;/p&gt;

&lt;p&gt;On the MDS:&lt;/p&gt;

&lt;p&gt; &amp;#35; cat /proc/fs/lustre/mdt/tas01-MDT0000/hsm/actions&lt;br/&gt;
  cat: /proc/fs/lustre/mdt/tas01-MDT0000/hsm/actions: No such file or&lt;br/&gt;
directory&lt;/p&gt;

&lt;p&gt;I don&apos;t think I did anything unexpected from an admin point of view.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;At this point, I got on the system and gathered as much relevant information as I could.&lt;/p&gt;

&lt;p&gt;I gathered full dk logs, the contents of the hsm_actions file on the MDT, the contents of the hsm proc files, and a dump of the system.&lt;/p&gt;

&lt;p&gt;I got the system working again by following the steps in this bug&apos;s description.  That is,&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Unmount the Lustre MDT.&lt;/li&gt;
	&lt;li&gt;Mount the MDT as ldiskfs.&lt;/li&gt;
	&lt;li&gt;Remove the file hsm_actions from the root of the MDT.&lt;/li&gt;
	&lt;li&gt;Unmount the MDT.&lt;/li&gt;
	&lt;li&gt;Remount the MDT as Lustre.  The LustreError messages stopped appearing in the console, and HSM was usable again.&lt;/li&gt;
&lt;/ol&gt;
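A minimal shell sketch of the recovery steps above (the device and mount-point names are hypothetical; substitute the actual MDT device and mount points for the system in question):

```shell
# 1-2. Unmount the Lustre MDT, then mount its backing device as ldiskfs
umount /mnt/lustre-mdt                       # hypothetical Lustre MDT mount point
mount -t ldiskfs /dev/mdt0 /mnt/mdt-ldiskfs  # hypothetical MDT device

# 3. Remove the hsm_actions file from the root of the MDT
rm /mnt/mdt-ldiskfs/hsm_actions

# 4-5. Unmount the ldiskfs mount and remount the device as Lustre
umount /mnt/mdt-ldiskfs
mount -t lustre /dev/mdt0 /mnt/lustre-mdt
```

Per the report, the LustreError messages stopped appearing and HSM became usable again after this procedure.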


&lt;p&gt;After I got HSM working again, I checked what would happen if I replaced the hsm_actions file on the MDT with the &quot;unhealthy&quot; one that was in place when HSM was not working.  When I did this and remounted the MDT as Lustre, I saw the same LustreErrors in the console log again.  Restoring the previously working hsm_actions file got it working again.&lt;/p&gt;</comment>
                            <comment id="89975" author="haasken" created="Thu, 24 Jul 2014 17:11:54 +0000"  >&lt;p&gt;The logs and dump mentioned in the above comment have been uploaded to the Whamcloud FTP server.&lt;/p&gt;

&lt;p&gt;ftp.whamcloud.com:/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5195&quot; title=&quot;HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5195&quot;&gt;&lt;del&gt;LU-5195&lt;/del&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5195&quot; title=&quot;HSM: mdt_hsm_cdt_actions.c:104:cdt_llog_process() failed to process HSM_ACTIONS llog&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5195&quot;&gt;&lt;del&gt;LU-5195&lt;/del&gt;&lt;/a&gt;-logs.tar.gz&lt;/p&gt;

&lt;p&gt;That tar contains a README describing each file in it.&lt;/p&gt;</comment>
                            <comment id="91429" author="fzago" created="Tue, 12 Aug 2014 17:09:42 +0000"  >&lt;p&gt;This bug can be reproduced by inserting the failed hsm_actions on a healthy filesystem.&lt;/p&gt;

&lt;p&gt;Proposed fix: &lt;a href=&quot;http://review.whamcloud.com/11419&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/11419&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="92631" author="jamesanunez" created="Wed, 27 Aug 2014 17:15:03 +0000"  >&lt;p&gt;Landed to master (2.7.0)&lt;/p&gt;</comment>
                            <comment id="92633" author="jamesanunez" created="Wed, 27 Aug 2014 17:16:59 +0000"  >&lt;p&gt;Patch for b2_5 at &lt;a href=&quot;http://review.whamcloud.com/#/c/11619/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11619/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="94588" author="adegremont" created="Sun, 21 Sep 2014 14:22:17 +0000"  >&lt;p&gt;Is there some reason preventing the b2_5 patch from landing as well? It seems like an interesting fix, just missing a +2...&lt;/p&gt;</comment>
                            <comment id="94621" author="jamesanunez" created="Mon, 22 Sep 2014 15:00:59 +0000"  >&lt;p&gt;Aurelien, &lt;/p&gt;

&lt;p&gt;When we start landing patches for 2.5.4, this patch will be considered for that release.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="29540">LU-6471</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwp1z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14513</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>