<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:37:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3876] flow control of HSM requests</title>
                <link>https://jira.whamcloud.com/browse/LU-3876</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In a stress test I ran today, I created 40K files and archived them with 2 clients. The requests were queued on the MDT successfully, but this caused other problems.&lt;/p&gt;

&lt;p&gt;The first problem is the lprocfs implementation of agent_actions. The symptom is:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds01 ~]# lctl get_param mdt.*.hsm.agent_actions
error: get_param: read(&apos;/proc/fs/lustre/mdt/hsm-MDT0000/hsm/agent_actions&apos;) failed: Cannot allocate memory
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I haven&apos;t looked into it yet, but I think the root cause is that the llog is too long, so the read runs into a problem.&lt;/p&gt;

&lt;p&gt;I think the more severe problem is flow control. It&apos;s not good to keep requests in the queue for so long; at the very least we should have a parameter that limits the maximum queue length.&lt;/p&gt;

&lt;p&gt;Another problem I saw in the test is:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) hsm-MDT0000: Cannot find running request for cookie 0x5226bb27 on fid=[0x200000400:0xee5:0x0]
LustreError: 27319:0:(mdt_coordinator.c:1418:mdt_hsm_update_request_state()) Skipped 74 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There were a huge number of these warnings. I will dig into it tomorrow.&lt;/p&gt;</description>
                <environment></environment>
        <key id="20757">LU-3876</key>
            <summary>flow control of HSM requests</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="20020">LU-3647</parent>
                                    <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="jay">Jinshan Xiong</reporter>
                        <labels>
                            <label>HSM</label>
                    </labels>
                <created>Wed, 4 Sep 2013 06:04:49 +0000</created>
                <updated>Tue, 24 Sep 2013 20:46:06 +0000</updated>
                            <resolved>Tue, 24 Sep 2013 20:46:06 +0000</resolved>
                                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="66144" author="jay" created="Tue, 10 Sep 2013 01:07:08 +0000"  >&lt;p&gt;patch is at: &lt;a href=&quot;http://review.whamcloud.com/7589&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7589&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This only fixes the ENOMEM problem. More work will be needed to add flow control.&lt;/p&gt;</comment>
                            <comment id="66146" author="jhammond" created="Tue, 10 Sep 2013 01:09:25 +0000"  >&lt;p&gt;From the autotest logs I have also seen this file return -EIO causing sanity-hsm test 40 to pass when it should have failed. Does anyone have any idea why it might do so?&lt;/p&gt;</comment>
                            <comment id="66932" author="jay" created="Wed, 18 Sep 2013 16:34:21 +0000"  >&lt;p&gt;In 2.5, we&apos;re only going to fix the problem of dumping a huge number of agent_actions. Real flow control will be addressed in 2.6 due to limited resources.&lt;/p&gt;</comment>
                            <comment id="67459" author="jlevi" created="Tue, 24 Sep 2013 20:46:06 +0000"  >&lt;p&gt;Patch landed to Master. Follow-on work for 2.6 is being tracked in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4004&quot; title=&quot;CLONE - flow control of HSM requests&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4004&quot;&gt;&lt;del&gt;LU-4004&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvzvj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10057</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>