<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4768] ost-survey hangs on client 2.4</title>
                <link>https://jira.whamcloud.com/browse/LU-4768</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;ost-survey on this environment&lt;/p&gt;

&lt;p&gt;  mds, oss: lustre 2.4.2 (CentOS 6.5 with lustre patched kernel RPM)&lt;br/&gt;
  client: lustre 2.4.2 (CentOS 6.5 patchless client, lustre iokit 1.4.0.1)&lt;/p&gt;

&lt;p&gt;  command line: ost-survey -s 100 /lustre&lt;/p&gt;

&lt;p&gt;will hang, consuming all available CPU while attempting to&lt;br/&gt;
echo 0 into /proc/fs/lustre/llite/*/max_cached_mb. This happens in the&lt;br/&gt;
cache_off subroutine of the ost-survey script.&lt;/p&gt;

&lt;p&gt;On a Lustre 2.1.6 client, max_cached_mb appears as&lt;br/&gt;
a single number:&lt;/p&gt;

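&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat /proc/fs/lustre/llite/*/max_cached_mb
18114
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;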


&lt;p&gt;but on a 2.4.2 client the output is different:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat /proc/fs/lustre/llite/*/max_cached_mb
users: 2
max_cached_mb: 96766
used_mb: 21806
unused_mb: 74960
reclaim_count: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;



&lt;p&gt;So the script has probably not been updated to handle the new output format.&lt;/p&gt;
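
&lt;p&gt;A minimal sketch of reading max_cached_mb in either of the two layouts shown above. It is illustrative Python, not part of ost-survey; the function name parse_max_cached_mb is hypothetical, and the sample strings are copied from the two outputs in this report. It only shows how the value could be read from both layouts and does not establish the cause of the hang.&lt;/p&gt;

```python
def parse_max_cached_mb(text):
    """Return the max_cached_mb value in MB from either output layout."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Older layout (seen on the 2.1.6 client): the output is a single number.
    if len(lines) == 1 and lines[0].isdigit():
        return int(lines[0])
    # Newer layout (seen on the 2.4.2 client): "key: value" pairs, one per line.
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "max_cached_mb":
            return int(value.strip())
    raise ValueError("max_cached_mb not found in output")


# Sample outputs copied from this report.
old_output = "18114\n"
new_output = (
    "users: 2\n"
    "max_cached_mb: 96766\n"
    "used_mb: 21806\n"
    "unused_mb: 74960\n"
    "reclaim_count: 0\n"
)
print(parse_max_cached_mb(old_output))  # 18114
print(parse_max_cached_mb(new_output))  # 96766
```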
</description>
                <environment>mds, oss: lustre 2.4.2 (CentOS 6.5 with lustre patched kernel RPM)&lt;br/&gt;
client: lustre 2.4.2 (CentOS 6.5 patchless client, lustre iokit 1.4.0.1)&lt;br/&gt;
</environment>
        <key id="23631">LU-4768</key>
            <summary>ost-survey hangs on client 2.4</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jamesanunez">James Nunez</assignee>
                                    <reporter username="deggio">francesco de giorgi</reporter>
                        <labels>
                    </labels>
                <created>Fri, 14 Mar 2014 11:52:49 +0000</created>
                <updated>Tue, 9 Dec 2014 18:28:03 +0000</updated>
                            <resolved>Sun, 5 Oct 2014 12:59:17 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                    <fixVersion>Lustre 2.7.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="79526" author="adilger" created="Mon, 17 Mar 2014 17:44:08 +0000"  >&lt;p&gt;Two things are wrong here:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;it doesn&apos;t make sense to allow the client to have no pages cached at all.  ll_max_cached_mb_seq_write() should always allow at least PTLRPC_MAX_BRW_PAGES of cached data, so that the client can make well-formed RPCs. It should probably just silently clamp the input value to max(pages_number, PTLRPC_MAX_BRW_PAGES) if the requested value is smaller.&lt;/li&gt;
	&lt;li&gt;the ost-survey script should be changed to ask for a minimum of 256 pages&lt;/li&gt;
&lt;/ul&gt;
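
&lt;p&gt;The clamping idea in the first point can be sketched as follows. This is illustrative Python, not the kernel code; clamp_cached_pages is a hypothetical name, and the value 256 for PTLRPC_MAX_BRW_PAGES is an assumption borrowed from the 256-page minimum in the second point, since the real value depends on the Lustre build.&lt;/p&gt;

```python
# Assumption: 256 pages, borrowed from the minimum suggested for ost-survey;
# the real PTLRPC_MAX_BRW_PAGES value depends on the Lustre build.
PTLRPC_MAX_BRW_PAGES = 256


def clamp_cached_pages(pages_number, floor_pages=PTLRPC_MAX_BRW_PAGES):
    """Silently raise a too-small request to the floor; leave larger ones alone."""
    return max(pages_number, floor_pages)


print(clamp_cached_pages(0))     # 256
print(clamp_cached_pages(4096))  # 4096
```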
</comment>
                            <comment id="91055" author="sknolin" created="Thu, 7 Aug 2014 13:57:15 +0000"  >&lt;p&gt;I&apos;m pretty sure this affects the 2.5 client too, but I don&apos;t have one up right now to check.&lt;/p&gt;

&lt;p&gt;Anyway, I&apos;d just add that while it&apos;s a minor problem for things like initial setup testing, it can become a big problem when trying to debug a possibly bad filesystem. If you forget to remove or modify ost-survey and start testing from a client, that client hangs. If you aren&apos;t aware of the problem, it can cause a bit of a sysadmin freak-out.&lt;/p&gt;

&lt;p&gt;ost-survey has other problems too, for example:&lt;/p&gt;

&lt;p&gt;It looks to me like it assumes you only have one filesystem mounted, so when it does things like count OSTs, it looks at:&lt;br/&gt;
/proc/fs/lustre/lov/&amp;#42;&amp;#45;clilov&amp;#45;&amp;#42;/numobd, when it seems like it should look at /proc/fs/lustre/lov/$MNT&amp;#45;clilov&amp;#45;&amp;#42;/numobd.&lt;/p&gt;

&lt;p&gt;It uses the long-deprecated positional parameters for setstripe, so it spews a lot of errors.&lt;/p&gt;

&lt;p&gt;I think this tool really does need to be fixed or just removed. If it&apos;s too low a priority to get a proper fix and review, then just not distributing it would be preferable.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="94314" author="jamesanunez" created="Wed, 17 Sep 2014 21:00:14 +0000"  >&lt;p&gt;Proposed master patch at: &lt;a href=&quot;http://review.whamcloud.com/11971&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/11971&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This patch only allows ost-survey to run and does not modify ll_max_cached_mb_seq_write().&lt;/p&gt;

&lt;p&gt;The comment above asked that ost-survey set max_cached_mb to 256 pages. The patch sets max_cached_mb to 256 pages expressed in MB, 256 * pagesize / (1024 * 1024), but this may inflate the read results, because reads can then be served from cache. For example, on the system I tested this patch on, the page size is 2621440, so max_cached_mb is set to 640. Here are the results for max_cached_mb = 2 and max_cached_mb = 640:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;With max_cached_mb = 2:
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     258.701       125.005        3.865      8.000
1     341.453       107.399        2.929      9.311
2     340.931       117.782        2.933      8.490
3     277.857       105.958        3.599      9.438

With max_cached_mb = 640:
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     1227.975       128.612        0.814      7.775
1     1109.066       108.828        0.902      9.189
2     1069.722       127.521        0.935      7.842
3     1050.668       107.104        0.952      9.337
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
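
&lt;p&gt;For reference, the pages-to-MB conversion behind the 640 figure can be sketched as follows. This is illustrative Python, not the patch itself; pages_to_mb is a hypothetical name, and the 4096-byte page in the second call is an assumption (common on x86_64) that is not stated in this issue.&lt;/p&gt;

```python
def pages_to_mb(pages, page_size_bytes):
    """Convert a page count to whole MB: pages * page_size / (1024 * 1024)."""
    return pages * page_size_bytes // (1024 * 1024)


# Page size value reported in this comment.
print(pages_to_mb(256, 2621440))  # 640
# Assumption: a 4096-byte page, which is not stated in this issue.
print(pages_to_mb(256, 4096))     # 1
```

&lt;p&gt;The parentheses around the divisor matter: evaluated left to right without them, the expression would divide by 1024 and then multiply by 1024 again.&lt;/p&gt;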

&lt;p&gt;Write performance is about the same in each case, but read performance is much higher with max_cached_mb = 640. Do we really want max_cached_mb to be 256 pages?&lt;/p&gt;</comment>
                            <comment id="94319" author="jamesanunez" created="Wed, 17 Sep 2014 21:37:06 +0000"  >&lt;p&gt;The patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5030&quot; title=&quot;&amp;quot;lctl {get,set}_param&amp;quot; should also check in /sys/fs/{lnet,lustre}&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5030&quot;&gt;&lt;del&gt;LU-5030&lt;/del&gt;&lt;/a&gt; made significant changes to ost-survey; &lt;a href=&quot;http://review.whamcloud.com/#/c/10534/8&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/10534/8&lt;/a&gt; .&lt;/p&gt;</comment>
                            <comment id="94337" author="simmonsja" created="Thu, 18 Sep 2014 02:27:42 +0000"  >&lt;p&gt;Yes, the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5030&quot; title=&quot;&amp;quot;lctl {get,set}_param&amp;quot; should also check in /sys/fs/{lnet,lustre}&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5030&quot;&gt;&lt;del&gt;LU-5030&lt;/del&gt;&lt;/a&gt; removed the direct access to the proc filesystem. Instead it uses lctl &amp;#91;s|g&amp;#93;et_param, as it should. Let&apos;s see if the patch does the wrong thing.&lt;/p&gt;</comment>
                            <comment id="94468" author="jamesanunez" created="Thu, 18 Sep 2014 22:56:16 +0000"  >&lt;p&gt;I&apos;ve updated the patch to correctly get the page size and, as Andreas asked for, modified ll_max_cached_mb_seq_write() to set the requested pages to be the max of the requested pages and PTLRPC_MAX_BRW_PAGES.&lt;/p&gt;</comment>
                            <comment id="95684" author="pjones" created="Sun, 5 Oct 2014 12:59:17 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24619">LU-5030</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="27114">LU-5773</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>test</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwhn3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13113</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>