<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4720] Help on performance scalability</title>
                <link>https://jira.whamcloud.com/browse/LU-4720</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;With the above environment, I need suggestions on performance at the clients, since I&apos;m stuck. I appreciate your help.&lt;/p&gt;

&lt;p&gt;I&apos;m getting block-device write performance of 9.6GB/s with 16 LUNs, measured with XDD. I ran obdfilter-survey on the OSS machines and get around 8.4GB/s write performance. I&apos;ve measured the LNET performance and get 9.6GB/s between the OSS machines and the 8 clients. But when I run IOR on the clients, I&apos;m getting around 2.6GB/s write performance with one client. When I run it across two nodes, I get 4.4GB/s write throughput. But when I scale beyond 2 nodes, I still get only about 4GB/s. Could you please help find the root cause of this performance scalability issue?&lt;/p&gt;</description>
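The backend and network measurements described above could be reproduced roughly as follows. This is a sketch only: the NIDs, thread ranges, and sizes are placeholder assumptions, not values taken from this report.

```shell
# Survey raw OST backend write bandwidth on each OSS (lustre-iokit).
# thrlo/thrhi sweep the I/O thread count; size is MB written per OST.
# Placeholder values - tune to match the actual OST count and RAID geometry.
thrlo=16 thrhi=256 nobjhi=16 size=16384 case=disk obdfilter-survey

# Measure raw LNET bandwidth between clients and servers with lnet_selftest.
# The NIDs below are placeholders for the QDR/FDR IB interfaces.
export LST_SESSION=$$
lst new_session write_test
lst add_group clients 192.168.1.[1-8]@o2ib
lst add_group servers 192.168.1.[10-11]@o2ib
lst add_batch bulk_write
lst add_test --batch bulk_write --from clients --to servers brw write size=1M
lst run bulk_write
lst stat clients servers
lst end_session
```

Comparing the three layers (block device, obdfilter, LNET) against the IOR number is exactly the narrowing-down approach the reporter describes.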
                <environment>MDS Servers(2nos) - 12cores, 64GB Memory, QDR Single port&lt;br/&gt;
OSS Servers (2nos) - 16cores, 64GB Memory, FDR multi rail(2ports from each OSS)&lt;br/&gt;
Clients(8nos) - 12cores, 64GB memory, QDR single port&lt;br/&gt;
Lustre Version - 2.5&lt;br/&gt;
CentOS version - 6.4&lt;br/&gt;
No. of OSTs - 16(Configured in RAID-6(8+2)) and load balanced between OSS servers with 8 OSTs on each&lt;br/&gt;
</environment>
        <key id="23487">LU-4720</key>
            <summary>Help on performance scalability</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="chakravarthy.nagarajan1@wipro.com">Chakravarthy Nagarajan</reporter>
                        <labels>
                    </labels>
                <created>Thu, 6 Mar 2014 03:58:43 +0000</created>
                <updated>Wed, 12 Mar 2014 17:23:14 +0000</updated>
                                            <version>Lustre 2.5.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="78612" author="keith" created="Thu, 6 Mar 2014 18:09:09 +0000"  >&lt;p&gt;How many threads are you using per client?&lt;/p&gt;</comment>
                            <comment id="78613" author="chakravarthy.nagarajan1@wipro.com" created="Thu, 6 Mar 2014 18:14:41 +0000"  >&lt;p&gt;I&apos;ve set the number of OSS threads to 256, which means 32 threads allocated per client. Please advise.&lt;/p&gt;</comment>
                            <comment id="78614" author="keith" created="Thu, 6 Mar 2014 18:19:58 +0000"  >&lt;p&gt;Can you try 6 threads per client? &lt;/p&gt;</comment>
                            <comment id="78620" author="chakravarthy.nagarajan1@wipro.com" created="Thu, 6 Mar 2014 18:56:55 +0000"  >&lt;p&gt;That reduced the performance with 2 clients by 50%, and the scalability issue still remains when I run with 4 clients. I initially set the number of threads according to the total number of spindles, but still no luck. Do you think metadata may be an issue, since I have 2 MDTs configured in RAID-1 only instead of RAID-1+0?&lt;/p&gt;</comment>
                            <comment id="78624" author="keith" created="Thu, 6 Mar 2014 19:20:25 +0000"  >&lt;p&gt;What is your single-thread performance when run on a single client? If you have 6 threads and 1GB/s, that seems a little odd. Are you running IOR with one file per process or with a single file?&lt;/p&gt;


&lt;p&gt;Are you using DNE to have 2 metadata targets? IOR is not metadata intensive so it should not be a serious factor. &lt;/p&gt;</comment>
                            <comment id="78625" author="chakravarthy.nagarajan1@wipro.com" created="Thu, 6 Mar 2014 19:27:23 +0000"  >&lt;p&gt;On a single client I&apos;m getting 2.2GB/s. Yes, I&apos;m using IOR with N-N only, and I&apos;m also using DNE with 2 MDTs.&lt;/p&gt;</comment>
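The N-to-N (file-per-process) IOR run being discussed might look like the following. This is a sketch: the mount point, hostfile, rank count, and transfer/block sizes are assumptions, not values reported in this thread.

```shell
# File-per-process (-F) write test (-w) across 8 client nodes,
# 32 MPI ranks per node (256 total, matching the 256 OSS threads mentioned).
# -t is the per-call transfer size, -b the per-rank data size,
# -o the target path on the Lustre mount (placeholder).
mpirun -np 256 --hostfile clients.txt \
    ior -w -F -t 1m -b 4g -o /mnt/lustre/ior_testfile
```

Dropping -F switches to a single shared file, which typically behaves quite differently under Lustre striping, hence Keith's question.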
                            <comment id="79147" author="adilger" created="Wed, 12 Mar 2014 17:06:12 +0000"  >&lt;p&gt;The MDT doesn&apos;t have anything to do with IO performance.  The metadata on the MDS has nothing to do with block allocation.&lt;/p&gt;</comment>
                            <comment id="79152" author="chakravarthy.nagarajan1@wipro.com" created="Wed, 12 Mar 2014 17:23:14 +0000"  >&lt;p&gt;Thanks; I&apos;ve realized the same by monitoring the MDS utilization. I&apos;ve tried the following, but no luck. Please advise.&lt;br/&gt;
1. Disabled checksums at the clients&lt;br/&gt;
2. Increased RPCs in flight to 32 at the clients&lt;br/&gt;
3. Disabled LRU resizing at the clients&lt;br/&gt;
4. Set readahead_max_file_size to 1M at the OSS machines&lt;br/&gt;
5. Tested with multiple thread counts up to 512 at the OSS machines.&lt;/p&gt;

&lt;p&gt;The issue is that obdfilter-survey run on the OSS machines yields 8.4GB/s, but the clients are unable to achieve the same.&lt;/p&gt;</comment>
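The client-side tunings listed in the comment above correspond to lctl parameter settings along these lines (Lustre 2.x parameter names; the specific values shown, and the lru_size figure in particular, are illustrative assumptions, and note that readahead is actually a client-side llite setting rather than an OSS one):

```shell
# 1. Disable data checksums on the clients
lctl set_param osc.*.checksums=0

# 2. Raise RPCs in flight to 32 on the clients
lctl set_param osc.*.max_rpcs_in_flight=32

# 3. Disable LRU resizing by pinning the lock LRU to a fixed size
#    (a nonzero static value turns dynamic resizing off; 1024 is a placeholder)
lctl set_param ldlm.namespaces.*.lru_size=1024

# 4. Readahead tuning lives on the client (llite), not the OSS
lctl set_param llite.*.max_read_ahead_per_file_mb=1

# 5. OSS I/O service thread ceiling (run on the OSS machines)
lctl set_param ost.OSS.ost_io.threads_max=512
```

These are runtime settings and revert on remount unless made persistent (e.g. via lctl conf_param on the MGS).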
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwgw7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12974</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>