<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:02:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6737] many stripe testing of DNE2</title>
                <link>https://jira.whamcloud.com/browse/LU-6737</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Many stripe count test&lt;/p&gt;

&lt;p&gt;The many stripe count functional test is intended to show that a DNE2 configuration can handle many MDTs in a single filesystem, and that a single directory can be striped over many MDTs. Because this is being tested in a virtual AWS environment, performance will be measured, but neither performance scaling nor load testing is a primary goal of this test. It is instead a functional scaling test of the ability of the filesystem configuration and directory striping code to handle a large number of MDTs.&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Create a filesystem with 128 MDTs, 128 OSTs and at least 128 client mount points (multiple mounts per client)&lt;/li&gt;
	&lt;li&gt;Create striped directories with stripe count N for each N in 16, 32, 64, 96, 128:
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        lfs setdirstripe -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: This command creates a striped directory across N MDTs.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        lfs setdirstripe -D -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: This command sets the default directory stripe count of testN to N. All subdirectories created within this directory will be striped with this default stripe count.&lt;/p&gt;&lt;/li&gt;
	&lt;li&gt;Run mdtest on all client mount points, with each thread performing create/stat/unlink on at least 128k files in the striped test directory. Run this test in a striped directory that also has a default stripe set, so that all of its subdirectories will also be striped directories:
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        lfs setdirstripe -c N /mnt/lustre/testN
        lfs setdirstripe -D -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/li&gt;
	&lt;li&gt;Expected result: no errors are observed, and files are balanced across the MDTs.&lt;/li&gt;
&lt;/ol&gt;
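&lt;p&gt;The setup in step 2 can be scripted as a loop over the stripe counts. This is a minimal POSIX sh sketch, not part of the original test plan. The function name &lt;tt&gt;setup_striped_dirs&lt;/tt&gt; and the &lt;tt&gt;LFS&lt;/tt&gt;/&lt;tt&gt;MNT&lt;/tt&gt; variables are hypothetical. Setting &lt;tt&gt;LFS=echo&lt;/tt&gt; gives a dry run that only prints the commands.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test plan): create one
# striped test directory per stripe count N and give it a default stripe
# count of N, so that every subdirectory created inside it is striped too.
# Set LFS=echo for a dry run that only prints the lfs arguments.
LFS=${LFS:-lfs}
MNT=${MNT:-/mnt/lustre}

setup_striped_dirs() {
    for n in "$@"; do
        # Create $MNT/test$n as a directory striped over n MDTs.
        $LFS setdirstripe -c "$n" "$MNT/test$n" || return 1
        # Subdirectories created under it default to n stripes.
        $LFS setdirstripe -D -c "$n" "$MNT/test$n" || return 1
    done
}
```

&lt;p&gt;For example, &lt;tt&gt;setup_striped_dirs 16 32 64 96 128&lt;/tt&gt; prepares all five test directories. The function stops at the first failing lfs command and returns non-zero.&lt;/p&gt;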
</description>
                <environment></environment>
        <key id="30717">LU-6737</key>
            <summary>many stripe testing of DNE2</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="rread">Robert Read</assignee>
                                    <reporter username="rhenwood">Richard Henwood</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Jun 2015 18:14:14 +0000</created>
                <updated>Thu, 16 Jul 2015 15:02:24 +0000</updated>
                            <resolved>Thu, 2 Jul 2015 16:05:59 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="118865" author="rread" created="Wed, 17 Jun 2015 19:02:08 +0000"  >&lt;p&gt;Can mdsrate be used instead of mdtest? If mdtest is required, please post a script to run this as I don&apos;t have one for mdtest.&lt;/p&gt;

&lt;p&gt;Should the stats and unlinks be done on different client nodes than the one that created them?&lt;/p&gt;

&lt;p&gt;Just one thread per client mount point?&lt;/p&gt;

&lt;p&gt;Please provide a method to determine whether directory striping is balanced. Since the test requires unlink, this needs to be integrated into the test loop, and ideally could be done using MPI, too.&lt;/p&gt;

&lt;p&gt;Does &quot;No errors&quot; just mean no application errors?  Or does this also mean there should be no Lustre messages printed on any of the consoles during the test run?&lt;/p&gt;

&lt;p&gt;Just to be precise, does 128k mean 128000 or 2^17 (131072)?&lt;/p&gt;</comment>
                            <comment id="118873" author="adilger" created="Wed, 17 Jun 2015 19:25:12 +0000"  >&lt;p&gt;Since this is mostly intended to be a functional test of how many MDTs the DNE2 code can use instead of a performance test, there is a large leeway in terms of the testing options.  I don&apos;t have a strong preference for mdtest over mdsrate, with the minor caveat that mdsrate is a Lustre-specific benchmark while mdtest is not.  The goal would be to create all of the files in the one striped directory, rather than having each client/thread create its own subdirectory.&lt;/p&gt;

&lt;p&gt;There could be multiple threads per client mountpoint since, even without the multi-slot last_rcvd patches or other workarounds, there can be one RPC in flight &lt;em&gt;per MDT&lt;/em&gt;, so this would also provide natural scaling at the client as the number of MDTs increases.&lt;/p&gt;

&lt;p&gt;As for determining the MDT load balance, given the large numbers of files and the fact that these are newly formatted filesystems, I think that &lt;tt&gt;lfs df -i&lt;/tt&gt; before and after each test would be enough to determine whether the created files are roughly evenly distributed across MDTs.  Since the MDT selection is done via a hash function, the distribution should be fairly even but not perfectly so.  Ideally, if you already have infrastructure in CE to monitor MDS load (e.g. LMT), then it would be &lt;em&gt;interesting&lt;/em&gt; to see whether the load is distributed evenly across MDSes during runtime, but that is not a requirement for this testing since it is targeted at testing the limits of MDS and MDT counts.&lt;/p&gt;
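&lt;p&gt;The &lt;tt&gt;lfs df -i&lt;/tt&gt; before/after comparison described above can be scripted. This is a minimal sketch, assuming the usual &lt;tt&gt;lfs df -i&lt;/tt&gt; layout. In that layout each MDT line begins with a UUID such as &lt;tt&gt;lustre-MDT0000_UUID&lt;/tt&gt; and the third column is IUsed. The &lt;tt&gt;mdt_balance&lt;/tt&gt; function is hypothetical. It prints how many inodes each MDT gained between two saved outputs, plus the ratio of the largest gain to the smallest. That ratio should be close to 1 if the hash spreads files evenly.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: compare two saved "lfs df -i" outputs, e.g.
#   lfs df -i /mnt/lustre > before.txt   (before the create phase)
#   lfs df -i /mnt/lustre > after.txt    (after the create phase)
# Assumes MDT lines look like "fsname-MDT0000_UUID Inodes IUsed IFree ...".
mdt_balance() {
    awk '
        # Only per-MDT lines; skip the header, OSTs and the summary line.
        $1 !~ /-MDT[0-9a-fA-F]+_UUID$/ { next }
        # First file: remember IUsed per MDT.
        FNR == NR { before[$1] = $3; next }
        # Second file: report how many inodes each MDT gained.
        {
            d = $3 - before[$1]
            printf "%s %d\n", $1, d
            if (n == 0 || d > max) max = d
            if (n == 0 || min > d) min = d
            n++
        }
        END {
            if (n == 0) { print "no MDT lines found"; exit 1 }
            if (min > 0) printf "max/min: %.2f\n", max / min
            else print "max/min: undefined (an MDT gained no inodes)"
        }
    ' "$1" "$2"
}
```

&lt;p&gt;A ratio is reported rather than a pass/fail on exact equality. As noted above, a hash-based distribution is fairly even but not perfect.&lt;/p&gt;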

&lt;p&gt;&quot;No errors&quot; means at a minimum no application-visible errors.  For a purely functional test like this I would also expect that there are no Lustre-level errors either (timeouts, etc).  If any Lustre errors are printed during the test run please attach them here, or create a new ticket if they indicate some more serious problem.&lt;/p&gt;

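&lt;p&gt;The per-MDT file counts estimated in the next paragraph come from dividing the total file count by the stripe count. Below is a small shell check, assuming 128 threads creating 128000 files each and a perfectly even hash; the real distribution will only approximate this. &lt;tt&gt;files_per_mdt&lt;/tt&gt; is a hypothetical helper.&lt;/p&gt;

```shell
#!/bin/sh
# Ideal number of files per MDT for each directory stripe count,
# assuming the hash spreads files perfectly evenly over the stripes.
#   $1 number of threads   $2 files per thread   $3... stripe counts
files_per_mdt() {
    threads=$1 files=$2
    shift 2
    total=$((threads * files))
    for n in "$@"; do
        # Integer division: the ideal share of the total per MDT.
        echo "$n $((total / n))"
    done
}
```

&lt;p&gt;&lt;tt&gt;files_per_mdt 128 128000 16 32 64 96 128&lt;/tt&gt; gives 1024000 files per MDT at 16 stripes and 128000 at 128 stripes. These match the ~1M/MDT and ~128K/MDT estimates.&lt;/p&gt;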
&lt;p&gt;As for 128000 vs 131072 I don&apos;t think it really matters - the goal is to create a decent number of files per MDT to ensure a reasonable minimum runtime without creating so many that the tests with low MDT counts take too long.  Creating 128 * 128000 files = 16M files, which would likely be too many for a 1-stripe directory, but should be reasonable for 16+ stripes (~1M/MDT at 16 stripes, down to ~128K/MDT at 128 stripes) which is the minimum for this test unless the hash distribution is totally broken.&lt;/p&gt;</comment>
                            <comment id="118877" author="rread" created="Wed, 17 Jun 2015 19:56:01 +0000"  >&lt;p&gt;What is the difference between these three commands?&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        lfs setdirstripe -c N /mnt/lustre/testN
        lfs setdirstripe -D -c N /mnt/lustre/testN
        lfs mkdir -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="118881" author="di.wang" created="Wed, 17 Jun 2015 20:04:12 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; 
lfs setdirstripe -c N /mnt/lustre/testN
lfs mkdir -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;They are the same: both create a striped directory with stripe_count = N. Note: if you do not specify -i here, the master stripe (stripe 0) will be on the same MDT as its parent.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs setdirstripe -D -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This sets the default stripe count of testN, i.e. all subdirectories created under testN will use this layout (-c N). The default stripe is also inherited by these subdirectories.&lt;/p&gt;
</comment>
                            <comment id="118902" author="rread" created="Wed, 17 Jun 2015 21:22:14 +0000"  >&lt;p&gt;Is it required for testN to be a striped directory in order to set the default to be striped? In other words, would the following result in striped subdirectories of testN:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;    mkdir /mnt/lustre/testN
    lfs setdirstripe -D -c N /mnt/lustre/testN
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="118903" author="di.wang" created="Wed, 17 Jun 2015 21:30:14 +0000"  >&lt;p&gt;No, it is not required: lfs setdirstripe -D will set the default stripe on both normal and striped directories.&lt;/p&gt;</comment>
                            <comment id="118921" author="rread" created="Thu, 18 Jun 2015 00:12:40 +0000"  >&lt;p&gt;As I already have some automation built around mdsrate, I&apos;ll use that and ensure all threads use a single, shared directory.  I also have a patch to mdsrate that adds support for directory striping, though I&apos;ll use the lfs commands to make it explicit. &lt;/p&gt;

&lt;p&gt;I&apos;ll add an &lt;tt&gt;lfs df -i&lt;/tt&gt; before and after the create step of each test so we can confirm, at least manually, that the files are balanced.&lt;/p&gt;

&lt;p&gt;Do you want the files created with 0 stripes or normally with a single stripe?&lt;/p&gt;

&lt;p&gt;BTW, I&apos;ve been putting our provisioning tools through their paces today, but thought I&apos;d try one client with 128k files. I noticed that wide DNE striping has a pretty big impact on directory scanning performance:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ec2-user@client00 ~]$ lfs getdirstripe /mnt/scratch/test/dir-0/
/mnt/scratch/test/dir-0/
lmv_stripe_count: 0 lmv_stripe_offset: 0

[ec2-user@client00 ~]$ time lfs find /mnt/scratch/test/dir-0/ |wc -l 
131073

real	0m0.112s
user	0m0.038s
sys	0m0.159s

[ec2-user@client00 ~]$ time lfs getdirstripe /mnt/scratch/test128/dir-0/ 
/mnt/scratch/test128/dir-0/
lmv_stripe_count: 128 lmv_stripe_offset: 0
[stripe details deleted]

[ec2-user@client00 ~]$ time lfs find /mnt/scratch/test128/dir-0 |wc -l
131073

real	0m43.969s
user	0m0.053s
sys	0m43.990s
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I noticed this because &quot;lfs getdirstripe&quot; was taking ~40s to return: for some reason, lfs reads all the directory entries after printing the striping data. I&apos;ll make sure to only do this on empty directories for now.&lt;/p&gt;</comment>
                            <comment id="118923" author="adilger" created="Thu, 18 Jun 2015 00:23:20 +0000"  >&lt;p&gt;I think it is a bug for &quot;lfs getdirstripe&quot; to scan all the entries in the directory; that should require &quot;-R&quot; to scan subdirectories.&lt;/p&gt;

&lt;p&gt;As for &quot;lfs find&quot;, I guess it is doing the readdir on all 128 directory shards, but it would be interesting to compare whether this is slower than e.g. &quot;lfs find&quot; on a directory with 128 subdirs holding an equal total number of files (i.e. about 1000/subdir).&lt;/p&gt;</comment>
                            <comment id="118924" author="di.wang" created="Thu, 18 Jun 2015 00:33:33 +0000"  >&lt;p&gt;If you have enough OSTs (let&apos;s say &amp;gt;= 32), then use a single stripe; otherwise, use zero stripes.&lt;/p&gt;

&lt;p&gt;I assume /mnt/scratch/test128/dir-0 has 128 stripes, and all children (131073) under dir-0 are regular files?  Strange, I did not expect lfs find under a striped directory to be so slow. IMHO, it should be similar to a non-striped directory, so something might be wrong, probably statahead. Could you please collect a client-side -1 debug log? Thanks&lt;/p&gt;</comment>
                            <comment id="118926" author="rread" created="Thu, 18 Jun 2015 01:05:51 +0000"  >&lt;p&gt;&quot;lfs getdirstripe &amp;lt;dir&amp;gt;&quot; only prints the stripe info of the one directory, so the reason for the long pause was not obvious.  I had to use strace to see it reading all the dirents after it prints the stripe info.  Yes, I&apos;d agree it&apos;s a bug. I peeked at the code, and this behavior appears to be buried in the details of llapi_semantic_traverse().&lt;/p&gt;

&lt;p&gt;Yes, test128/dir-0 has 128 stripes with 128k regular files in one directory.  test/dir-0 also has 128k regular files in one directory.&lt;/p&gt;

&lt;p&gt;I&apos;ll try with 128 unstriped subdirectories for comparison next time, but I suspect scanning that will still be quick. &lt;/p&gt;</comment>
                            <comment id="119583" author="di.wang" created="Thu, 25 Jun 2015 09:38:25 +0000"  >&lt;p&gt;Robert, any update for the test? Thanks. &lt;/p&gt;</comment>
                            <comment id="119603" author="rread" created="Thu, 25 Jun 2015 15:43:35 +0000"  >&lt;p&gt;My tools are ready, but I haven&apos;t had a chance to run the full test yet. Will try to get to this today.&lt;/p&gt;</comment>
                            <comment id="119941" author="rread" created="Tue, 30 Jun 2015 16:59:27 +0000"  >&lt;p&gt;Log file and results summary for the test run.&lt;/p&gt;

&lt;p&gt;Details&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;8 MDS nodes, each with 16x MDT&lt;/li&gt;
	&lt;li&gt;8 OSS nodes, each with 16x OST&lt;/li&gt;
	&lt;li&gt;8 clients, each with 16 mount points&lt;/li&gt;
	&lt;li&gt;all nodes were m3.2xlarge instances&lt;/li&gt;
	&lt;li&gt;4 test runs, each in a single shared directory striped across 16, 32, 64, or 128 MDTs&lt;/li&gt;
	&lt;li&gt;mdsrate &amp;#45;-create, &amp;#45;-stat, &amp;#45;-unlink in each directory&lt;/li&gt;
	&lt;li&gt;128k files per MDT for each run&lt;/li&gt;
	&lt;li&gt;8 threads per MDT for each run&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="119943" author="rread" created="Tue, 30 Jun 2015 17:13:18 +0000"  >&lt;p&gt;Although this was not intended to be a performance test, I did notice that the stripe allocation policy for striped directories appears to be simplistic. As you can see, it always seems to allocate N sequential targets starting from MDT0.  This means usage of MDTs will be very uneven unless all directories are widely striped.&lt;/p&gt;
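&lt;p&gt;A simplified model shows why this matters. It assumes, as observed here, that a directory with N stripes uses N consecutive MDT indices starting from its master MDT, wrapping past the last MDT. It also assumes 128 MDTs spread over 8 MDSes with 16 MDTs each. The helper &lt;tt&gt;mds_used&lt;/tt&gt; is hypothetical, not Lustre code. It counts how many distinct MDSes serve a directory under two layouts: sequential provisioning (MDT i on MDS i/16, integer division) or staggered provisioning (MDT i on MDS i mod 8).&lt;/p&gt;

```shell
#!/bin/sh
# Simplified model (not Lustre code) of which MDSes serve a striped
# directory whose stripes occupy consecutive MDT indices, round-robin
# from the master MDT and wrapping past the last MDT.
#   $1 master MDT index   $2 stripe count
#   $3 total MDTs         $4 MDTs per MDS
#   $5 layout: "sequential" puts MDT i on MDS i/per_mds,
#              anything else ("staggered") puts MDT i on MDS i mod nmds
# Prints the number of distinct MDSes used.
mds_used() {
    master=$1 count=$2 total=$3 per_mds=$4 layout=$5
    nmds=$((total / per_mds))
    k=0
    while [ "$k" -lt "$count" ]; do
        # Stripe k sits on MDT (master + k), wrapping around.
        mdt=$((master + k))
        mdt=$((mdt % total))
        if [ "$layout" = sequential ]; then
            echo $((mdt / per_mds))
        else
            echo $((mdt % nmds))
        fi
        k=$((k + 1))
    done | sort -u | awk 'END { print NR }'
}
```

&lt;p&gt;With sequential provisioning, &lt;tt&gt;mds_used 0 16 128 16 sequential&lt;/tt&gt; prints 1 because all 16 stripes land on one MDS. The same call with &lt;tt&gt;staggered&lt;/tt&gt; prints 8. In this model, a master index other than 0 only shifts the starting MDT.&lt;/p&gt;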

&lt;p&gt;CE is designed to provision targets sequentially on each node, and with the current striped directory allocation scheme this results in the initial 16-MDT striped directory using a single MDS rather than all of them. To save time, I changed the target allocation scheme specifically for this test so that targets were staggered across the servers, which balanced IO across all MDS instances for all test runs.&lt;/p&gt;</comment>
                            <comment id="119982" author="adilger" created="Tue, 30 Jun 2015 22:39:38 +0000"  >&lt;p&gt;Robert, you are correct that the current DNE MDT allocation policy is not as balanced as the OST allocation policy.  That is an enhancement for the future, including taking MDT space usage into account.&lt;/p&gt;

&lt;p&gt;It should be noted that the DNE allocation policy doesn&apos;t necessarily always start at MDT0. Rather, (I believe by default) it uses the parent directory&apos;s MDT as the master (stripe 0) and round-robins from there, so if all of the directories are created directly under the filesystem root, they will use MDT0 as a starting point.  This can be changed via &lt;tt&gt;lfs mkdir &amp;#45;i &amp;lt;master_mdt_idx&amp;gt; &amp;#45;c N&lt;/tt&gt; to explicitly start the stripe creation on a different MDT, but it isn&apos;t as good as an improved MDT allocation policy.&lt;/p&gt;</comment>
                            <comment id="120121" author="rread" created="Wed, 1 Jul 2015 23:41:10 +0000"  >&lt;p&gt;Results from the 96 stripe run. &lt;/p&gt;</comment>
                            <comment id="120160" author="rhenwood" created="Thu, 2 Jul 2015 16:05:59 +0000"  >&lt;p&gt;Thanks for your help, Robert - we&apos;ve got the data we need.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                        <issuelink>
            <issuekey id="31106">LU-6858</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is blocked by">
                                        <issuelink>
            <issuekey id="30066">LU-6602</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="18350" name="20150629-bench.log" size="282111" author="rread" created="Tue, 30 Jun 2015 16:59:27 +0000"/>
                            <attachment id="18351" name="20150629-results.json" size="1044" author="rread" created="Tue, 30 Jun 2015 16:59:27 +0000"/>
                            <attachment id="18364" name="20150701-bench96.log" size="77967" author="rread" created="Wed, 1 Jul 2015 23:41:10 +0000"/>
                            <attachment id="18365" name="20150701-results96.json" size="264" author="rread" created="Wed, 1 Jul 2015 23:41:10 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxg1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>