<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:05:46 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13973] 4K random write performance impacts on large sparse files</title>
                <link>https://jira.whamcloud.com/browse/LU-13973</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Here is the tested workload.&lt;/p&gt;

&lt;p&gt;4k random write, FPP (file per process)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[randwrite]
ioengine=libaio
rw=randwrite
blocksize=4k
iodepth=4
direct=1
size=${SIZE}
runtime=60
numjobs=16
group_reporting
directory=/ai400x/out
create_serialize=0
filename_format=f.$jobnum.$filenum
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this test, each of the 2 clients runs 16 fio processes, and each fio process issues 4k random writes to its own file.&lt;br/&gt;
 However, when the files are large (128GB in this case), performance drops sharply, with almost 6x fewer IOPS. Here are the two test results.&lt;/p&gt;
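&lt;p&gt;The per-process access pattern of this job can be sketched in Python. This is a minimal illustration under stated assumptions, not fio itself: it uses buffered I/O instead of direct=1, assumes each file starts sparse (as in the issue title), and the helper names random_block_offsets and write_random_blocks are hypothetical.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
import os
import random

BLOCK_SIZE = 4096  # blocksize=4k in the job file


def random_block_offsets(file_size, count, block_size=BLOCK_SIZE, seed=None):
    """Return `count` random offsets in [0, file_size), each a multiple of block_size."""
    rng = random.Random(seed)
    nblocks = file_size // block_size
    return [rng.randrange(nblocks) * block_size for _ in range(count)]


def write_random_blocks(path, file_size, count, seed=None):
    """Issue `count` 4 KiB writes at random aligned offsets of a sparse file.

    Buffered I/O only: fio with direct=1 also opens the file with O_DIRECT,
    which needs block-aligned buffers and is not shown here.
    """
    buf = b"\xab" * BLOCK_SIZE
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Set the file size without allocating blocks, so the file starts sparse.
        os.ftruncate(fd, file_size)
        for off in random_block_offsets(file_size, count, seed=seed):
            os.pwrite(fd, buf, off)
    finally:
        os.close(fd)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;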

&lt;p&gt;1GB file&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# SIZE=1g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio

write: IOPS=16.8k, BW=65.5MiB/s (68.7MB/s)(3930MiB/60004msec); 0 zone resets
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;128GB file&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# SIZE=128g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio

write: IOPS=2894, BW=11.3MiB/s (11.9MB/s)(679MiB/60039msec)
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
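&lt;p&gt;With a fixed 4k block size, bandwidth is simply IOPS multiplied by 4 KiB, so the two BW figures follow directly from the IOPS figures. A small Python sketch of that arithmetic (helper names are hypothetical):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
KIB = 1024
MIB = 1024 * 1024


def iops_to_mib_per_s(iops, block_size=4 * KIB):
    """Bandwidth in MiB/s for a workload that uses one fixed block size."""
    return iops * block_size / MIB


def slowdown(fast_iops, slow_iops):
    """How many times fewer IOPS the slow run achieved."""
    return fast_iops / slow_iops


print(round(iops_to_mib_per_s(16_800), 1))  # 1GB files
print(round(iops_to_mib_per_s(2_894), 1))   # 128GB files
print(round(slowdown(16_800, 2_894), 1))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;16.8k IOPS gives about 65.6 MiB/s (fio prints 65.5 MiB/s because 16.8k is itself rounded), 2894 IOPS gives about 11.3 MiB/s, and the 128GB case is roughly 5.8x slower.&lt;/p&gt;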
&lt;p&gt;I collected CPU profiles on the OSS for both cases. In the 128GB file case there was heavy spinlock contention in ldiskfs_mb_new_blocks() and ldiskfs_mb_normalize_request(), which accounted for 89% (14085/15823 samples) of the total ost_io_xx() time, versus 20% (1895/9296 samples) in the 1GB file case. Please see the attached flame graphs.&lt;/p&gt;</description>
                <environment>master</environment>
        <key id="60867">LU-13973</key>
            <summary>4K random write performance impacts on large sparse files</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="qian_wc">Qian Yingjin</assignee>
                                    <reporter username="sihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Sun, 20 Sep 2020 07:57:49 +0000</created>
                <updated>Thu, 29 Oct 2020 11:49:47 +0000</updated>
                                            <version>Lustre 2.14.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="280089" author="qian_wc" created="Sun, 20 Sep 2020 14:24:32 +0000"  >&lt;p&gt;Hi Ihara, &lt;/p&gt;

&lt;p&gt;Could you please first preallocate all of the needed space with fallocate (either through fio&apos;s fallocate support or with the command &apos;fallocate -l&apos;) and then run the fio test?&lt;/p&gt;
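&lt;p&gt;A minimal Python sketch of the preallocation step, assuming a Linux client where os.posix_fallocate() is available. It is close to, but not the same as, &apos;fallocate -l&apos;: the util-linux command calls fallocate(2) directly, while glibc posix_fallocate() may emulate the allocation by writing when the filesystem does not support fallocate(2). The helper name preallocate is hypothetical.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
import os


def preallocate(path, size):
    """Create `path` if needed, reserve `size` bytes of space, and return st_size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Allocate space for bytes [0, size); the file size grows to `size`
        # if it was smaller (posix_fallocate never shrinks a file).
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.stat(path).st_size


# Example with the fio layout above (path is illustrative):
# preallocate("/ai400x/out/f.0.0", 128 * 1024**3)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;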

&lt;p&gt;Thanks,&lt;br/&gt;
Qian&lt;/p&gt;</comment>
                            <comment id="280090" author="sihara" created="Mon, 21 Sep 2020 00:47:31 +0000"  >&lt;p&gt;Yingjin, I also thought fallocate might help, so I tried fallocate with fio (NOTE: fio uses fallocate if the filesystem supports it) with patch &lt;a href=&quot;https://review.whamcloud.com/#/c/39342/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/39342/&lt;/a&gt; applied, but the problem was the same; fallocate did not help either. However, overwriting the files did help, e.g. creating the 128GB files and allocating all blocks first, then running the random writes on them.&lt;/p&gt;
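&lt;p&gt;The overwrite workaround (write every block of the files first, then run the random writes) can be sketched as a sequential prefill in Python. The 1 MiB chunk size and the helper name prefill are assumptions, not part of the original test.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
import os

CHUNK = 1024 * 1024  # assumed 1 MiB sequential write size


def prefill(path, size, chunk=CHUNK):
    """Write `size` bytes of zeros to `path` sequentially so that every block is written."""
    zeros = memoryview(bytes(chunk))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while size > offset:
            n = min(chunk, size - offset)
            # pwrite may write fewer bytes than asked; advance by what was written.
            offset += os.pwrite(fd, zeros[:n], offset)
        os.fsync(fd)  # flush the prefill before starting the random-write phase
    finally:
        os.close(fd)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;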

</comment>
                            <comment id="280092" author="qian_wc" created="Mon, 21 Sep 2020 02:09:57 +0000"  >&lt;p&gt;Hi Ihara,&lt;/p&gt;

&lt;p&gt;I may have found the reason: it appears to be a problem with fallocate for direct IO (not for buffered IO).&lt;/p&gt;

&lt;p&gt;I will post a revised patch soon.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Qian&lt;/p&gt;</comment>
                            <comment id="280093" author="qian_wc" created="Mon, 21 Sep 2020 02:16:16 +0000"  >&lt;p&gt;Please try the updated fallocate patch:&lt;br/&gt;
 &lt;a href=&quot;https://review.whamcloud.com/39342&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39342&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13765&quot; title=&quot;ldiskfs_mb_mark_diskspace_used:3472: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13765&quot;&gt;&lt;del&gt;LU-13765&lt;/del&gt;&lt;/a&gt; osd-ldiskfs: Extend credit correctly for fallocate&lt;/p&gt;

&lt;p&gt;It only modifies one line:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
diff --git a/lustre/osd-ldiskfs/osd_io.c b/lustre/osd-ldiskfs/osd_io.c
index 462a462cc9..689471e8a3 100644
--- a/lustre/osd-ldiskfs/osd_io.c
+++ b/lustre/osd-ldiskfs/osd_io.c
@@ -2009,7 +2009,7 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osd_fallocate(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, struct dt_object *dt,
                        &lt;span class=&quot;code-keyword&quot;&gt;break&lt;/span&gt;;
 
                rc = ldiskfs_map_blocks(handle, inode, &amp;amp;map,
-                                       LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT);
+                                       LDISKFS_GET_BLOCKS_CREATE);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc &amp;lt;= 0) {
                        CDEBUG(D_INODE, &lt;span class=&quot;code-quote&quot;&gt;&quot;inode #%lu: block %u: len %u: &quot;&lt;/span&gt;
                               &lt;span class=&quot;code-quote&quot;&gt;&quot;ldiskfs_map_blocks returned %d\n&quot;&lt;/span&gt;,

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Qian&lt;/p&gt;</comment>
                            <comment id="280094" author="qian_wc" created="Mon, 21 Sep 2020 02:20:33 +0000"  >&lt;p&gt;Btw, could you please measure the fallocate performance with/without the updated patches?&lt;/p&gt;

&lt;p&gt;i.e.&lt;/p&gt;

&lt;p&gt;time fallocate -l 128G test1&lt;br/&gt;
time fallocate -l 256G test2&lt;br/&gt;
I just want to know whether the patch affects how long fallocate takes.&lt;/p&gt;
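&lt;p&gt;The same measurement can also be scripted. This minimal Python sketch times os.posix_fallocate() on a newly created file; it approximates &apos;time fallocate -l&apos; but is not identical, because posix_fallocate() may fall back to emulation when fallocate(2) is unsupported. The helper name time_preallocation is hypothetical.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
import os
import time


def time_preallocation(path, size):
    """Create a new file and return the seconds spent preallocating `size` bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        start = time.perf_counter()
        os.posix_fallocate(fd, 0, size)
        return time.perf_counter() - start
    finally:
        os.close(fd)


# Example mirroring the commands above (paths and sizes are illustrative):
# for name, gib in (("test1", 128), ("test2", 256)):
#     print(name, round(time_preallocation(name, gib * 1024**3), 3), "s")
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;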

&lt;p&gt;thanks,&lt;br/&gt;
Qian&lt;/p&gt;</comment>
                            <comment id="280097" author="sihara" created="Mon, 21 Sep 2020 06:09:30 +0000"  >&lt;p&gt;In fact, it seems that fallocate is not working properly with either patch (patchset 6 or patchset 7): the command returns almost immediately and the file size stays at 0.&lt;/p&gt;

&lt;p&gt;patchset 6&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# time  fallocate -l 128g /ai400x/test1

real	0m0.004s
user	0m0.001s
sys	0m0.000s
[root@ec01 ~]# ls -l /ai400x/test1 
-rw-r--r-- 1 root root 0 Sep 21 14:47 /ai400x/test1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;patchset 7&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# time  fallocate -l 128g /ai400x/test1

real	0m0.003s
user	0m0.001s
sys	0m0.000s
[root@ec01 ~]# ls -l /ai400x/test1 
-rw-r--r-- 1 root root 0 Sep 21 15:06 /ai400x/test1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="280106" author="qian_wc" created="Mon, 21 Sep 2020 08:42:17 +0000"  >&lt;p&gt;Just fixed the problem:&lt;br/&gt;
With LDISKFS_GET_BLOCKS_CREATE:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[root@qvm1 tests]# time fallocate -l 5G /mnt/lustre/test

real	0m0.220s
user	0m0.002s
sys	0m0.003s
[root@qvm1 tests]# stat /mnt/lustre/test
  File: /mnt/lustre/test
  Size: 5368709120	Blocks: 10485768   IO Block: 4194304 regular file
[root@qvm1 tests]# time fallocate -l 1G /mnt/lustre/test

real	0m0.175s
user	0m0.002s
sys	0m0.003s
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[root@qvm1 tests]# time fallocate -l 5G /mnt/lustre/test

real	0m0.268s
user	0m0.002s
sys	0m0.005s
[root@qvm1 tests]# stat /mnt/lustre/test
  File: /mnt/lustre/test
  Size: 5368709120	Blocks: 10485768   IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205272502273  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2020-09-21 16:36:57.000000000 +0800
Modify: 2020-09-21 16:36:57.000000000 +0800
Change: 2020-09-21 16:36:57.000000000 +0800
 Birth: -

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
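&lt;p&gt;ls -l only shows the file size, which does not reveal whether space was actually allocated; st_blocks does. A minimal Python sketch of such a check, assuming Linux semantics where st_blocks counts 512-byte units and fallocate was called without FALLOC_FL_KEEP_SIZE (helper names are hypothetical). For the stat output above, 10485768 blocks x 512 bytes = 5368713216 bytes, which is the full 5 GiB (5368709120 bytes) plus 4096 bytes.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
import os

STAT_BLOCK = 512  # st_blocks is counted in 512-byte units on Linux


def allocation_summary(st_size, st_blocks):
    """Return (apparent size, allocated bytes, allocated minus apparent)."""
    allocated = st_blocks * STAT_BLOCK
    return st_size, allocated, allocated - st_size


def looks_preallocated(path, requested):
    """True if `path` has the requested size and at least that much space allocated."""
    st = os.stat(path)
    size, allocated, _ = allocation_summary(st.st_size, st.st_blocks)
    return size == requested and allocated >= requested


print(allocation_summary(5368709120, 10485768))  # (5368709120, 5368713216, 4096)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;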

&lt;p&gt;Please try the updated patch again.&lt;/p&gt;

&lt;p&gt;BTW, could you please also try a large allocation with the unwritten-extent allocation flag, i.e. switching back to LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT as in this diff:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[root@qvm1 lustre-release]# git diff
diff --git a/lustre/osd-ldiskfs/osd_io.c b/lustre/osd-ldiskfs/osd_io.c
index 7897fd4082..233ea54c6f 100644
--- a/lustre/osd-ldiskfs/osd_io.c
+++ b/lustre/osd-ldiskfs/osd_io.c
@@ -1983,7 +1983,7 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osd_fallocate(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, struct dt_object *dt,
        boff = start &amp;gt;&amp;gt; inode-&amp;gt;i_blkbits;
        blen = (ALIGN(end, 1 &amp;lt;&amp;lt; inode-&amp;gt;i_blkbits) &amp;gt;&amp;gt; inode-&amp;gt;i_blkbits) - boff;
 
-       flags = LDISKFS_GET_BLOCKS_CREATE;
+       flags = LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT;
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (mode &amp;amp; FALLOC_FL_KEEP_SIZE)
                flags |= LDISKFS_GET_BLOCKS_KEEP_SIZE;

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and measure the allocation time and rerun the fio performance test?&lt;/p&gt;
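&lt;p&gt;For reference, the boff/blen lines in the diff context above convert the byte range into a starting block and a block count, rounding the end up to a block boundary. A Python transcription, assuming end is an exclusive byte offset and that ALIGN() rounds up to a power-of-two multiple as the kernel macro does; align_up and block_range are hypothetical names.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
```python
def align_up(x, a):
    """Round x up to a multiple of a (what the kernel ALIGN() macro does)."""
    return (x + a - 1) // a * a


def block_range(start, end, blkbits):
    """Return (boff, blen): first block and number of blocks covering [start, end)."""
    block_size = 2 ** blkbits
    boff = start >> blkbits
    blen = (align_up(end, block_size) >> blkbits) - boff
    return boff, blen


print(block_range(0, 5 * 1024**3, 12))  # 5 GiB, 4 KiB blocks: (0, 1310720)
print(block_range(4097, 8193, 12))      # spans blocks 1 and 2: (1, 2)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;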

&lt;p&gt;Thanks,&lt;br/&gt;
Qian&lt;/p&gt;</comment>
                            <comment id="280115" author="sihara" created="Mon, 21 Sep 2020 10:44:43 +0000"  >&lt;p&gt;Hi Yingjin,&lt;br/&gt;
 yes, I&apos;ve also confirmed that the latest patch (patchset 8 of &lt;a href=&quot;https://review.whamcloud.com/39342&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/39342&lt;/a&gt;) solved the problem.&lt;br/&gt;
 I went back to the original problem of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13973&quot; title=&quot;4K random write performance impacts on large sparse files&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13973&quot;&gt;LU-13973&lt;/a&gt; and re-tested; it is solved as well, and the random-write IOPS are now about the same for 1GB and 128GB files. fallocate now works well with O_DIRECT.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat hostlist
ec01
ec02
# SIZE=1g /work/ihara/fio.git/fio --client=hostlist randomwrite.fio
  write: IOPS=37.4k, BW=146Mi (153M)(8761MiB/60004msec); 0 zone resets

# SIZE=128g /work/ihara/fio.git/fio --client=hostlist randomwrite.fio
  write: IOPS=38.1k, BW=149Mi (156M)(8921MiB/60007msec); 0 zone resets
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="59894">LU-13765</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="36074" name="128g-4krandomwrite.svg" size="445848" author="sihara" created="Sun, 20 Sep 2020 07:57:44 +0000"/>
                            <attachment id="36075" name="1g-4krandomwrite.svg" size="605968" author="sihara" created="Sun, 20 Sep 2020 07:57:44 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01ab3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>