<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:04:04 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13771] llite stats is inconsistent with osc stats</title>
                <link>https://jira.whamcloud.com/browse/LU-13771</link>
                <project id="10000" key="LU">Lustre</project>
                    <description></description>
                <environment></environment>
        <key id="59904">LU-13771</key>
            <summary>llite stats is inconsistent with osc stats</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ys">Yang Sheng</assignee>
                                    <reporter username="yliping">Yang Liping</reporter>
                        <labels>
                    </labels>
                <created>Thu, 9 Jul 2020 09:30:12 +0000</created>
                <updated>Mon, 10 Aug 2020 06:04:29 +0000</updated>
                            <resolved>Mon, 10 Aug 2020 06:04:28 +0000</resolved>
                                    <version>Lustre 2.10.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="274854" author="yliping" created="Thu, 9 Jul 2020 09:42:24 +0000"  >&lt;p&gt;Client: &lt;br/&gt;
 &#160; OS: CentOS Linux release 7.6.1810 (Core), 3.10.0-957.el7.x86_64, &lt;br/&gt;
 &#160; Lustre: 2.10.6_1_g88b8798&lt;/p&gt;

&lt;p&gt;Server: &lt;br/&gt;
 &#160; OS: CentOS Linux release 7.3.1611 (Core), 3.10.0-957.el7_lustre.x86_64,&lt;br/&gt;
 &#160; Lustre: 2.10.6_1_g88b8798&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;While debugging a Lustre server that was overloaded with 4k reads, I found that the llite stats are inconsistent with the osc stats. Is this a bug, or am I misunderstanding something?&lt;/p&gt;

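&lt;p&gt;From the output below, we can see that over a 10-second interval the llite read_bytes counter gained only 2 samples (45564442648 to 45564442650). However, the sample count summed over all osc read_bytes counters grew by 79,120 (4540200040 to 4540279160).&lt;/p&gt;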
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[root@client ~]# grep read_bytes /proc/fs/lustre/llite/zfsz3-*/stats; /tmp/osc_reads.sh zfsz3; echo; sleep 10; grep read_bytes /proc/fs/lustre/llite/zfsz3-*/stats; /tmp/osc_reads.sh zfsz3
read_bytes                45564442648 samples [bytes] 1 4194304 123838172598382
4540200040

read_bytes                45564442650 samples [bytes] 1 4194304 123838172606674
4540279160

[root@client ~]# cat /tmp/osc_reads.sh
#!/bin/bash

&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; [ $# == 1 ]; then
    grep read_bytes /proc/fs/lustre/osc/$1-OST*-osc-*/stats | awk &lt;span class=&quot;code-quote&quot;&gt;&apos;{sum+=$2} END {print sum}&apos;&lt;/span&gt;
&lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
    echo &lt;span class=&quot;code-quote&quot;&gt;&quot;Usage: $0 lustre_fs_name&quot;&lt;/span&gt;
fi
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The client causes a lot of stress on the server. In the obdfilter exports stats, I can see many read operations from this client. On the client, I used strace and SystemTap to trace the read syscall, but I can&apos;t find any evidence that the client is reading from Lustre aggressively.&lt;/p&gt;</comment>
                            <comment id="274858" author="jhammond" created="Thu, 9 Jul 2020 10:49:11 +0000"  >&lt;p&gt;llite &lt;tt&gt;read&amp;#95;bytes&lt;/tt&gt; counts bytes read by &lt;tt&gt;read()&lt;/tt&gt; and related system calls. osc &lt;tt&gt;read&amp;#95;bytes&lt;/tt&gt; counts bytes transferred by &lt;tt&gt;OST&amp;#95;READ&lt;/tt&gt; RPCs. Due to things like caching or read ahead it may be that either exceeds the other.&lt;/p&gt;</comment>
                            <comment id="274875" author="yliping" created="Thu, 9 Jul 2020 14:18:49 +0000"  >&lt;p&gt;Does osc read_bytes count the bytes transferred by OST_READ RPCs, or the number of OST_READ RPCs? Two read() syscalls triggering 79,120 bytes of OST_READ traffic would be reasonable. Two read() syscalls triggering 79,120 OST_READ RPCs would not.&lt;/p&gt;</comment>
                            <comment id="274878" author="jhammond" created="Thu, 9 Jul 2020 14:33:22 +0000"  >&lt;p&gt;osc read_bytes counts bytes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;read_bytes                45564442648 samples [bytes] 1 4194304 123838172598382
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;45564442648 is the number of samples (number of RPCs).&lt;br/&gt;
1 is the min number of bytes over those samples.&lt;br/&gt;
4194304 is the max number of bytes over those samples.&lt;br/&gt;
123838172598382 is the sum of bytes over those samples.&lt;/p&gt;</comment>
                            <comment id="274881" author="yliping" created="Thu, 9 Jul 2020 14:53:57 +0000"  >&lt;p&gt;As you can see in my /tmp/osc_reads.sh script, it sums the number of samples, not the bytes. So is it OK that 2 read() syscalls trigger 79,120 OST_READ RPCs? That sounds strange to me.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
&lt;span class=&quot;code-comment&quot;&gt;# grep read_bytes /proc/fs/lustre/osc/zfsz3-OST006f-osc-ffff9e86b7e7c000/stats
&lt;/span&gt;read_bytes                32641830 samples [bytes] 0 1048576 732351718180 601737760789691752

&lt;span class=&quot;code-comment&quot;&gt;# grep read_bytes /proc/fs/lustre/osc/zfsz3-OST006f-osc-ffff9e86b7e7c000/stats | awk &lt;span class=&quot;code-quote-red&quot;&gt;&apos;{sum+=&lt;span class=&quot;code-object&quot;&gt;$2&lt;/span&gt;} END {print sum}&apos;&lt;/span&gt;
&lt;/span&gt;32641830
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
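&lt;p&gt;The field layout of a stats line described earlier in this thread (sample count, unit, min, max, sum) can be made explicit with a small parser. This is only a minimal sketch in Python. The names &lt;tt&gt;StatsEntry&lt;/tt&gt; and &lt;tt&gt;parse_stats_line&lt;/tt&gt; are hypothetical, and any fields after the sum (such as the extra number in the per-OSC line above) are kept but not interpreted.&lt;/p&gt;

```python
from typing import NamedTuple, Tuple


class StatsEntry(NamedTuple):
    """One counter line from a Lustre stats file (hypothetical helper type)."""
    name: str
    samples: int            # number of samples, which is what awk's $2 sums
    unit: str
    minimum: int
    maximum: int
    total: int              # sum of the sampled values, e.g. bytes
    extra: Tuple[int, ...]  # trailing fields, left uninterpreted here


def parse_stats_line(line):
    """Split a line such as 'read_bytes N samples [bytes] min max sum'."""
    fields = line.split()
    name, samples, _samples_word, unit = fields[:4]
    minimum, maximum, total = (int(x) for x in fields[4:7])
    extra = tuple(int(x) for x in fields[7:])
    return StatsEntry(name, int(samples), unit.strip("[]"),
                      minimum, maximum, total, extra)


if __name__ == "__main__":
    before = parse_stats_line(
        "read_bytes 45564442648 samples [bytes] 1 4194304 123838172598382")
    after = parse_stats_line(
        "read_bytes 45564442650 samples [bytes] 1 4194304 123838172606674")
    # Difference in sample count and in bytes between the two llite readings.
    print(after.samples - before.samples, after.total - before.total)
```

&lt;p&gt;Comparing the &lt;tt&gt;samples&lt;/tt&gt; field of llite against osc compares syscalls with RPCs, while comparing &lt;tt&gt;total&lt;/tt&gt; compares bytes with bytes.&lt;/p&gt;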


&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="274894" author="yliping" created="Thu, 9 Jul 2020 16:14:22 +0000"  >&lt;p&gt;I have uploaded a client debug log captured with rpctrace and vfstrace enabled.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35359/35359_lustre-log_online2.1594310743.tar.gz&quot; title=&quot;lustre-log_online2.1594310743.tar.gz attached to LU-13771&quot;&gt;lustre-log_online2.1594310743.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="274943" author="emoly.liu" created="Fri, 10 Jul 2020 04:56:29 +0000"  >&lt;p&gt;Yang Liping requested that this ticket be reopened in order to ask further questions.&lt;/p&gt;</comment>
                            <comment id="274952" author="yliping" created="Fri, 10 Jul 2020 07:32:03 +0000"  >&lt;p&gt;I have uploaded another client&apos;s full debug log.&lt;/p&gt;

&lt;p&gt;I&apos;m not an expert on the Lustre code. However, in the log I can see many write_end() operations and no readpage() operations at all. So where do those read RPCs come from?&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
00000008:00000002:11.0:1594347136.965673:0:6419:0:(osc_request.c:1995:osc_build_rpc()) @@@ 1 pages, aa ffffa07d914ab470. now 7r/0w in flight  req@ffffa07d914ab300 x1639854597962928/t0(0) o3-&amp;gt;zfsz3-OST0058-osc-ffffa08c68aeb800@10.53.19.56@tcp:6/4 lens 608/432 e 0 to 0 dl 0 ref 2 fl New:/0/ffffffff rc 0/-1

00000100:00100000:11.0:1594347136.992957:0:6420:0:(client.c:2045:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_05_01:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6420:1639854597962928:10.53.19.56@tcp:3
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are only 5 write RPCs (opc 4), but there are 314 read RPCs (opc 3).&lt;/p&gt;
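
&lt;p&gt;The same per-opcode tally can be done in one pass over the debug log. This is a minimal sketch in Python. The function name &lt;tt&gt;count_sent_rpcs&lt;/tt&gt; is hypothetical, and the opcode labels (3 for read, 4 for write) are simply the ones used in this thread.&lt;/p&gt;

```python
from collections import Counter

# Opcode labels as used in this thread: 3 for read RPCs, 4 for write RPCs.
OPC_NAMES = {3: "read", 4: "write"}


def count_sent_rpcs(log_lines):
    """Tally 'Sending RPC' records by opcode (the last ':'-separated field)."""
    counts = Counter()
    for line in log_lines:
        if "Sending RPC pname" not in line:
            continue
        opc = int(line.rstrip().rsplit(":", 1)[1])
        counts[OPC_NAMES.get(opc, str(opc))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_rpcs.py lustre-log.1594347137
    with open(sys.argv[1], errors="replace") as log:
        for name, count in sorted(count_sent_rpcs(log).items()):
            print(name, count)
```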

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
&lt;span class=&quot;code-comment&quot;&gt;# grep &lt;span class=&quot;code-quote-red&quot;&gt;&apos;Sending RPC pname&apos;&lt;/span&gt; lustre-log.1594347137 | awk -F &lt;span class=&quot;code-quote-red&quot;&gt;&apos;:&apos;&lt;/span&gt; &lt;span class=&quot;code-quote-red&quot;&gt;&apos;&lt;span class=&quot;code-object&quot;&gt;$NF&lt;/span&gt; == 4&apos;&lt;/span&gt;
&lt;/span&gt;00000100:00100000:10.0:1594347136.938920:0:6420:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_05_01:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6420:1639854597962240:10.53.19.56@tcp:4
00000100:00100000:13.0:1594347137.176978:0:6421:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_06_00:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6421:1639854597967232:10.53.19.56@tcp:4
00000100:00100000:0.0:1594347137.796053:0:6400:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_00_00:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6400:1639854597982576:10.53.19.56@tcp:4
00000100:00100000:7.0:1594347137.849694:0:6406:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_03_00:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6406:1639854597984144:10.53.19.56@tcp:4
00000100:00100000:3.0:1594347137.868957:0:6403:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_01_01:5efc6dcc-0dc7-7bda-9ab4-0c6fe2da4f9e:6403:1639854597984544:10.53.19.56@tcp:4

&lt;span class=&quot;code-comment&quot;&gt;# grep &lt;span class=&quot;code-quote-red&quot;&gt;&apos;Sending RPC pname&apos;&lt;/span&gt; lustre-log.1594347137 | awk -F &lt;span class=&quot;code-quote-red&quot;&gt;&apos;:&apos;&lt;/span&gt; &lt;span class=&quot;code-quote-red&quot;&gt;&apos;&lt;span class=&quot;code-object&quot;&gt;$NF&lt;/span&gt; == 4&apos;&lt;/span&gt; | wc -l
&lt;/span&gt;5

&lt;span class=&quot;code-comment&quot;&gt;# grep &lt;span class=&quot;code-quote-red&quot;&gt;&apos;Sending RPC pname&apos;&lt;/span&gt; lustre-log.1594347137 | awk -F &lt;span class=&quot;code-quote-red&quot;&gt;&apos;:&apos;&lt;/span&gt; &lt;span class=&quot;code-quote-red&quot;&gt;&apos;&lt;span class=&quot;code-object&quot;&gt;$NF&lt;/span&gt; == 3&apos;&lt;/span&gt; | wc -l
&lt;/span&gt;314
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="274974" author="jhammond" created="Fri, 10 Jul 2020 14:43:40 +0000"  >&lt;p&gt;Could be memory mapped files or another application.&lt;/p&gt;</comment>
                            <comment id="275011" author="pjones" created="Fri, 10 Jul 2020 17:09:00 +0000"  >&lt;p&gt;Yang Sheng&lt;/p&gt;

&lt;p&gt;Could you please investigate?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="275076" author="yliping" created="Sat, 11 Jul 2020 05:03:00 +0000"  >&lt;p&gt;I&apos;ve searched &lt;em&gt;/proc/&amp;#91;pid&amp;#93;/maps&lt;/em&gt; on the client for processes that map files in the Lustre file system. I also tried the &lt;em&gt;lsof&lt;/em&gt; and &lt;em&gt;pmap&lt;/em&gt; tools. However, I found nothing interesting.&lt;/p&gt;

&lt;p&gt;What should I do next? How can I find out which process triggers those read RPCs?&lt;/p&gt;</comment>
                            <comment id="275087" author="ys" created="Sat, 11 Jul 2020 05:56:03 +0000"  >&lt;p&gt;Hi, LiPing,&lt;/p&gt;

&lt;p&gt;I&apos;ll contact you about your question.&lt;/p&gt;

&lt;p&gt;Thanks.&lt;br/&gt;
YangSheng&lt;/p&gt;</comment>
                            <comment id="275118" author="ys" created="Sat, 11 Jul 2020 08:55:57 +0000"  >&lt;p&gt;Hi, LiPing,&lt;/p&gt;


&lt;p&gt;I think the read RPCs were caused by partial-page writes. From the log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000080:00008000:14.0:1594347134.173014:0:24100:0:(rw26.c:629:ll_write_begin()) Writing 15875068 of 2110 to 1986 bytes
00000080:00008000:14.0:1594347134.176921:0:24100:0:(rw26.c:629:ll_write_begin()) Writing 15875070 of 2110 to 1986 bytes
00000080:00008000:14.0:1594347134.181230:0:24100:0:(rw26.c:629:ll_write_begin()) Writing 15875072 of 2110 to 1986 bytes
00000080:00008000:14.0:1594347134.185397:0:24100:0:(rw26.c:629:ll_write_begin()) Writing 15875074 of 2110 to 1986 bytes
00000080:00008000:14.0:1594347135.702578:0:24097:0:(rw26.c:629:ll_write_begin()) Writing 16771332 of 1174 to 2922 bytes
00000080:00008000:15.0:1594347135.704168:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652186 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.707017:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652188 of 2344 to 1752 bytes
00000080:00008000:14.0:1594347135.707694:0:24093:0:(rw26.c:629:ll_write_begin()) Writing 3161899 of 1288 to 2808 bytes
00000080:00008000:15.0:1594347135.709961:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652190 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.713100:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652192 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.715609:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652194 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.718384:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652196 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.720595:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652198 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.723561:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652200 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.725620:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652202 of 2344 to 1752 bytes
00000080:00008000:14.0:1594347135.727548:0:24093:0:(rw26.c:629:ll_write_begin()) Writing 3161901 of 1288 to 2808 bytes
00000080:00008000:15.0:1594347135.728399:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652204 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.730721:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652206 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.733274:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652208 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.735530:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652210 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.738171:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652212 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.740527:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652214 of 2344 to 1752 bytes
00000080:00008000:15.0:1594347135.743197:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652216 of 2344 to 1752 bytes
00000080:00008000:14.0:1594347135.746249:0:24099:0:(rw26.c:629:ll_write_begin()) Writing 15652218 of 2344 to 1752 bytes

   FORMAT: Writing &amp;lt;page index&amp;gt; of &amp;lt;from&amp;gt; to &amp;lt;length&amp;gt; bytes

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For a partial-page write, we have to read the page first and then write it out.&lt;/p&gt;
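
&lt;p&gt;To see how many of the logged writes are partial, the ll_write_begin() lines can be filtered using the format noted above (page index, from, length). This is a minimal sketch in Python. The name &lt;tt&gt;partial_page_writes&lt;/tt&gt; is hypothetical and a 4096-byte page is assumed. It only flags writes that do not cover a whole page, and it does not model whether the page was already cached, so it gives an upper bound on the reads triggered this way rather than an exact count.&lt;/p&gt;

```python
import re

PAGE_SIZE = 4096  # assumed page size on the x86_64 client

# Matches the message "Writing (page index) of (from) to (length) bytes".
WRITE_BEGIN = re.compile(r"Writing (\d+) of (\d+) to (\d+) bytes")


def partial_page_writes(log_lines):
    """Return (page_index, from, length) for writes not covering a whole page."""
    partial = []
    for line in log_lines:
        match = WRITE_BEGIN.search(line)
        if match is None:
            continue
        index, start, length = (int(g) for g in match.groups())
        if (start, length) != (0, PAGE_SIZE):
            partial.append((index, start, length))
    return partial


if __name__ == "__main__":
    sample = [
        "(rw26.c:629:ll_write_begin()) Writing 15875068 of 2110 to 1986 bytes",
        "(rw26.c:629:ll_write_begin()) Writing 16138033 of 0 to 4096 bytes",
    ]
    for entry in partial_page_writes(sample):
        print(entry)
```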

&lt;p&gt;Thanks,&lt;br/&gt;
YangSheng&lt;/p&gt;</comment>
                            <comment id="275162" author="ys" created="Sat, 11 Jul 2020 15:36:51 +0000"  >&lt;p&gt;From log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000080:00008000:7.0:1594347137.845609:0:24101:0:(rw26.c:629:ll_write_begin()) Writing 16138033 of 0 to 4096 bytes
00000080:00008000:7.0:1594347137.845678:0:24101:0:(rw26.c:629:ll_write_begin()) Writing 16138034 of 0 to 1876 bytes
00000020:00000001:7.0:1594347137.845715:0:24101:0:(cl_io.c:1225:cl_2queue_init_page()) Process entered
00000020:00000001:7.0:1594347137.845721:0:24101:0:(cl_io.c:1228:cl_2queue_init_page()) Process leaving
00000080:00008000:7.0:1594347137.849143:0:24101:0:(rw26.c:629:ll_write_begin()) Writing 16138034 of 1876 to 2220 bytes
00000080:00008000:7.0:1594347137.849247:0:24101:0:(rw26.c:629:ll_write_begin()) Writing 16138035 of 0 to 4096 bytes
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
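&lt;p&gt;We can see that write_begin was invoked with ranges that are not page-aligned, even though the full page ends up being written (page 16138034 receives 1876 bytes and then the remaining 2220 bytes). This must be related to the kernel rather than to Lustre.&lt;/p&gt;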
</comment>
                            <comment id="275306" author="yliping" created="Tue, 14 Jul 2020 01:36:12 +0000"  >&lt;p&gt;Thank you, YangSheng, for your help. This may not be a Lustre issue. I will try to dig deeper into the function calls in the kernel. As far as I know, YangSheng is also working on this. Thanks.&lt;/p&gt;</comment>
                            <comment id="275308" author="yliping" created="Tue, 14 Jul 2020 01:57:22 +0000"  >&lt;p&gt;Here&apos;s the call stack for &lt;em&gt;ll_write_begin()&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
 0xffffffffc0b17e80 : ll_write_begin+0x0/0x830 [lustre]
 0xffffffff83fb8652 : __generic_file_aio_write+0x1e2/0x400 [kernel]
 0xffffffffc0b26ffb : __generic_file_write_iter+0xcb/0x340 [lustre]
 0xffffffffc0b2aef6 : vvp_io_write_start+0x4a6/0x810 [lustre]
 0xffffffffc070eb88 : cl_io_start+0x68/0x130 [obdclass]
 0xffffffffc0710f6e : cl_io_loop+0x12e/0xc90 [obdclass]
 0xffffffffc0ad53c8 : ll_file_io_generic+0x498/0xc80 [lustre]
 0xffffffffc0ad5e8d : ll_file_aio_write+0x12d/0x1f0 [lustre]
 0xffffffffc0ad601e : ll_file_write+0xce/0x1e0 [lustre]
 0xffffffff840410a0 : vfs_write+0xc0/0x1f0 [kernel]
 0xffffffff84041ebf : sys_write+0x7f/0xf0 [kernel]
 0xffffffff84574ddb : system_call_fastpath+0x22/0x27 [kernel]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;And a variable dump for &lt;em&gt;__generic_file_aio_write()&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
iocb={.ki_users={...}, .ki_filp=0xffff917134065500, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=285453206744, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe118033a84, .iov_len=8192} nr_segs=1 ppos=-121559921193912
iocb={.ki_users={...}, .ki_filp=0xffff9172b1e3e800, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=178328534, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe670010a44, .iov_len=8192} nr_segs=1 ppos=-121560167639160
iocb={.ki_users={...}, .ki_filp=0xffff91759ccf0000, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=1180926976, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe66c057944, .iov_len=8192} nr_segs=1 ppos=-121560012130552
iocb={.ki_users={...}, .ki_filp=0xffff91725e938300, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=1646606336, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe65000c844, .iov_len=8192} nr_segs=1 ppos=-121558346451768
iocb={.ki_users={...}, .ki_filp=0xffff917227dfdb00, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=5931597062, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe4fc012484, .iov_len=8192} nr_segs=1 ppos=-121559394141880
iocb={.ki_users={...}, .ki_filp=0xffff917477840200, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=1804162048, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe6909c6000, .iov_len=8192} nr_segs=1 ppos=-121559475956984
iocb={.ki_users={...}, .ki_filp=0xffff91789589a600, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=3343207424, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe654011594, .iov_len=8192} nr_segs=1 ppos=-121559922230968
iocb={.ki_users={...}, .ki_filp=0xffff9175e8345600, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=304232425736, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe0ec02d104, .iov_len=8192} nr_segs=1 ppos=-121552870644472
iocb={.ki_users={...}, .ki_filp=0xffff917109388e00, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=593866584, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe654050fb4, .iov_len=8192} nr_segs=1 ppos=-121559630369848
iocb={.ki_users={...}, .ki_filp=0xffff9171b84be500, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=8213792930, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe4f00130c4, .iov_len=8192} nr_segs=1 ppos=-121559395743800
iocb={.ki_users={...}, .ki_filp=0xffff91724177f900, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=869166080, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fe6909c0000, .iov_len=8192} nr_segs=1 ppos=-121559394935800
iocb={.ki_users={...}, .ki_filp=0xffff91749a6c9200, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=3654809342, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_cur_seg=0, .ki_list={...}, .ki_eventfd=0x0} iov={.iov_base=0x7fd42a70c000, .iov_len=8192} nr_segs=1 ppos=-121559394677688
iocb={.ki_users={...}, .ki_filp=0xffff917227dfd000, .ki_ctx=0x0, .ki_cancel=0x0, .ki_dtor=0x0, .ki_obj={...}, .ki_user_data=0, .ki_pos=4789573432, .&lt;span class=&quot;code-keyword&quot;&gt;private&lt;/span&gt;=0x0, .ki_opcode=0, .ki_nbytes=0, .ki_buf=          (&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;), .ki_left=8192, .ki_inline_vec={...}, .ki_iovec=0x0, .ki_nr_segs=0, .ki_
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="275315" author="yliping" created="Tue, 14 Jul 2020 05:13:28 +0000"  >&lt;p&gt;Here&apos;s some debug info that may be helpful to this issue.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset: 3624, bytes:  472, iov_iter_count: 8192, pos:  51715628584, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe268128d8c40, offset:    0, bytes: 4096, iov_iter_count: 7720, pos:  51715629056, written: 472
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe268128d8cc0, offset:    0, bytes: 3624, iov_iter_count: 3624, pos:  51715633152, written: 4568
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset:  262, bytes: 3834, iov_iter_count: 8192, pos:  53041811718, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26837f93a00, offset:    0, bytes: 4096, iov_iter_count: 4358, pos:  53041815552, written: 3834
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26831375ec0, offset:    0, bytes:  262, iov_iter_count: 262, pos:  53041819648, written: 7930
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset:  570, bytes: 3526, iov_iter_count: 8192, pos:  59417190970, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26806891140, offset:    0, bytes: 4096, iov_iter_count: 4666, pos:  59417194496, written: 3526
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26818533e00, offset:    0, bytes:  570, iov_iter_count: 570, pos:  59417198592, written: 7622
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset: 3198, bytes:  898, iov_iter_count: 8192, pos:  57102630014, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26824b6eec0, offset:    0, bytes: 4096, iov_iter_count: 7294, pos:  57102630912, written: 898
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe268276ef040, offset:    0, bytes: 3198, iov_iter_count: 3198, pos:  57102635008, written: 4994
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset: 3932, bytes:  164, iov_iter_count: 8192, pos:  58240249692, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26819046f00, offset:    0, bytes: 4096, iov_iter_count: 8028, pos:  58240249856, written: 164
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe268177c3380, offset:    0, bytes: 3932, iov_iter_count: 3932, pos:  58240253952, written: 4260
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset: 1730, bytes: 2366, iov_iter_count: 8192, pos:  54806689474, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe2682c8abe00, offset:    0, bytes: 4096, iov_iter_count: 5826, pos:  54806691840, written: 2366
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe268106249c0, offset:    0, bytes: 1730, iov_iter_count: 1730, pos:  54806695936, written: 6462
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffffffc0ba5ecb, offset: 2464, bytes: 1632, iov_iter_count: 8192, pos:  55943039392, written: 0
kernel.statement(&lt;span class=&quot;code-quote-red&quot;&gt;&quot;generic_perform_write@mm/filemap.c:2985&quot;&lt;/span&gt;), inode: 144116736864585972, page: 0xffffe26819766740, offset:    0, bytes: 4096, iov_iter_count: 6560, pos:  55943041024, written: 1632
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

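&lt;p&gt;The relevant kernel code is pasted below; line 2985, where the probe fires, is the &lt;em&gt;a_ops-&amp;gt;write_begin()&lt;/em&gt; call:&lt;/p&gt;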

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
2943 &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; ssize_t generic_perform_write(&lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; file *file,
2944                 &lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; iov_iter *i, loff_t pos)

2958     &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; {
2959         &lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; page *page;
2960         &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;unsigned&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt;&lt;/span&gt; offset;    &lt;span class=&quot;code-comment&quot;&gt;/* Offset into pagecache page */&lt;/span&gt;
2961         &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;unsigned&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt;&lt;/span&gt; bytes;    &lt;span class=&quot;code-comment&quot;&gt;/* Bytes to write to page */&lt;/span&gt;
2962         size_t copied;      &lt;span class=&quot;code-comment&quot;&gt;/* Bytes copied from user */&lt;/span&gt;
2963         &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;void&lt;/span&gt;&lt;/span&gt; *fsdata;
2964
2965         offset = (pos &amp;amp; (PAGE_CACHE_SIZE - 1));
2966         bytes = min_t(&lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;unsigned&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt;&lt;/span&gt;, PAGE_CACHE_SIZE - offset,
2967                                 iov_iter_count(i));

2985         status = a_ops-&amp;gt;write_begin(file, mapping, pos, bytes, flags,
2986                                     &amp;amp;page, &amp;amp;fsdata);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="275318" author="ys" created="Tue, 14 Jul 2020 06:13:07 +0000"  >&lt;p&gt;The _generic_file_write_iter part is really helpful. I&apos;ll look into it.&lt;/p&gt;</comment>
                            <comment id="275361" author="yliping" created="Tue, 14 Jul 2020 15:08:59 +0000"  >&lt;p&gt;After reading the source code, I believe that the number of bytes &lt;em&gt;ll_write_begin()&lt;/em&gt; writes depends only on the &lt;em&gt;count&lt;/em&gt; value specified in the &lt;em&gt;write()&lt;/em&gt; syscall and on the &lt;em&gt;offset&lt;/em&gt; in the file being written.&lt;/p&gt;

&lt;p&gt;I traced the running program and found that it uses multiple threads to write to different parts of the file. Each write is an 8k chunk. The problem is that the start of each part of the file is not aligned to a 4k boundary, so the first and third writes to the page cache are not full pages. Every 8k write therefore becomes 2 partial-page writes and 1 full-page write. See the attached trace log for more information.&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35414/35414_generic_perform_write.txt.tar.gz&quot; title=&quot;generic_perform_write.txt.tar.gz attached to LU-13771&quot;&gt;generic_perform_write.txt.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
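
&lt;p&gt;The split described above can be reproduced with a small sketch. It only mirrors the offset and bytes arithmetic quoted from &lt;em&gt;generic_perform_write()&lt;/em&gt;; the function name &lt;em&gt;split_write&lt;/em&gt; and the fixed 4096-byte page size are assumptions made for illustration, not kernel or Lustre code.&lt;/p&gt;

```python
PAGE_SIZE = 4096  # assumed page size, matching the 4k pages seen in the trace


def split_write(pos, count, page_size=PAGE_SIZE):
    """Return (offset_in_page, bytes) for each page touched by one write().

    Mirrors the quoted arithmetic:
    offset = pos % page_size; bytes = min(page_size - offset, remaining).
    """
    chunks = []
    remaining = count
    while remaining > 0:
        offset = pos % page_size
        nbytes = min(page_size - offset, remaining)
        chunks.append((offset, nbytes))
        pos += nbytes
        remaining -= nbytes
    return chunks


# One 8k write taken from the kernel.statement trace earlier in this thread.
for offset, nbytes in split_write(58240249692, 8192):
    kind = "full page" if nbytes == PAGE_SIZE else "partial"
    print(f"offset={offset:4d} bytes={nbytes:4d} {kind}")
```

&lt;p&gt;For pos 58240249692 this yields chunks of 164, 4096 and 3932 bytes, the same sequence the trace shows, while a 4k-aligned pos yields two full pages.&lt;/p&gt;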

&lt;p&gt;Ideally, every program&apos;s write should align with the 4k boundary perfectly. However, I think it would be better if Lustre could eliminate the unnecessary read RPCs.&#160;&lt;/p&gt;</comment>
                            <comment id="275367" author="ys" created="Tue, 14 Jul 2020 15:40:19 +0000"  >&lt;p&gt;Hi, Liping, &lt;/p&gt;

&lt;p&gt;Could you please test whether the partial writes disappear if you write the file starting from a 4k-aligned position?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
YangSheng &lt;/p&gt;</comment>
                            <comment id="275375" author="yliping" created="Tue, 14 Jul 2020 17:24:43 +0000"  >&lt;p&gt;The test is reproducible. Simple dd with 4k block size is OK. Here&apos;s the command I use:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`; lctl set_param debug=-1; lctl get_param debug; dd &lt;span class=&quot;code-object&quot;&gt;if&lt;/span&gt;=/dev/zero of=/zfssz3/test100/ylp_test6666.tmp bs=4k count=5120; lctl set_param debug=&lt;span class=&quot;code-quote-red&quot;&gt;&apos;ioctl neterror warning error emerg ha config console lfsck&apos;&lt;/span&gt;; lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the relevant log is below:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
00000080:00008000:8.0:1594745597.138945:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5048 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.139374:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5049 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.139769:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5050 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.140190:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5051 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.140592:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5052 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.140984:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5053 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.141406:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5054 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.141814:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5055 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.142237:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5056 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.142639:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5057 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.143040:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5058 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.143452:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5059 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.143847:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5060 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.144268:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5061 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.144670:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5062 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.145090:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5063 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.145510:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5064 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.145903:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5065 of 0 to 4096 bytes
00000080:00008000:8.0:1594745597.146370:0:13390:0:(rw26.c:629:ll_write_begin()) Writing 5066 of 0 to 4096 bytes
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, I added an option so that dd starts at an offset that is not aligned to the 4k boundary.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`; lctl set_param debug=-1; lctl get_param debug; dd &lt;span class=&quot;code-object&quot;&gt;if&lt;/span&gt;=/dev/zero of=/zfssz3/test100/ylp_test7777.tmp bs=4k count=5120 seek=1024 oflag=seek_bytes; lctl set_param debug=&lt;span class=&quot;code-quote-red&quot;&gt;&apos;ioctl neterror warning error emerg ha config console lfsck&apos;&lt;/span&gt;; lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the result is just as I expected.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-bash&quot;&gt;
00000080:00008000:7.0:1594746674.062656:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5061 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.062738:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5062 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.063141:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5062 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.063224:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5063 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.063633:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5063 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.063716:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5064 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.064126:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5064 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.064209:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5065 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.064618:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5065 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.064700:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5066 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.065109:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5066 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.065191:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5067 of 0 to 1024 bytes
00000080:00008000:7.0:1594746674.065595:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5067 of 1024 to 3072 bytes
00000080:00008000:7.0:1594746674.065676:0:14533:0:(rw26.c:629:ll_write_begin()) Writing 5068 of 0 to 1024 bytes
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Log file attached.  &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35416/35416_lustre-log.1594746674.tar.gz&quot; title=&quot;lustre-log.1594746674.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594746674.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="275408" author="yliping" created="Wed, 15 Jul 2020 01:18:31 +0000"  >&lt;p&gt;From the log&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35416/35416_lustre-log.1594746674.tar.gz&quot; title=&quot;lustre-log.1594746674.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594746674.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;, I notice that although &lt;em&gt;ll_write_begin()&lt;/em&gt; does not write at 4k boundaries, there aren&apos;t any read RPCs. Those read RPCs are probably not caused by partial writes.&lt;/p&gt;</comment>
                            <comment id="275412" author="yliping" created="Wed, 15 Jul 2020 03:11:29 +0000"  >&lt;p&gt;&lt;del&gt;There aren&apos;t any full page write log in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35416/35416_lustre-log.1594746674.tar.gz&quot; title=&quot;lustre-log.1594746674.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594746674.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; . Which means &lt;em&gt;vmpage&lt;/em&gt; is always up to date, and it won&apos;t cause &lt;em&gt;ll_prepare_partial_page()&lt;/em&gt; be triggered. That&apos;s the difference.&lt;/del&gt;&#160; Actually, there aren&apos;t any full-page writes at all in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35416/35416_lustre-log.1594746674.tar.gz&quot; title=&quot;lustre-log.1594746674.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594746674.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;. Every write is a partial write. Sorry for my mistake.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ll_write_begin(struct file *file, struct address_space *mapping,
			  loff_t pos, unsigned len, unsigned flags,
			  struct page **pagep, void **fsdata)
{


	&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!PageUptodate(vmpage)) {
		/*
		 * We&apos;re completely overwriting an existing page,
		 * so _don&apos;t_ set it up to date until commit_write
		 */
		&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (from == 0 &amp;amp;&amp;amp; to == PAGE_SIZE) {
			CL_PAGE_HEADER(D_PAGE, env, page, &lt;span class=&quot;code-quote&quot;&gt;&quot;full page write\n&quot;&lt;/span&gt;);
			POISON_PAGE(vmpage, 0x11);
		} &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
			/* TODO: can be optimized at OSC layer to check &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; it
			 * is a lockless IO. In that &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt;, it&apos;s not necessary
			 * to read the data. */
			result = ll_prepare_partial_page(env, io, page);
			&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (result == 0)
				SetPageUptodate(vmpage);
		}
	}


}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="275417" author="ys" created="Wed, 15 Jul 2020 04:05:08 +0000"  >&lt;p&gt;Hi, Liping,&lt;/p&gt;

&lt;p&gt;Is the file a new one or an existing one in your test?&lt;/p&gt;

&lt;p&gt;YS&lt;/p&gt;</comment>
                            <comment id="275420" author="yliping" created="Wed, 15 Jul 2020 04:26:26 +0000"  >&lt;p&gt;I redid the test with a newly created file and uploaded a new log.&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35418/35418_lustre-log.1594786368.tar.gz&quot; title=&quot;lustre-log.1594786368.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594786368.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Here&apos;s the&#160;dd command I used. The file is newly created in this test, and I changed the block size to 8k to mimic what the program does.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
dd &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt;=/dev/zero of=/zfssz3/test100/ylp_test1111.tmp bs=8k count=2560 seek=1314 oflag=seek_bytes
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="275421" author="ys" created="Wed, 15 Jul 2020 04:43:02 +0000"  >&lt;p&gt;From the log, we can see that there is no read RPC even with partial writes.&lt;/p&gt;

&lt;p&gt;So I think you found no read RPC because you were writing a new file. &lt;/p&gt;</comment>
                            <comment id="275422" author="ys" created="Wed, 15 Jul 2020 04:46:17 +0000"  >&lt;p&gt;So the problem is clear enough. We can work around the issue in 3 ways:&lt;br/&gt;
1. Remove the old file before writing.&lt;br/&gt;
2. Have the application write the file aligned to 4K.&lt;br/&gt;
3. Use mmap instead of write.&lt;/p&gt;</comment>
                            <comment id="275423" author="yliping" created="Wed, 15 Jul 2020 04:56:21 +0000"  >&lt;p&gt;I still don&apos;t get your point. Whether the file exists or not, there is no read RPC now. If you need the log for overwriting an old file, I can upload it. I don&apos;t see any difference.&lt;/p&gt;

&lt;p&gt;Partial writes don&apos;t trigger read RPCs in my test. I want to know why.&lt;/p&gt;</comment>
                            <comment id="275424" author="ys" created="Wed, 15 Jul 2020 04:59:32 +0000"  >&lt;p&gt;Please drop the page cache before the test; then you should get read RPCs. &lt;/p&gt;</comment>
                            <comment id="275426" author="yliping" created="Wed, 15 Jul 2020 05:06:45 +0000"  >&lt;p&gt;I used the command below to drop all caches:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sync; sync; sync; echo 3 &amp;gt; /proc/sys/vm/drop_caches; echo 3 &amp;gt; /proc/sys/vm/drop_caches; echo 3 &amp;gt; /proc/sys/vm/drop_caches
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then I reran the command to overwrite the old file, and there was still no read RPC.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`; lctl set_param debug=-1; lctl get_param debug; dd if=/dev/zero of=/zfssz3/test100/ylp_test1111.tmp bs=8k count=2560 seek=1314 oflag=seek_bytes; lctl set_param debug=&apos;ioctl neterror warning error emerg ha config console lfsck&apos;; lctl dk &amp;gt; /tmp/lu-13771/lustre-log.`date +%s`
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="275430" author="ys" created="Wed, 15 Jul 2020 05:36:31 +0000"  >&lt;p&gt;Could you please upload the latest log?&lt;/p&gt;</comment>
                            <comment id="275431" author="yliping" created="Wed, 15 Jul 2020 05:41:44 +0000"  >&lt;p&gt;The latest log uploaded.&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35419/35419_lustre-log.1594789491.tar.gz&quot; title=&quot;lustre-log.1594789491.tar.gz attached to LU-13771&quot;&gt;lustre-log.1594789491.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="275432" author="ys" created="Wed, 15 Jul 2020 05:41:45 +0000"  >&lt;p&gt;I forgot to mention that the LDLM lock also needs to be cancelled.&lt;/p&gt;</comment>
                            <comment id="275436" author="yliping" created="Wed, 15 Jul 2020 06:12:45 +0000"  >&lt;p&gt;I cleared the LRU cache, dropped all system caches, and redid the test. There was still no read RPC. What else can I do to cancel the LDLM lock? Here&apos;s the command I used.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl set_param ldlm.namespaces.zfsz3*.lru_size=clear &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="275439" author="ys" created="Wed, 15 Jul 2020 06:22:49 +0000"  >&lt;p&gt;Please hold on. I&apos;ll go back and look into the first log.&lt;/p&gt;</comment>
                            <comment id="275465" author="yliping" created="Wed, 15 Jul 2020 09:53:50 +0000"  >&lt;p&gt;I think I&apos;ve found the root cause. The KMS is the key.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ll_prepare_partial_page(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, struct cl_io *io,
				   struct cl_page *pg)
{


		/*
		 * If are writing to a &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; page, no need to read old data.
		 * The extent locking will have updated the KMS, and &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; our
		 * purposes here we can treat it like i_size.
		 */
		&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (attr-&amp;gt;cat_kms &amp;lt;= offset) {  &lt;span class=&quot;code-comment&quot;&gt;// &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; is where the magic happens
&lt;/span&gt;			&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *kaddr = ll_kmap_atomic(vpg-&amp;gt;vpg_page, KM_USER0);

			memset(kaddr, 0, cl_page_size(obj));
			ll_kunmap_atomic(kaddr, KM_USER0);
		} &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (vpg-&amp;gt;vpg_defer_uptodate)
			vpg-&amp;gt;vpg_ra_used = 1;
		&lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
			result = ll_page_sync_io(env, io, pg, CRT_READ);   &lt;span class=&quot;code-comment&quot;&gt;// &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; code will read data and cause read RPC
&lt;/span&gt;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If only one process is writing the file, the KMS value increases gradually, so the next write offset is always greater than or equal to the KMS.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000080:00008000:8.0:1594793274.210077:0:23419:0:(rw26.c:629:ll_write_begin()) Writing 173 of 0 to 4096 bytes
00000080:00008000:8.0:1594793274.210131:0:23419:0:(rw26.c:629:ll_write_begin()) Writing 174 of 0 to 1314 bytes
00000008:00000002:8.0:1594793274.210243:0:23419:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 708608-&amp;gt;712704 708608
00000008:00000002:8.0:1594793274.210276:0:23419:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 712704-&amp;gt;714018 712704
00000080:00008000:8.0:1594793274.210597:0:23419:0:(rw26.c:629:ll_write_begin()) Writing 174 of 1314 to 2782 bytes
00000008:00000002:8.0:1594793274.210639:0:23419:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 714018-&amp;gt;716800 714018
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, when multiple processes write to the same file, only some of them increase the KMS value. For the others, the next write offset is less than the KMS, so they need to read the page first. &lt;em&gt;vvp_page_prep_read()&lt;/em&gt; is invoked for this purpose, followed by &lt;em&gt;osc_send_read_rpc()&lt;/em&gt;.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000080:00008000:15.0:1594347135.807736:0:24096:0:(rw26.c:629:ll_write_begin()) Writing 4066461 of 586 to 3510 bytes
00000008:00000002:15.0:1594347135.807831:0:24096:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS not increasing 69578154924-&amp;gt;16656228352 69578154924
00020000:00000002:15.0:1594347135.807889:0:24096:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69578154924 m=1594347135 a=1594347119 c=1594347135 b=150930907
00000080:00008000:15.0:1594347135.807905:0:24096:0:(rw26.c:629:ll_write_begin()) Writing 4066462 of 0 to 4096 bytes
00000080:00008000:15.0:1594347135.807972:0:24096:0:(rw26.c:629:ll_write_begin()) Writing 4066463 of 0 to 586 bytes
00020000:00000002:15.0:1594347135.808015:0:24096:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69578158080 m=1594347135 a=1594347119 c=1594347135 b=150930907
00000080:00000001:15.0:1594347135.808034:0:24096:0:(vvp_page.c:202:vvp_page_prep_read()) Process entered
00000008:00000001:15.0:1594347135.808110:0:6424:0:(osc_cache.c:2183:osc_send_read_rpc()) Process entered
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If we look at the writes that do increase the KMS value, no read RPC is sent.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000080:00008000:10.0:1594347136.342094:0:24098:0:(rw26.c:629:ll_write_begin()) Writing 16987702 of 940 to 3156 bytes
00000008:00000002:10.0:1594347136.342136:0:24098:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 69581628332-&amp;gt;69581631488 69581628332
00020000:00000002:10.0:1594347136.342173:0:24098:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69581631488 m=1594347136 a=1594347119 c=1594347136 b=151070931
00000080:00008000:10.0:1594347136.342185:0:24098:0:(rw26.c:629:ll_write_begin()) Writing 16987703 of 0 to 4096 bytes
00000080:00008000:10.0:1594347136.342237:0:24098:0:(rw26.c:629:ll_write_begin()) Writing 16987704 of 0 to 940 bytes
00020000:00000002:10.0:1594347136.342326:0:24098:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69581631488 m=1594347136 a=1594347119 c=1594347136 b=151070931
00000008:00000002:10.0:1594347136.342617:0:24098:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 69581631488-&amp;gt;69581635584 69581631488
00020000:00000002:10.0:1594347136.342653:0:24098:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69581635584 m=1594347136 a=1594347119 c=1594347136 b=151070931
00000008:00000002:10.0:1594347136.342673:0:24098:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 69581635584-&amp;gt;69581636524 69581635584
00020000:00000002:10.0:1594347136.342709:0:24098:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69581636524 m=1594347136 a=1594347119 c=1594347136 b=151070931
00000080:00008000:10.0:1594347136.343214:0:24098:0:(rw26.c:629:ll_write_begin()) Writing 16987704 of 940 to 3156 bytes
00000008:00000002:10.0:1594347136.343257:0:24098:0:(osc_io.c:256:osc_page_touch_at()) stripe KMS increasing 69581636524-&amp;gt;69581639680 69581636524
00020000:00000002:10.0:1594347136.343303:0:24098:0:(lov_merge.c:99:lov_merge_lvb_kms()) MDT ID 0x1dec8:422495 on OST[88]: s=69581639680 m=1594347136 a=1594347119 c=1594347136 b=151070931
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
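&lt;p&gt;The decision described above can be summarized in a simplified model. This is a sketch only, not Lustre code: the names &lt;em&gt;needs_read_rpc&lt;/em&gt; and &lt;em&gt;page_uptodate&lt;/em&gt; are hypothetical, it assumes that &lt;em&gt;offset&lt;/em&gt; in the quoted &lt;em&gt;ll_prepare_partial_page()&lt;/em&gt; is the file offset of the start of the page, and it ignores the &lt;em&gt;vpg_defer_uptodate&lt;/em&gt; (readahead) branch.&lt;/p&gt;

```python
PAGE_SIZE = 4096  # assumed page size


def needs_read_rpc(kms, pos, nbytes, page_uptodate=False, page_size=PAGE_SIZE):
    """Simplified model of the quoted ll_write_begin() and
    ll_prepare_partial_page() checks. Hypothetical helper, not Lustre API."""
    start_in_page = pos % page_size
    page_start = pos - start_in_page  # assumption: this is `offset` in the C code
    full_page = start_in_page == 0 and nbytes == page_size
    if page_uptodate or full_page:
        return False  # cached page or complete overwrite: nothing to prepare
    if page_start >= kms:
        return False  # page starts at or beyond KMS: zero-fill, no read
    return True  # partial write below KMS: old data must be read first


# Single writer: page 174 starts at 712704 while the logged KMS is still 708608.
print(needs_read_rpc(kms=708608, pos=174 * PAGE_SIZE, nbytes=1314))
# Multi-process writer: page 4066463 starts far below the logged KMS 69578154924.
print(needs_read_rpc(kms=69578154924, pos=4066463 * PAGE_SIZE, nbytes=586))
```

&lt;p&gt;With the values taken from the logs above, the first call returns False and the second returns True, which is consistent with the read RPC appearing only in the multi-process log.&lt;/p&gt;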
&lt;p&gt;A better solution is needed for this scenario. Multiprocess downloading is a common use case.&lt;/p&gt;</comment>
                            <comment id="275472" author="ys" created="Wed, 15 Jul 2020 10:42:41 +0000"  >&lt;p&gt;Hi, LiPing,&lt;/p&gt;

&lt;p&gt;KMS records the maximum length, so a partial write at any offset below it needs to read the page first. That is reasonable behavior. In your downloading application, multiple threads download data concurrently. Aligning each start position to 4k ensures that each write covers &lt;span class=&quot;error&quot;&gt;&amp;#91;0,4096&amp;#93;&lt;/span&gt; of every page. Alternatively, enlarge the write buffer (8k is too small), so that only the first and last pages of each write need a read RPC.&lt;/p&gt;
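
&lt;p&gt;To illustrate the buffer-size suggestion, here is a rough sketch that counts how many pages of a single write are only partly covered. The helper name &lt;em&gt;partial_pages&lt;/em&gt;, the 4096-byte page size and the 1 MiB buffer are assumptions chosen for the example. Whether a partial page actually triggers a read RPC still depends on the KMS and on the page cache state.&lt;/p&gt;

```python
PAGE_SIZE = 4096  # assumed page size


def partial_pages(pos, count, page_size=PAGE_SIZE):
    """Return (partial, total): pages partly covered and pages touched
    by one write of `count` bytes starting at file offset `pos`."""
    first = pos // page_size
    last = (pos + count - 1) // page_size
    total = last - first + 1
    head_partial = pos % page_size != 0
    tail_partial = (pos + count) % page_size != 0
    if total == 1:
        # Head and tail are the same page, so count it at most once.
        partial = 1 if (head_partial or tail_partial) else 0
    else:
        partial = int(head_partial) + int(tail_partial)
    return partial, total


# Start offset 1314 is the unaligned offset used in the dd test in this thread.
print(partial_pages(1314, 8192))         # 8k buffer, unaligned start
print(partial_pages(1314, 1024 * 1024))  # hypothetical 1 MiB buffer, unaligned start
print(partial_pages(0, 8192))            # 8k buffer, 4k-aligned start
```

&lt;p&gt;An unaligned 8k write touches 3 pages, 2 of them partially, while an unaligned 1 MiB write touches 257 pages with the same 2 partial pages. An aligned 8k write has no partial pages at all.&lt;/p&gt;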

&lt;p&gt;YS&lt;/p&gt;</comment>
                            <comment id="275520" author="yliping" created="Thu, 16 Jul 2020 03:05:23 +0000"  >&lt;p&gt;The comment&#160;in the code already suggests a possible solution on the filesystem side rather than the user side. It&apos;s up to Whamcloud to decide whether Lustre should have this feature.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
			/* TODO: can be optimized at OSC layer to check &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; it
			 * is a lockless IO. In that &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt;, it&apos;s not necessary
			 * to read the data. */
			result = ll_prepare_partial_page(env, io, page);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I have no more questions on this issue now. Thank you.&lt;/p&gt;</comment>
                            <comment id="277043" author="yliping" created="Mon, 10 Aug 2020 06:01:15 +0000"  >&lt;p&gt;This issue can be closed now. Thanks.&#160;&lt;/p&gt;</comment>
                            <comment id="277044" author="ys" created="Mon, 10 Aug 2020 06:04:29 +0000"  >&lt;p&gt;Many Thanks, LiPing.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="35414" name="generic_perform_write.txt.tar.gz" size="1168088" author="yliping" created="Tue, 14 Jul 2020 15:08:49 +0000"/>
                            <attachment id="35416" name="lustre-log.1594746674.tar.gz" size="830494" author="yliping" created="Tue, 14 Jul 2020 17:23:57 +0000"/>
                            <attachment id="35418" name="lustre-log.1594786368.tar.gz" size="824602" author="yliping" created="Wed, 15 Jul 2020 04:24:14 +0000"/>
                            <attachment id="35419" name="lustre-log.1594789491.tar.gz" size="790827" author="yliping" created="Wed, 15 Jul 2020 05:40:59 +0000"/>
                            <attachment id="35359" name="lustre-log_online2.1594310743.tar.gz" size="5717466" author="yliping" created="Thu, 9 Jul 2020 16:13:32 +0000"/>
                            <attachment id="35361" name="lustre-log_online3.1594347137.tar.gz" size="7703516" author="yliping" created="Fri, 10 Jul 2020 05:02:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i014sf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>