<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:14:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8135] sanity test_101g fails with &apos;not all RPCs are 16 MiB BRW rpcs&apos; </title>
                <link>https://jira.whamcloud.com/browse/LU-8135</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;sanity test 101g fails with &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&apos;not all RPCs are 16 MiB BRW rpcs&apos; 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the test log on the client, we can see the &#8216;File too large&#8217; error returned when dd tries to write to the file system&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity test 101g: Big bulk(4/16 MiB) readahead ==================================================== 11:11:42 (1462903902)
CMD: onyx-50vm8 /usr/sbin/lctl get_param obdfilter.*.brw_size |
			while read s; do echo ost1 \$s; done
CMD: onyx-50vm8 /usr/sbin/lctl get_param obdfilter.*.brw_size |
			while read s; do echo ost2 \$s; done
CMD: onyx-50vm8 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.brw_size=16M 		osd-*.lustre-OST*.brw_size=16M 2&amp;gt;&amp;amp;1
remount client to enable large RPC size
CMD: onyx-50vm1.onyx.hpdd.intel.com grep -c /mnt/lustre&apos; &apos; /proc/mounts
Stopping client onyx-50vm1.onyx.hpdd.intel.com /mnt/lustre (opts:)
CMD: onyx-50vm1.onyx.hpdd.intel.com lsof -t /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com umount  /mnt/lustre 2&amp;gt;&amp;amp;1
Starting client: onyx-50vm1.onyx.hpdd.intel.com:  -o user_xattr,flock onyx-50vm7@tcp:/lustre /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com mkdir -p /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-50vm7@tcp:/lustre /mnt/lustre
dd: error writing &apos;/mnt/lustre/f101g.sanity&apos;: File too large
2+0 records in
1+0 records out
16777216 bytes (17 MB) copied, 0.978339 s, 17.1 MB/s
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00599718 s, 0.0 kB/s
0 RPCs
 sanity test_101g: @@@@@@ FAIL: not all RPCs are 16 MiB BRW rpcs 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We&#8217;ve had three occurrences of this failure since the test was introduced by &lt;a href=&quot;http://review.whamcloud.com/19368&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19368&lt;/a&gt; a little over a week ago and, so far, it has only been seen on review-zfs-part-1:&lt;/p&gt;

&lt;p&gt;2016-05-09 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/145be8f2-1661-11e6-b5f1-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/145be8f2-1661-11e6-b5f1-5254006e85c2&lt;/a&gt;&lt;br/&gt;
2016-05-10 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/f2b9b50c-1701-11e6-b5f1-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/f2b9b50c-1701-11e6-b5f1-5254006e85c2&lt;/a&gt;&lt;br/&gt;
2016-05-11 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/73c7a9a4-179f-11e6-b5f1-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/73c7a9a4-179f-11e6-b5f1-5254006e85c2&lt;/a&gt;&lt;/p&gt;</description>
                <environment>autotest review-zfs</environment>
        <key id="36888">LU-8135</key>
            <summary>sanity test_101g fails with &apos;not all RPCs are 16 MiB BRW rpcs&apos; </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 12 May 2016 15:54:40 +0000</created>
                <updated>Tue, 6 Nov 2018 11:53:31 +0000</updated>
                            <resolved>Wed, 21 Sep 2016 05:31:52 +0000</resolved>
                                    <version>Lustre 2.9.0</version>
                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="152028" author="pjones" created="Thu, 12 May 2016 17:12:41 +0000"  >&lt;p&gt;Gu Zheng&lt;/p&gt;

&lt;p&gt;Could you please advise on this failure?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="152169" author="lixi" created="Fri, 13 May 2016 07:21:16 +0000"  >&lt;p&gt;It seems the following code returns -EFBIG:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;	LASSERT(obj-&amp;gt;oo_sa_hdl != NULL);
	LASSERT(oh-&amp;gt;ot_tx != NULL);
	dmu_tx_hold_sa(oh-&amp;gt;ot_tx, obj-&amp;gt;oo_sa_hdl, 0);
	if (oh-&amp;gt;ot_tx-&amp;gt;tx_err != 0)
		GOTO(out, rc = -oh-&amp;gt;ot_tx-&amp;gt;tx_err);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
00080000:00000001:0.0:1462903904.567380:0:13254:0:(osd_object.c:861:osd_declare_attr_set()) Process entered
00080000:00000001:0.0:1462903904.567381:0:13254:0:(osd_object.c:877:osd_declare_attr_set()) Process leaving via out (rc=18446744073709551589 : -27 : 0xffffffffffffffe5)
00080000:00000001:0.0:1462903904.567382:0:13254:0:(osd_object.c:918:osd_declare_attr_set()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00002000:00000001:0.0:1462903904.567383:0:13254:0:(ofd_io.c:1079:ofd_commitrw_write()) Process leaving via out_stop (rc=18446744073709551589 : -27 : 0xffffffffffffffe5)
00080000:00000001:0.0:1462903904.567384:0:13254:0:(osd_handler.c:282:osd_trans_stop()) Process entered
00040000:00000001:0.0:1462903904.567386:0:13254:0:(qsd_handler.c:1073:qsd_op_end()) Process entered
00040000:00000001:0.0:1462903904.567387:0:13254:0:(qsd_handler.c:1101:qsd_op_end()) Process leaving
00080000:00000010:0.0:1462903904.567387:0:13254:0:(osd_handler.c:296:osd_trans_stop()) kfreed &apos;oh&apos;: 384 at ffff880021666a00.
00080000:00000001:0.0:1462903904.567388:0:13254:0:(osd_handler.c:297:osd_trans_stop()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1462903904.567949:0:13254:0:(lustre_fid.h:740:fid_flatten32()) Process leaving (rc=252707909 : 252707909 : f100445)
00000020:00000001:0.0:1462903904.567951:0:13254:0:(lustre_fid.h:740:fid_flatten32()) Process leaving (rc=252707909 : 252707909 : f100445)
00000020:00000002:0.0:1462903904.567952:0:13254:0:(lu_object.c:162:lu_object_put()) Add ffff8800461c7178 to site lru. hash: ffff8800494dc900, bkt: ffff88004ced5118, lru_len: 2
00002000:00000001:0.0:1462903904.567954:0:13254:0:(ofd_grant.c:1268:ofd_grant_commit()) Process entered
00002000:00000001:0.0:1462903904.567955:0:13254:0:(ofd_grant.c:1323:ofd_grant_commit()) Process leaving
00002000:00000001:0.0:1462903904.567956:0:13254:0:(ofd_io.c:1138:ofd_commitrw_write()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00002000:00000001:0.0:1462903904.567957:0:13254:0:(ofd_io.c:1255:ofd_commitrw()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00000020:00000001:0.0:1462903904.567958:0:13254:0:(obd_class.h:1129:obd_commitrw()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
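<!--
Editor's note: the rc values in the trace above are printed three ways (unsigned decimal, signed decimal, hex). The following is a minimal sketch, assuming a Linux errno table where EFBIG is 27, of how the unsigned value maps back to -EFBIG. The helper name rc_to_signed is hypothetical and is not part of Lustre.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: reinterpret an rc that was logged as an unsigned
 * 64-bit value as a signed one. Converting an out-of-range value to
 * int64_t is implementation-defined before C23, but common compilers
 * wrap it in two's complement, which is what this sketch assumes.
 */
static int64_t rc_to_signed(uint64_t raw_rc)
{
	return (int64_t)raw_rc;
}

/* Non-zero when the logged rc corresponds to -EFBIG ("File too large"). */
static int rc_is_efbig(uint64_t raw_rc)
{
	return rc_to_signed(raw_rc) == -(int64_t)EFBIG;
}
```

Passing the logged value 18446744073709551589ULL to rc_to_signed gives -27, the same number the trace prints in its second column, and rc_is_efbig reports a match with the "File too large" error that dd returned on the client.
-->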
                            <comment id="152249" author="jay" created="Fri, 13 May 2016 17:00:21 +0000"  >&lt;p&gt;Alex may provide some feedback on this issue.&lt;/p&gt;</comment>
                            <comment id="152395" author="bzzz" created="Mon, 16 May 2016 13:20:23 +0000"  >&lt;p&gt;This is because of the limit on a single tx:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;	&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (txh-&amp;gt;txh_space_towrite + txh-&amp;gt;txh_space_tooverwrite &amp;gt;
	    2 * DMU_MAX_ACCESS)
		err = SET_ERROR(EFBIG);

#define	DMU_MAX_ACCESS (64 * 1024 * 1024) &lt;span class=&quot;code-comment&quot;&gt;/* 64MB */&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;ZFS is known for huge overestimation. I guess we hit this with large writes.&lt;/p&gt;</comment>
                            <comment id="152460" author="jay" created="Mon, 16 May 2016 18:53:03 +0000"  >&lt;p&gt;I remember that Matt once mentioned that it&apos;s pretty safe to reduce spa_asize_inflation, which defaults to 24, to a more reasonable value; we could then check whether that fixes this problem. spa_asize_inflation is a module parameter that can be set on the fly by writing to &apos;/sys/module/zfs/parameters/spa_asize_inflation&apos;.&lt;/p&gt;</comment>
                            <comment id="165024" author="jay" created="Tue, 6 Sep 2016 21:09:36 +0000"  >&lt;p&gt;After taking a further look, I realized that this ticket reveals a severe problem in ZFS in terms of space reservation, especially when a large block size is used.&lt;/p&gt;

&lt;p&gt;In the current implementation, the client can send discontiguous pages in the same RPC for optimal throughput. However, if those pages fall in different blocks in the underlying ZFS, the server needs to reserve a huge amount of space in the transaction. For ZFS with a 1MB block size and a 16MB RPC size, it would need to reserve 4GB of space in one transaction in the worst case. By comparison, current ZFS reports an error when a transaction&apos;s size is over 64MB.&lt;/p&gt;

&lt;p&gt;This potential issue exists even with a 1MB RPC size and a 1MB block size in ZFS. I&apos;m going to create a test case to demonstrate the problem.&lt;/p&gt;

&lt;p&gt;To fix this problem, I propose a new parameter, passed from OFD to the client, that limits the maximum number of chunks in the same RPC.&lt;/p&gt;</comment>
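<!--
Editor's note: the worst-case figure in the comment above can be reproduced with simple arithmetic. The following is a minimal sketch that assumes a 4 KiB client page size (not stated in the ticket) and assumes each page of the RPC lands in a different ZFS block, so a full block is held per page. The helper names are hypothetical; DMU_MAX_ACCESS and the 2 * DMU_MAX_ACCESS comparison are taken from the ZFS snippet quoted earlier in this thread. Real ZFS accounting is more involved, so treat this as an illustration only.

```c
#include <stdint.h>

/* Value quoted from the ZFS source earlier in the thread: 64 MiB. */
#define DMU_MAX_ACCESS (64ULL * 1024 * 1024)

/*
 * Hypothetical helper: worst-case bytes one transaction would have to
 * hold if every page of an RPC touched a different ZFS block.
 */
static uint64_t worst_case_reserve(uint64_t rpc_bytes, uint64_t page_bytes,
				   uint64_t block_bytes)
{
	uint64_t pages = rpc_bytes / page_bytes;

	return pages * block_bytes;
}

/* Mirrors the quoted check: non-zero when the hold exceeds the per-tx cap. */
static int exceeds_tx_limit(uint64_t reserve_bytes)
{
	return reserve_bytes > 2 * DMU_MAX_ACCESS;
}
```

Under these assumptions, a 16 MiB RPC holds 4096 pages, and 4096 blocks of 1 MiB come to 4 GiB, which matches the figure given above. A 1 MiB RPC still yields 256 MiB, which is also over the quoted cap, consistent with the remark that the issue exists even at a 1MB RPC size.
-->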
                            <comment id="165133" author="paf" created="Wed, 7 Sep 2016 15:58:27 +0000"  >&lt;p&gt;Jinshan -&lt;/p&gt;

&lt;p&gt;Since you asked me to take a look...  Nasty problem, an interesting side effect of the ZFS design.  Your solution sounds good to me; it should solve the problem while allowing the more common case (one or just a few chunks) to benefit from the large RPC size.  I&apos;ll be curious to see the exact patch - I&apos;m not sure how the server would communicate this limit to the client.&lt;/p&gt;</comment>
                            <comment id="165197" author="gerrit" created="Wed, 7 Sep 2016 20:15:07 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/22369&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/22369&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8135&quot; title=&quot;sanity test_101g fails with &amp;#39;not all RPCs are 16 MiB BRW rpcs&amp;#39; &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8135&quot;&gt;&lt;del&gt;LU-8135&lt;/del&gt;&lt;/a&gt; osc: limits the number of chunks in write RPC&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1a490d624c492cf6856623b2fe3e9c0afb1dbdde&lt;/p&gt;</comment>
                            <comment id="166658" author="gerrit" created="Wed, 21 Sep 2016 02:55:53 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/22369/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/22369/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8135&quot; title=&quot;sanity test_101g fails with &amp;#39;not all RPCs are 16 MiB BRW rpcs&amp;#39; &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8135&quot;&gt;&lt;del&gt;LU-8135&lt;/del&gt;&lt;/a&gt; osc: limits the number of chunks in write RPC&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 7f2aae8d80a73de7408668bbe569d5f4d8553efe&lt;/p&gt;</comment>
                            <comment id="166673" author="pjones" created="Wed, 21 Sep 2016 05:31:52 +0000"  >&lt;p&gt;Landed for 2.9&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="36911">LU-8139</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="49310">LU-10239</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="39962">LU-8632</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32205">LU-7181</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzybfb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>