<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:08:27 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7385] Bulk IO write error</title>
                <link>https://jira.whamcloud.com/browse/LU-7385</link>
                <project id="10000" key="LU">Lustre</project>
                    <description></description>
                <environment></environment>
        <key id="32997">LU-7385</key>
            <summary>Bulk IO write error</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="aromanenko">alyona romanenko</reporter>
                        <labels>
                    </labels>
                <created>Wed, 4 Nov 2015 13:18:09 +0000</created>
                <updated>Mon, 17 Apr 2017 04:24:07 +0000</updated>
                            <resolved>Mon, 17 Apr 2017 04:24:07 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="132613" author="aromanenko" created="Wed, 4 Nov 2015 14:18:55 +0000"  >&lt;p&gt;The issue reported by our customer is that bulk writes from client node(s) repeatedly fail, resulting in a dropped connection. &lt;br/&gt;
The connection is restored, the bulk write is attempted again, and again it fails. &lt;br/&gt;
Ultimately the filesystem stops responding to the client node. &lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&amp;gt; Jun 10 06:54:15 snx11026n010 kernel: LustreError: 106837:0:(events.c:393:server_bulk_callback()) event type 2, status -90, desc ffff88023c269000
&amp;gt; Jun 10 06:54:15 snx11026n010 kernel: LustreError: 24204:0:(ldlm_lib.c:2953:target_bulk_io()) @@@ network error on bulk GET 0(527648)  req@ffff8802da0f6050 x1501695686238744/t0(0) o4-&amp;gt;028fa37e-9825-10ed-52ee-9971d416f647@532@gni:0/0 lens 488/416 e 0 to 0 dl 1433912110 ref 1 fl Interpret:/0/0 rc 0/0
&amp;gt; Jun 10 06:54:15 snx11026n010 kernel: Lustre: snx11026-OST001b: Bulk IO write error with 028fa37e-9825-10ed-52ee-9971d416f647 (at 532@gni), client will retry: rc -110
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="132718" author="aromanenko" created="Thu, 5 Nov 2015 09:51:38 +0000"  >&lt;p&gt;Hi all,&lt;br/&gt;
Test setup for reproducing the bug:&lt;br/&gt;
client1 (sjsc-34) - router (sich-33) - client2 (pink05). &lt;br/&gt;
Lustre version 2.5.1 + patch &lt;a href=&quot;http://review.whamcloud.com/#/c/12496/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/12496/&lt;/a&gt; (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5718&quot; title=&quot;RDMA too fragmented with router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5718&quot;&gt;&lt;del&gt;LU-5718&lt;/del&gt;&lt;/a&gt; lnet: add offset for selftest brw).&lt;br/&gt;
Also, session_features was set to LST_FEATS_MASK.&lt;br/&gt;
The script:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lst new_session tst
lst add_group pink05 172.18.56.133@o2ib
lst add_group sjsc34 172.24.62.76@o2ib1
lst add_batch test
lst add_test --batch test --loop 5 --to pink05 --from sjsc34 brw write size=1000k off=64
lst run test
lst stat pink05 &amp;amp; /bin/sleep 5; kill $!
lst end_session
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The test consistently reproduces the -90 error on the router:&lt;br/&gt;
kiblnd_init_rdma()) RDMA too fragmented for 172.24.62.74@o2ib1 (256): 128/251 src 128/250 dst frags&lt;/p&gt;

&lt;p&gt;thanks,&lt;br/&gt;
Alyona&lt;/p&gt;</comment>
                            <comment id="132720" author="aromanenko" created="Thu, 5 Nov 2015 09:56:18 +0000"  >&lt;p&gt;The patch is at &lt;a href=&quot;http://review.whamcloud.com/#/c/16141/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/16141/&lt;/a&gt;.&lt;br/&gt;
The issue was not reproduced with various sizes and offsets when the patch was applied.&lt;/p&gt;</comment>
                            <comment id="139037" author="doug" created="Fri, 15 Jan 2016 16:57:48 +0000"  >&lt;p&gt;I updated the patch to include Andreas&apos;s suggestions and also fixed a problem with socklnd.  The patch worked for o2iblnd, but broke socklnd whenever the message size is greater than 16K.  When the message is large, socklnd wants to use zero-copy to send the message.  Zero-copy uses tcp_sendpage() which will crash if the kiov passed to it is bigger than PAGE_SIZE.  My fix is to avoid zero-copy when the kiov is larger than PAGE_SIZE.&lt;/p&gt;</comment>
                            <comment id="139038" author="doug" created="Fri, 15 Jan 2016 16:59:15 +0000"  >&lt;p&gt;James: Would you be able to verify that having kiovs bigger than PAGE_SIZE does not break gnilnd?&lt;/p&gt;</comment>
                            <comment id="140705" author="doug" created="Mon, 1 Feb 2016 23:37:25 +0000"  >&lt;p&gt;I am still working on a better patch for this issue, but have come to ask a more fundamental question: how is this situation happening with Lustre (rather than modified lnet_selftest)?  How is an offset of 64 happening?  Is this due to a partial I/O write?  Lustre developers have told me that should not be happening.&lt;/p&gt;</comment>
                            <comment id="172032" author="doug" created="Wed, 2 Nov 2016 15:29:43 +0000"  >&lt;p&gt;After playing around with the lnet-selftest change to reproduce this issue (using an offset of 64), I have found this issue is not specific to LNet routers.  A client sending directly to a server will also fail.  So this LNet router-specific fix will only address part of the problem.  The fix for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5718&quot; title=&quot;RDMA too fragmented with router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5718&quot;&gt;&lt;del&gt;LU-5718&lt;/del&gt;&lt;/a&gt;, when applied to all nodes and activated, will address the entire problem making it the better solution.&lt;/p&gt;</comment>
                            <comment id="192202" author="pjones" created="Mon, 17 Apr 2017 04:24:07 +0000"  >&lt;p&gt;If I understand correctly, this is believed to be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5718&quot; title=&quot;RDMA too fragmented with router&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5718&quot;&gt;&lt;del&gt;LU-5718&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="26914">LU-5718</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxs87:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>