<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:52:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12419] ppc64le: &quot;LNetError: RDMA has too many fragments for peer_ni&quot; when reading two files</title>
                <link>https://jira.whamcloud.com/browse/LU-12419</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I&apos;m trying to install and configure a Lustre client on an IBM POWER9 ppc64le machine to mount an existing Lustre filesystem through a router. The POWER9 host is similar to an ORNL Summit worker node, and we are going to use it for debugging software before running it on leadership facilities, so this case may be of interest to others.&lt;/p&gt;

&lt;p&gt;At first there were issues connecting to LNet (lctl ping); they were resolved after explicitly setting options in /etc/modprobe.d/ko2iblnd.conf as per &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3322&quot; title=&quot;ko2iblnd support for different map_on_demand and peer_credits between systems&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3322&quot;&gt;&lt;del&gt;LU-3322&lt;/del&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&#160; map_on_demand=16 on the POWER9 client (ibmpower9)&lt;/p&gt;

&lt;p&gt;&#160; map_on_demand=256 on all x86_64 servers and the router.&lt;/p&gt;

&lt;p&gt;Lustre was restarted and modules were reloaded after these changes.&lt;/p&gt;
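
&lt;p&gt;(For reference, a minimal /etc/modprobe.d/ko2iblnd.conf carrying just the client-side setting above would be a single options line; any further ko2iblnd options are site-specific and omitted here:)&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;options ko2iblnd map_on_demand=16&lt;/p&gt;&lt;/blockquote&gt;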

&lt;p&gt;Now I can mount Lustre on the POWER9 client and read a SINGLE file with dd; this works for files located on any of the six OSTs, as long as only one read transfer runs at a time. I did not try writes.&lt;/p&gt;

&lt;p&gt;When I start two transfers in parallel, or start one transfer and then the other 10 seconds later, I get an LNet error as soon as the transfer for the second file starts. I can kill -9 the dd processes, but not always all of them; sometimes one of the processes cannot be killed even with signal 9. Even after all I/O processes (&quot;dd&quot;) are killed on the client, the router and servers continue to report errors in their logs, and I still observe I/O on both OSTs where the files reside. I cannot unmount Lustre on the POWER9 client or remove the modules.&lt;/p&gt;

&lt;p&gt;&quot;lctl net unconfigure&quot; reports &quot;LNET busy&quot;, so I have to reboot the POWER9 client. Only after the client reboot do the errors stop being reported on the servers and the router.&lt;/p&gt;</description>
                <environment>Client:&lt;br/&gt;
hostname:       ibmpower9&lt;br/&gt;
NID:            192.168.177.202@o2ib177&lt;br/&gt;
kernel:         4.14.0-115.2.2.el7a.ppc64le&lt;br/&gt;
Linux:          Red Hat Enterprise Linux Server release 7.5 (Maipo)&lt;br/&gt;
Architecture:   ppc64le&lt;br/&gt;
Byte Order:     Little Endian&lt;br/&gt;
Model:          2.1 (pvr 004e 1201)&lt;br/&gt;
Model name:     POWER9, altivec supported&lt;br/&gt;
Lustre (custom rebuild from git distro on this host):&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;kmod-lustre-client-2.12.1-1.el7.ppc64le&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;lustre-client-2.12.1-1.el7.ppc64le&lt;br/&gt;
&lt;br/&gt;
Router:&lt;br/&gt;
hostname:       newtevnfs&lt;br/&gt;
NIDs:           192.168.176.28@o2ib&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;192.168.177.28@o2ib177&lt;br/&gt;
kernel:         2.6.32-696.1.1.el6.x86_64&lt;br/&gt;
Linux:          Scientific Linux Fermi release 6.10 (Ramsey)&lt;br/&gt;
Architecture:   x86_64&lt;br/&gt;
CPU op-mode(s): 32-bit, 64-bit&lt;br/&gt;
Byte Order:     Little Endian&lt;br/&gt;
Model name:     Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz&lt;br/&gt;
Lustre version (custom rebuild from source rpm on this host):&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;lustre-client-2.10.6-1.el6.x86_64&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;kmod-lustre-client-2.10.6-1.el6.x86_64&lt;br/&gt;
&lt;br/&gt;
Server(s): tevlfsa (MDS), tevlfs1-6 (OSS)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;tevlfsa 192.168.176.140@o2ib&lt;br/&gt;
&amp;nbsp;&amp;nbsp;tevlfs1 192.168.176.141@o2ib&lt;br/&gt;
...&lt;br/&gt;
&amp;nbsp;&amp;nbsp;tevlfs5 192.168.176.145@o2ib&lt;br/&gt;
&amp;nbsp;&amp;nbsp;tevlfs6 192.168.176.146@o2ib&lt;br/&gt;
kernel:        3.10.0-862.6.3.el7.x86_64&lt;br/&gt;
Linux:         Scientific Linux release 7.4 (Nitrogen)&lt;br/&gt;
Architecture: x86_64&lt;br/&gt;
CPU op-mode(s): 32-bit, 64-bit&lt;br/&gt;
Byte Order: Little Endian&lt;br/&gt;
Model name: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz&lt;br/&gt;
Lustre version (custom rebuild from source rpm on this host, tevlfs6):&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;lustre-2.10.6-1.el7.x86_64&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;zfs-0.7.9-1.el7.x86_64&lt;br/&gt;
There are 6 OSTs, one OST per OSS.&lt;br/&gt;
</environment>
        <key id="55913">LU-12419</key>
            <summary>ppc64le: &quot;LNetError: RDMA has too many fragments for peer_ni&quot; when reading two files</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="alex.ku">Alex Kulyavtsev</reporter>
                        <labels>
                    </labels>
                <created>Tue, 11 Jun 2019 04:34:42 +0000</created>
                <updated>Mon, 22 Jul 2019 19:15:03 +0000</updated>
                            <resolved>Mon, 22 Jul 2019 19:15:03 +0000</resolved>
                                    <version>Lustre 2.10.6</version>
                    <version>Lustre 2.12.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="248941" author="alex.ku" created="Tue, 11 Jun 2019 04:52:26 +0000"  >&lt;p&gt;log after mount on the power9 client:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 14:48:50 ibmpower9 kernel: LNet: HW NUMA nodes: 6, HW CPU cores: 128, npartitions: 2&lt;/p&gt;

&lt;p&gt;Jun 10 14:48:50 ibmpower9 kernel: alg: No test for adler32 (adler32-zlib)&lt;/p&gt;

&lt;p&gt;Jun 10 14:48:51 ibmpower9 kernel: Lustre: Lustre: Build Version: 2.12.1&lt;/p&gt;

&lt;p&gt;Jun 10 14:48:51 ibmpower9 kernel: LNet: Using FastReg for registration&lt;/p&gt;

&lt;p&gt;Jun 10 14:48:51 ibmpower9 kernel: LNet: Added LNI 192.168.177.202@o2ib177 &lt;span class=&quot;error&quot;&gt;&amp;#91;8/256/0/180&amp;#93;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Jun 10 14:48:51 ibmpower9 kernel: Lustre: Mounted lfstev-client&lt;/p&gt;

&lt;p&gt;Jun 10 14:50:31 ibmpower9 kernel: NVRM: Xid (PCI:0004:04:00): 43, Ch 00000010, engmask 00000101&lt;/p&gt;

&lt;p&gt;Jun 10 14:52:55 ibmpower9 kernel: NVRM: Xid (PCI:0004:04:00): 43, Ch 00000010, engmask 00000101&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Read two files with a 10-second delay (the first file is still being read when the second read starts):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;dd of=/dev/null bs=1M if=/lfstev/admin/aik/iotest/osd5/10.GB&lt;/p&gt;

&lt;p&gt;sleep 10&lt;/p&gt;

&lt;p&gt;dd of=/dev/null bs=1M if=/lfstev/admin/aik/iotest/osd4/10.GB&lt;/p&gt;&lt;/blockquote&gt;
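&lt;p&gt;(As listed above, the two dd commands would run sequentially; for the reads to actually overlap, the first dd is backgrounded or run from a second shell, e.g.:)&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;dd of=/dev/null bs=1M if=/lfstev/admin/aik/iotest/osd5/10.GB &amp;amp;&lt;/p&gt;

&lt;p&gt;sleep 10&lt;/p&gt;

&lt;p&gt;dd of=/dev/null bs=1M if=/lfstev/admin/aik/iotest/osd4/10.GB&lt;/p&gt;&lt;/blockquote&gt;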
&lt;p&gt;Client errors:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:32:21 ibmpower9 kernel: Lustre: Unmounted lfstev-client&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:23 ibmpower9 kernel: Lustre: Mounted lfstev-client&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101884:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a20a00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: 101916:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1560202375/real 1560202375&amp;#93;&lt;/span&gt;&#160; req@c000007fb6779480 x1635984221122096/t0(0) o3-&amp;gt;lfstev-OST0005-osc-c0002072ee3bf800@192.168.176.146@o2ib:6/4 lens 488/440 e 0 to 1 dl 1560202382 ref 2 fl Rpc:eX/0/ffffffff rc 0/-1&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: lfstev-OST0005-osc-c0002072ee3bf800: Connection to lfstev-OST0005 (at 192.168.176.146@o2ib) was lost; in progress operations using this service will wait for recovery to complete&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101884:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: lfstev-OST0004-osc-c0002072ee3bf800: Connection restored to 192.168.176.145@o2ib (at 192.168.176.145@o2ib)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101882:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101883:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101883:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a20a00&lt;/p&gt;

&lt;p&gt;...&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101882:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a20a00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: 101916:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: &lt;span class=&quot;error&quot;&gt;&amp;#91;sent 1560202376/real 1560202376&amp;#93;&lt;/span&gt;&#160; req@c000007fb6779480 x1635984221122096/t0(0) o3-&amp;gt;lfstev-OST0005-osc-c0002072ee3bf800@192.168.176.146@o2ib:6/4 lens 488/440 e 0 to 1 dl 1560202383 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: 101916:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 62 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: lfstev-OST0005-osc-c0002072ee3bf800: Connection to lfstev-OST0005 (at 192.168.176.146@o2ib) was lost; in progress operations using this service will wait for recovery to complete&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: Lustre: Skipped 62 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101882:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 ibmpower9 kernel: LustreError: 101881:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a20a00&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The router has many errors like this when the second transfer starts:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:33:33 ibmpower9 kernel: LustreError: 101883:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:33:33 ibmpower9 kernel: LustreError: 101882:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;

&lt;p&gt;Jun 10 16:33:33 ibmpower9 kernel: LustreError: 101881:0:(events.c:200:client_bulk_callback()) event type 2, status -90, desc c000207265a29c00&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Router errors with &quot;client_bulk_callback()&quot; filtered out:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNetError: 4354:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) RDMA has too many fragments for peer_ni 192.168.177.202@o2ib177 (16), src idx/frags: 32/240 dst idx/frags: 0/1&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNetError: 4354:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) Skipped 126371 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNetError: 4354:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Can&apos;t setup rdma for PUT to 192.168.177.202@o2ib177: -90&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNetError: 4354:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Skipped 126396 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNet: 4356:0:(o2iblnd_cb.c:396:kiblnd_handle_rx()) PUT_NACK from 192.168.177.202@o2ib177&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 newtevnfs kernel: LNet: 4356:0:(o2iblnd_cb.c:396:kiblnd_handle_rx()) Skipped 356 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:11 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) RDMA has too many fragments for peer_ni 192.168.177.202@o2ib177 (16), src idx/frags: 32/240 dst idx/frags: 0/1&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:11 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) Skipped 18859 previous similar messages&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:34:11 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Can&apos;t setup rdma for PUT to 192.168.177.202@o2ib177: -90&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:11 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Skipped 18859 previous similar messages&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Server errors, first server tevlfs5:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:31:09 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:33:52 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:18 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: Lustre: lfstev-OST0004: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: Lustre: Skipped 1589 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: LustreError: 29667:0:(ldlm_lib.c:3197:target_bulk_io()) @@@ bulk READ failed: rc -107&#160; req@ffff965e3ff53c50 x1635984221122768/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:256/0 lens 488/432 e 0 to 0 dl 1560202471 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: Lustre: lfstev-OST0004: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -107&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: Lustre: Skipped 1587 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff965e3ff53450 x1635984221122784/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:256/0 lens 488/432 e 0 to 0 dl 1560202471 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 1584 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:25 tevlfs5 kernel: LustreError: 29667:0:(ldlm_lib.c:3197:target_bulk_io()) Skipped 6 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:27 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:27 tevlfs5 kernel: Lustre: Skipped 221 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:46 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:46 tevlfs5 kernel: Lustre: Skipped 2390 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: Lustre: lfstev-OST0004: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: Lustre: Skipped 3995 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: Lustre: lfstev-OST0004: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -110&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: Lustre: Skipped 4021 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff965ce9f00050 x1635984221123728/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:312/0 lens 488/432 e 0 to 0 dl 1560202527 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:57 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 3969 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:35:23 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:35:23 tevlfs5 kernel: Lustre: Skipped 4735 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: Lustre: lfstev-OST0004: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: Lustre: Skipped 8064 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: Lustre: lfstev-OST0004: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -110&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: Lustre: Skipped 8063 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: LustreError: 29681:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff965d412d8850 x1635984221123728/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:376/0 lens 488/432 e 0 to 0 dl 1560202591 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:01 tevlfs5 kernel: LustreError: 29681:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 8061 previous similar messages&lt;/p&gt;


&lt;p&gt;Jun 10 16:36:38 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:36:38 tevlfs5 kernel: Lustre: Skipped 9432 previous similar messages&lt;/p&gt;


&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: Lustre: lfstev-OST0004: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -110&lt;/p&gt;

&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: Lustre: Skipped 16127 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: Lustre: lfstev-OST0004: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: Lustre: Skipped 16128 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: LustreError: 29681:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff965e35238050 x1635984221123728/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:504/0 lens 488/432 e 0 to 0 dl 1560202719 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:38:09 tevlfs5 kernel: LustreError: 29681:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 16141 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:39:08 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:39:08 tevlfs5 kernel: Lustre: Skipped 18844 previous similar messages&lt;/p&gt;


&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: Lustre: lfstev-OST0004: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -110&lt;/p&gt;

&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: Lustre: Skipped 32255 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: Lustre: lfstev-OST0004: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: Lustre: Skipped 32255 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff965e5d9c8850 x1635984221123728/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:5/0 lens 488/432 e 0 to 0 dl 1560202975 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:42:25 tevlfs5 kernel: LustreError: 29650:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 32227 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:44:08 tevlfs5 kernel: Lustre: lfstev-OST0004: Connection restored to 5cef0352-ac9a-6592-6586-45468e615673 (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:44:08 tevlfs5 kernel: Lustre: Skipped 37797 previous similar messages&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Server errors, second server tevlfs6:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:31:45 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: Lustre: lfstev-OST0005: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: Lustre: Skipped 1638 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: LustreError: 14016:0:(ldlm_lib.c:3197:target_bulk_io()) @@@ bulk READ failed: rc -107&#160; req@ffff96bb291e4450 x1635984221122544/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:129/0 lens 488/432 e 0 to 0 dl 1560202344 ref 1 fl Interpret:/0/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: LustreError: 14016:0:(ldlm_lib.c:3197:target_bulk_io()) Skipped 122 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: Lustre: lfstev-OST0005: Bulk IO read error with 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177), client will retry: rc -107&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: Lustre: Skipped 2 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: LustreError: 14006:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff96b9f53d4c50 x1635984221122656/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:129/0 lens 488/432 e 0 to 0 dl 1560202344 ref 1 fl Interpret:/0/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:18 tevlfs6 kernel: LustreError: 14006:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 4709 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:22 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:22 tevlfs6 kernel: Lustre: Skipped 501 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:30 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:30 tevlfs6 kernel: Lustre: Skipped 1008 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:46 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:46 tevlfs6 kernel: Lustre: Skipped 2016 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:55 tevlfs6 kernel: Lustre: lfstev-OST0005: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:55 tevlfs6 kernel: Lustre: Skipped 4713 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff96bae7793050 x1635984221122096/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:190/0 lens 488/432 e 0 to 0 dl 1560202405 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:32:56 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 4691 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:33:18 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:33:18 tevlfs6 kernel: Lustre: Skipped 4032 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:10 tevlfs6 kernel: Lustre: lfstev-OST0005: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:10 tevlfs6 kernel: Lustre: Skipped 9430 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:11 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff96b998978850 x1635984221122096/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:265/0 lens 488/432 e 0 to 0 dl 1560202480 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:11 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 9449 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:22 tevlfs6 kernel: Lustre: lfstev-OST0005: Connection restored to 24e6a956-37a9-6c64-bff4-98a232c29f9a (at 192.168.177.202@o2ib177)&lt;/p&gt;

&lt;p&gt;Jun 10 16:34:22 tevlfs6 kernel: Lustre: Skipped 8064 previous similar messages&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Both processes reading the files were killed on the client host, yet ltop still reports data being transferred on the OSTs:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Filesystem: lfstev&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; RECORDING&lt;/p&gt;

&lt;p&gt;&#160; &#160; Inodes:&#160; &#160; 169.254m total,&#160; &#160; &#160; 4.819m used (&#160; 3%),&#160; &#160; 164.435m free&lt;/p&gt;

&lt;p&gt;&#160;&#160; &#160; Space: &#160; &#160; 83.622t total,&#160; &#160; &#160; 7.306t used (&#160; 9%), &#160; &#160; 76.316t free&lt;/p&gt;

&lt;p&gt;&#160;&#160; Bytes/s: 0.231g read, &#160; &#160; &#160; 0.000g write, &#160; &#160; &#160; &#160; &#160; &#160; &#160; 252 IOPS&lt;/p&gt;

&lt;p&gt;&#160;&#160; MDops/s: 1 open,&#160; &#160; &#160; &#160; 0 close, &#160; &#160; &#160; 0 getattr, 0 setattr&lt;/p&gt;

&lt;p&gt;&#160;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; 0 link,&#160; &#160; &#160; &#160; 0 unlink,&#160; &#160; &#160; 0 mkdir, &#160; &#160; &#160; &#160; 0 rmdir&lt;/p&gt;

&lt;p&gt;&#160;&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; 0 statfs, 0 rename,&#160; &#160; &#160; 0 getxattr&lt;/p&gt;

&lt;p&gt;&amp;gt;OST S&#160; &#160; &#160; &#160; OSS &#160; Exp &#160; CR rMB/s wMB/s&#160; IOPS &#160; LOCKS&#160; LGR&#160; LCR %cpu %mem %spc&lt;/p&gt;

&lt;p&gt;0000 &#160; tevlfs1&#160; &#160; 70&#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 0 &#160; 69&#160; &#160; 9&lt;/p&gt;

&lt;p&gt;0001 &#160; tevlfs2&#160; &#160; 70&#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 1 &#160; 69&#160; &#160; 9&lt;/p&gt;

&lt;p&gt;0002 &#160; tevlfs3&#160; &#160; 70&#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 1 &#160; 88&#160; &#160; 8&lt;/p&gt;

&lt;p&gt;0003 &#160; tevlfs4&#160; &#160; 70&#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; 0 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 0 &#160; 88&#160; &#160; 9&lt;/p&gt;

&lt;p&gt;0004 &#160; tevlfs5&#160; &#160; 71&#160; 126 &#160; 118 &#160; &#160; 0 &#160; 126 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 0 &#160; 90&#160; &#160; 9&lt;/p&gt;

&lt;p&gt;0005 &#160; tevlfs6&#160; &#160; 71&#160; 126 &#160; 118 &#160; &#160; 0 &#160; 126 &#160; &#160; &#160; 0&#160; &#160; 0&#160; &#160; 0&#160; &#160; 1 &#160; 70&#160; &#160; 9&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The server and router keep reporting errors until the client host is rebooted:&lt;/p&gt;

&lt;p&gt;Server:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:41:40 tevlfs6 kernel: Lustre: lfstev-OST0005: Client 232e0955-aa70-fd53-f988-7299ce54b534 (at 192.168.177.202@o2ib177) reconnecting&lt;/p&gt;

&lt;p&gt;Jun 10 16:41:40 tevlfs6 kernel: Lustre: Skipped 37786 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:41:41 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk READ&#160; req@ffff96bb07d49050 x1635984221122096/t0(0) o3-&amp;gt;232e0955-aa70-fd53-f988-7299ce54b534@192.168.177.202@o2ib177:716/0 lens 488/432 e 0 to 0 dl 1560202931 ref 1 fl Interpret:/2/0 rc 0/0&lt;/p&gt;

&lt;p&gt;Jun 10 16:41:41 tevlfs6 kernel: LustreError: 14018:0:(ldlm_lib.c:3247:target_bulk_io()) Skipped 37799 previous similar messages&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Router errors:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Jun 10 16:51:41 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) Skipped 151177 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 16:51:41 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Can&apos;t setup rdma for PUT to 192.168.177.202@o2ib177: -90&lt;/p&gt;

&lt;p&gt;Jun 10 16:51:41 newtevnfs kernel: LNetError: 4353:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Skipped 151180 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 17:01:41 newtevnfs kernel: LNetError: 4355:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) RDMA has too many fragments for peer_ni 192.168.177.202@o2ib177 (16), src idx/frags: 32/240 dst idx/frags: 0/1&lt;/p&gt;

&lt;p&gt;Jun 10 17:01:41 newtevnfs kernel: LNetError: 4355:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) Skipped 151165 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 17:01:41 newtevnfs kernel: LNetError: 4355:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Can&apos;t setup rdma for PUT to 192.168.177.202@o2ib177: -90&lt;/p&gt;

&lt;p&gt;Jun 10 17:01:41 newtevnfs kernel: LNetError: 4355:0:(o2iblnd_cb.c:433:kiblnd_handle_rx()) Skipped 151172 previous similar messages&lt;/p&gt;

&lt;p&gt;Jun 10 17:11:41 newtevnfs kernel: LNetError: 4354:0:(o2iblnd_cb.c:1083:kiblnd_init_rdma()) RDMA has too many fragments for peer_ni 192.168.177.202@o2ib177 (16), src idx/frags: 32/240 dst idx/frags: 0/1&lt;/p&gt;&lt;/blockquote&gt;</description>
                <comments>
                            <comment id="248944" author="alex.ku" created="Tue, 11 Jun 2019 05:15:36 +0000"  >&lt;p&gt;Attached configuration and debug files:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;$ tar tvf client.tgz &lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:01 client/&lt;/p&gt;

&lt;p&gt;-rw-------&#160; 0 root &#160; root&#160; &#160; 65260 Jun 10 15:28 client/debug_kernel.2.out&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:02 client/etc/&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:03 client/etc/modprobe.d/&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 183 Jun&#160; 3 18:04 client/etc/modprobe.d/ko2iblnd.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; &#160; 88 Mar&#160; 7&#160; 2018 client/etc/modprobe.d/lustre.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 467 Jun&#160; 3 17:56 client/etc/lnet.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 450 Jun&#160; 3 17:07 client/etc/lnet_routes.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 903 Jun 10 15:43 client/ibstatus.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; 10678 Jun 10 15:08 client/lnetctl.export.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 1422 Jun 10 15:42 client/ibstat.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 2303 Jun 10 15:08 client/systool.lnet.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 1992 Jun 10 15:08 client/systool.ko2iblnd.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root 70350426 Jun 10 17:02 client/debug_kernel.3.out&lt;/p&gt;

&lt;p&gt;-rw-------&#160; 0 root &#160; root &#160; 231798 Jun 10 15:15 client/debug_kernel.before.out&lt;/p&gt;

&lt;p&gt;-rwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; 165 Jun 10 16:30 client/read-two.sh&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; &#160; 75 Jun 10 15:38 client/rpms&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;router:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;$ tar tvf router.tgz &lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:01 router/&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 2944 Jun 10 15:43 router/ibv_devinfo.out&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 10 23:57 router/etc/&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 10 23:58 router/etc/modprobe.d/&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 252 Jun&#160; 4 17:35 router/etc/modprobe.d/ko2iblnd.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 142 Mar 12&#160; 2018 router/etc/modprobe.d/lustre.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 317 Aug&#160; 1&#160; 2018 router/etc/lnet.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 406 Feb 14 13:27 router/etc/lnet_routes.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 451 Jun 10 15:43 router/ibstatus.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; 10493 Jun 10 15:35 router/lnetctl.export.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 705 Jun 10 15:42 router/ibstat.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 1987 Jun 10 15:34 router/systool.lnet.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 1884 Jun 10 15:34 router/systool.ko2iblnd.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root 11576985 Jun 10 17:06 router/debug_kernel.3.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; &#160; 73 Jun 10 15:39 router/rpms&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Server:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;$ tar tvf server.tgz &lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:06 server/&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 2087 Jun 10 15:43 server/ibv_devinfo.out&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:05 server/etc/&lt;/p&gt;

&lt;p&gt;drwxr-xr-x&#160; 0 root &#160; root&#160; &#160; &#160; &#160; 0 Jun 11 00:04 server/etc/modprobe.d/&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 252 Jun&#160; 4 17:35 server/etc/modprobe.d/ko2iblnd.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 142 Mar 12&#160; 2018 server/etc/modprobe.d/lustre.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 127 Jul 31&#160; 2018 server/etc/lnet.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 406 Dec 21 16:45 server/etc/lnet_routes.conf&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 223 Jun 10 15:42 server/ibstatus.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; 31618 Jun 10 15:33 server/lnetctl.export.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 351 Jun 10 15:42 server/ibstat.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 2113 Jun 10 15:33 server/systool.lnet.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root &#160; &#160; 2010 Jun 10 15:33 server/systool.ko2iblnd.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root 17240520 Jun 10 17:05 server/debug_kernel.3.out&lt;/p&gt;

&lt;p&gt;-rw-r--r--&#160; 0 root &#160; root&#160; &#160; &#160; 324 Jun 10 15:39 server/rpms&lt;/p&gt;

&lt;p&gt;mac-129482:LU-12419 aik_a$&lt;/p&gt;&lt;/blockquote&gt;</comment>
                            <comment id="249280" author="alex.ku" created="Fri, 14 Jun 2019 16:40:56 +0000"  >&lt;p&gt;&#160; map_on_demand=16 - on client with 64KB page&lt;/p&gt;

&lt;p&gt;&#160; map_on_demand=256 - on x86_64 (4KB page) all servers and router.&lt;/p&gt;</comment>
                            <comment id="249281" author="alex.ku" created="Fri, 14 Jun 2019 16:41:51 +0000"  >&lt;p&gt;Your router needs wrq_sge=2&lt;/p&gt;</comment>
                            <comment id="249718" author="simmonsja" created="Sun, 23 Jun 2019 22:33:47 +0000"  >&lt;p&gt;Is everything working now for you?&lt;/p&gt;</comment>
                            <comment id="250934" author="simmonsja" created="Wed, 10 Jul 2019 02:48:21 +0000"  >&lt;p&gt;Does moving to 2.12 fix everything for you? Can this ticket be closed?&lt;/p&gt;</comment>
                            <comment id="251830" author="simmonsja" created="Mon, 22 Jul 2019 19:15:03 +0000"  >&lt;p&gt;64K page size will not be supported for 2.10. Moving to 2.12 is the answer.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="26914">LU-5718</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="18907">LU-3322</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="34042">LU-7650</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="29170">LU-6387</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="49488">LU-10300</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="32759" name="client.tgz" size="2784957" author="alex.ku" created="Tue, 11 Jun 2019 05:12:45 +0000"/>
                            <attachment id="32760" name="router.tgz" size="423995" author="alex.ku" created="Tue, 11 Jun 2019 05:12:52 +0000"/>
                            <attachment id="32761" name="server.tgz" size="630848" author="alex.ku" created="Tue, 11 Jun 2019 05:13:05 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00hzb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>