[LU-10373] LNet OPA Performance Drop Created: 12/Dec/17 Updated: 17/Jan/18 Resolved: 17/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Ian Ziemba | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.3 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
A drop in OPA LNet bandwidth has occurred since Lustre 2.10.0.

# lctl --version
lctl 2.10.0

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 23426.6428571429    Client Write RPC/s: 11714.1428571429
Client Read MiB/s: 11713.6164285714    Client Write MiB/s: 1.78714285714286
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 23577.5714285714    Client Write RPC/s: 11790.2857142857
Client Read MiB/s: 11789.2135714286    Client Write MiB/s: 1.79928571428571
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 23595.5714285714    Client Write RPC/s: 11798.2857142857
Client Read MiB/s: 11799.1114285714    Client Write MiB/s: 1.8
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 21268.3571428571    Client Write RPC/s: 10635.2142857143
Client Read MiB/s: 1.62357142857143    Client Write MiB/s: 10634.2071428571
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 22236.9285714286    Client Write RPC/s: 11118.9285714286
Client Read MiB/s: 1.69714285714286    Client Write MiB/s: 11118.7914285714
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 22178.6428571429    Client Write RPC/s: 11087.2142857143
Client Read MiB/s: 1.69142857142857    Client Write MiB/s: 11089.0557142857

# lctl --version
lctl 2.10.55_127_g063a83a

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 16879.5    Client Write RPC/s: 8441.14285714286
Client Read MiB/s: 8439.57857142857    Client Write MiB/s: 1.28785714285714
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 21844    Client Write RPC/s: 10923.2857142857
Client Read MiB/s: 10922.4635714286    Client Write MiB/s: 1.66714285714286
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 21928.4285714286    Client Write RPC/s: 10964.7857142857
Client Read MiB/s: 10965.17    Client Write MiB/s: 1.67357142857143
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 17288.2142857143    Client Write RPC/s: 8645.07142857143
Client Read MiB/s: 1.32    Client Write MiB/s: 8643.84928571428
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 18382.8571428571    Client Write RPC/s: 9192.92857142857
Client Read MiB/s: 1.40214285714286    Client Write MiB/s: 9191.25285714285
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 14966.3571428571    Client Write RPC/s: 7486.07142857143
Client Read MiB/s: 1.14285714285714    Client Write MiB/s: 7482.79071428571

LNet configuration is:

# cat /etc/lnet.conf
net:
- net type: o2ib1
local NI(s):
- nid: 10.2.0.40@o2ib1
interfaces:
0: ib0
tunables:
peer_timeout: 180
peer_credits: 128
peer_buffer_credits: 0
credits: 1024
lnd tunables:
peercredits_hiw: 64
map_on_demand: 256
concurrent_sends: 256
fmr_pool_size: 2048
fmr_flush_trigger: 512
fmr_cache: 1
ntx: 2048
conns_per_peer: 2
CPT: "[0,1]"
OPA driver configuration is:

# cat /etc/modprobe.d/hfi1.conf
options hfi1 piothreshold=0 sge_copy_mode=2 wss_threshold=70 |
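For reference, a minimal sketch of the LNet selftest sequence that drives the lst add_test runs quoted above; the session name, the group layout, and the client NID (10.2.0.41@o2ib1) are illustrative assumptions, not values taken from the report:

export LST_SESSION=$$                  # lst tracks its session through this variable
lst new_session opa_bw                 # hypothetical session name
lst add_group servers 10.2.0.40@o2ib1  # server NID matches the lnet.conf above
lst add_group clients 10.2.0.41@o2ib1  # client NID is a placeholder
lst add_batch rperf
lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1M
lst run rperf
lst stat clients servers               # reports the RPC/s and MiB/s figures shown above
lst stop rperf
lst end_session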
| Comments |
| Comment by Ian Ziemba [ 12/Dec/17 ] |
|
For Lustre 2.10.0 results, map_on_demand was set to 32. |
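For comparison with the lnet.conf quoted in the description, a sketch of what the tunable block would have looked like for the 2.10.0 runs; only map_on_demand differs, and the surrounding values are assumed unchanged:

lnd tunables:
    peercredits_hiw: 64
    map_on_demand: 32        # value used for the Lustre 2.10.0 results
    concurrent_sends: 256
# a configuration file like this can be loaded with: lnetctl import < /etc/lnet.conf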
| Comment by Peter Jones [ 19/Dec/17 ] |
|
Amir

Please can you advise

Peter |
| Comment by Ian Ziemba [ 20/Dec/17 ] |
|
It looks like with master, 256 RDMA fragments are used for a 1M OPA LNet transfer whereas Lustre 2.10 used a single RDMA fragment. Could this be a possible reason for the performance drop? |
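One way to cross-check the tunables the node is actually running with (a sketch; the exact fields shown differ slightly between 2.10 and master) is to dump the verbose network and peer state:

# Verbose view of the local NI, including the lnd tunables
# (map_on_demand, concurrent_sends, fmr settings) in effect:
lnetctl net show -v

# Per-peer credit and state information, useful when comparing
# single-fragment vs. multi-fragment transfer behaviour:
lnetctl peer show -v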
| Comment by Doug Oucharek (Inactive) [ 20/Dec/17 ] |
|
That's strange. I would have thought switching from 256 fragments to one would be better for performance. |
| Comment by Ian Ziemba [ 20/Dec/17 ] |
|
Doug - That is what I am seeing. A single RDMA fragment (Lustre 2.10) does perform much better than 256 RDMA fragments (Lustre master). Sorry if my prior comment did not make that clear. |
| Comment by Ian Ziemba [ 20/Dec/17 ] |
|
Here is the latest data I have with CentOS 7.4. Note that Lustre 2.10.2 does not experience the performance issues that master does.

[root@client01 lst_performance]# uname -r
3.10.0-693.11.1.el7.x86_64
[root@client01 lst_performance]# lctl --version
lctl 2.10.2
[root@client01 lst_performance]# opaconfig -V
10.6.1.0.2

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 16600.1428571429    Client Write RPC/s: 8301.85714285714
Client Read MiB/s: 8299.86857142857    Client Write MiB/s: 1.26785714285714
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 16048.0714285714    Client Write RPC/s: 8025.28571428571
Client Read MiB/s: 8023.49428571428    Client Write MiB/s: 1.22428571428571
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 16942.7857142857    Client Write RPC/s: 8471.21428571429
Client Read MiB/s: 8471.78357142857    Client Write MiB/s: 1.29428571428571
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 21703.3571428571    Client Write RPC/s: 10852.9285714286
Client Read MiB/s: 1.65571428571429    Client Write MiB/s: 10851.7657142857
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 21922.0714285714    Client Write RPC/s: 10961.4285714286
Client Read MiB/s: 1.67214285714286    Client Write MiB/s: 10961.2514285714
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 21912.2857142857    Client Write RPC/s: 10953.8571428571
Client Read MiB/s: 1.67071428571429    Client Write MiB/s: 10956.0221428571

[root@client01 lst_performance]# uname -r
3.10.0-693.11.1.el7.x86_64
[root@client01 lst_performance]# lctl --version
lctl 2.10.56_39_gbe4507f
[root@client01 lst_performance]# opaconfig -V
10.6.1.0.2

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 14908.8571428571    Client Write RPC/s: 7456
Client Read MiB/s: 7453.895    Client Write MiB/s: 1.13928571428571
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 14782.8571428571    Client Write RPC/s: 7393.5
Client Read MiB/s: 7390.86071428571    Client Write MiB/s: 1.12928571428571
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1m
Client Read RPC/s: 14793.1428571429    Client Write RPC/s: 7397.5
Client Read MiB/s: 7396.55285714286    Client Write MiB/s: 1.13
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 14475.2857142857    Client Write RPC/s: 7238.64285714286
Client Read MiB/s: 1.10642857142857    Client Write MiB/s: 7237.25142857143
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 18805    Client Write RPC/s: 9403.14285714286
Client Read MiB/s: 1.43428571428571    Client Write MiB/s: 9402.445
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1m
Client Read RPC/s: 14235    Client Write RPC/s: 7115.71428571429
Client Read MiB/s: 1.08714285714286    Client Write MiB/s: 7116.90714285714

In addition, I am seeing ECONNABORTED with Lustre master that I do not see with 2.10:

00000800:00000100:0.0F:1513796499.976702:0:117:0:(o2iblnd_cb.c:1920:kiblnd_close_conn_locked()) Closing conn to 10.2.0.40@o2ib1: error 0(waiting)
00000400:00000100:11.0F:1513796499.977076:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 5, RPC errors 1
00000400:00000100:11.0:1513796499.977081:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 3, RPC errors 2
00000001:00020000:13.0F:1513796499.977088:0:2329:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk WRITE failed for RPC from 12345-10.2.0.40@o2ib1: -103
00000400:00000100:11.0:1513796499.977114:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 5, RPC errors 3
00000400:00000100:11.0:1513796499.977116:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 3, RPC errors 4
00000001:00020000:1.0F:1513796499.977122:0:2325:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk WRITE failed for RPC from 12345-10.2.0.40@o2ib1: -103
00000400:00000100:1.0:1513796499.977125:0:2325:0:(rpc.c:905:srpc_server_rpc_done()) Server RPC ffff881049cd9400 done: service brw_test, peer 12345-10.2.0.40@o2ib1, status SWI_STATE_BULK_STARTED:-5
00000001:00020000:1.0:1513796499.977128:0:2325:0:(brw_test.c:389:brw_server_rpc_done()) Bulk transfer from 12345-10.2.0.40@o2ib1 has failed: -5
00000400:00000100:19.0F:1513796499.977146:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 5, RPC errors 5
00000400:00000100:19.0:1513796499.977149:0:2289:0:(rpc.c:1418:srpc_lnet_ev_handler()) LNet event status -103 type 3, RPC errors 6
00000001:00020000:5.0F:1513796499.977155:0:2330:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk WRITE failed for RPC from 12345-10.2.0.40@o2ib1: -103 |
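The -103 (ECONNABORTED) lines above come from the kernel debug buffer; a sketch of how such a trace can be captured on both peers, assuming the default debug mask needs to be widened first:

# Enable network and network-error debug messages:
lctl set_param debug=+net
lctl set_param debug=+neterror

# ... reproduce the failing rperf/wperf batch ...

# Dump the kernel debug buffer to a file for inspection:
lctl dk /tmp/lnet-debug.log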
| Comment by Amir Shehata (Inactive) [ 05/Jan/18 ] |
|
Can you let me know how you determined it's using 256 fragments? Did you conclude that by looking at the map_on_demand value in the stats? This value indicates the maximum number of fragments negotiated between the peers. However, with OPA we should always be collapsing everything into one fragment. There is also a Cray OPA bugzilla ticket open which indicates a ~2 GB/s performance drop between different IFS versions: Bug 142506. Is this the same issue? |
| Comment by Ian Ziemba [ 11/Jan/18 ] |
|
Hi Amir, I was monitoring the number of work requests of the transmit message when kiblnd_init_rdma() finished. I pulled the latest master version and have verified that only one RDMA fragment is being used with OPA. I think we can close this ticket. |
| Comment by Amir Shehata (Inactive) [ 11/Jan/18 ] |
|
Hi Ian, Thanks for verifying. And you are correct that prior to |
| Comment by Joseph Gmitter (Inactive) [ 17/Jan/18 ] |
|
Issue fixed by patch for |