[LU-10394] IB_MR_TYPE_SG_GAPS mlx5 LNet performance drop Created: 14/Dec/17  Updated: 07/Jan/19  Resolved: 09/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: Ian Ziemba Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.4
2.10.56_1_g11aae87-1.el7.centos.x86_64


Issue Links:
Related
is related to LU-10373 LNet OPA Performance Drop Resolved
is related to LU-11105 Seeing "Using FastReg with no GAPS su... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

mlx5 performance is down 2+ GB/s when using IB_MR_TYPE_SG_GAPS as compared to IB_MR_TYPE_MEM_REG.

 

mlx5 with SG GAPS

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 17426.3333333333
Client Write RPC/s: 8713.77777777778
Client Read MiB/s: 8713.46111111111
Client Write MiB/s: 1.33
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 17408.3333333333
Client Write RPC/s: 8705.22222222222
Client Read MiB/s: 8704.06666666667
Client Write MiB/s: 1.33
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 17388.4444444444
Client Write RPC/s: 8697
Client Read MiB/s: 8695.54666666667
Client Write MiB/s: 1.32777777777778
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 17712.1111111111
Client Write RPC/s: 8856.55555555555
Client Read MiB/s: 1.35
Client Write MiB/s: 8855.53111111111
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 17705.7777777778
Client Write RPC/s: 8853.66666666667
Client Read MiB/s: 1.35
Client Write MiB/s: 8853.18555555556
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 17697.3333333333
Client Write RPC/s: 8854.44444444445
Client Read MiB/s: 1.34888888888889
Client Write MiB/s: 8850.95777777778


mlx5 without SG GAPS

----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 32 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 22449.5555555556
Client Write RPC/s: 11227
Client Read MiB/s: 11224.5033333333
Client Write MiB/s: 1.71222222222222
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 64 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 22308.6666666667
Client Write RPC/s: 11154.3333333333
Client Read MiB/s: 11155.7288888889
Client Write MiB/s: 1.7
----------------------------------------------------------
Running test: lst add_test --batch rperf --concurrency 128 --distribute 1:1 --from clients --to servers brw read size=1M
Client Read RPC/s: 21549.1
Client Write RPC/s: 10737.4
Client Read MiB/s: 11135.278
Client Write MiB/s: 1.638
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 32 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 22178.3333333333
Client Write RPC/s: 11090.8888888889
Client Read MiB/s: 1.69
Client Write MiB/s: 11088.7822222222
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 64 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 22198.6666666667
Client Write RPC/s: 11099.8888888889
Client Read MiB/s: 1.69111111111111
Client Write MiB/s: 11100.1666666667
----------------------------------------------------------
Running test: lst add_test --batch wperf --concurrency 128 --distribute 1:1 --from clients --to servers brw write size=1M
Client Read RPC/s: 22162.6666666667
Client Write RPC/s: 11085.7777777778
Client Read MiB/s: 1.68777777777778
Client Write MiB/s: 11083.5477777778


o2iblnd parameters:

options ko2iblnd timeout=10
options ko2iblnd peer_timeout=0
options ko2iblnd keepalive=30
options ko2iblnd credits=2048
options ko2iblnd ntx=2048
options ko2iblnd peer_credits=16
options ko2iblnd concurrent_sends=16
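
For context, the numbers above come from LNet selftest. A sketch of the kind of lst session that produces them (the NIDs and group sizes here are placeholders, not from this ticket; the add_test line is the one shown in the output):

```
# On all nodes: load the selftest module, then drive from a console node.
modprobe lnet_selftest
export LST_SESSION=$$
lst new_session rw_test
lst add_group clients 10.0.0.[1-8]@o2ib      # placeholder NIDs
lst add_group servers 10.0.0.[9-16]@o2ib     # placeholder NIDs
lst add_batch rperf
lst add_test --batch rperf --concurrency 32 --distribute 1:1 \
    --from clients --to servers brw read size=1M
lst run rperf
lst stat clients servers                     # reports RPC/s and MiB/s
lst stop rperf
lst end_session
```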



 Comments   
Comment by Peter Jones [ 19/Dec/17 ]

Amir

Could you please advise?

Peter

Comment by Amir Shehata (Inactive) [ 04/Jan/18 ]

Here is my proposed solution: introduce a tunable to enable GAPS support. It defaults to 0 in order to use MEM_REG. If the tunable is set to 1 and the mlx card has gap support, then use SG_GAPS and print a warning that performance degradation is expected with this configuration.

This is safe because

LU-9983 osp: align the OSP request size by 4k

closes any scenario where we could introduce discontiguous fragments.

The tunable can be turned on if, in the future, a layer uses LNet with discontiguous gaps.
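
Concretely, alongside the existing ko2iblnd options this would look like the fragment below (using the use_fastreg_gaps parameter name that appears later in this ticket; a sketch of the intended usage, not the landed patch text):

```
# Default: FastReg MRs of type MEM_REG, the fast path
# (equivalent to omitting the line entirely)
options ko2iblnd use_fastreg_gaps=0

# Opt in to SG_GAPS MRs; a warning is expected, along with lower bandwidth
# options ko2iblnd use_fastreg_gaps=1
```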

Comment by Gerrit Updater [ 05/Jan/18 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30749
Subject: LU-10394 lnd: default to using MEM_REG
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 74dced457067c9135365ea040e7071c87cd75375

Comment by Ian Ziemba [ 11/Jan/18 ]

Hi Amir,

I tried running twice with this patch, and both times my system panicked with the following message.

[  872.303780] LNetError: 2362:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) ASSERTION( conn->ibc_nsends_posted <= conn->ibc_queue_depth ) failed:
[  872.304325] LNetError: 2362:0:(o2iblnd_cb.c:991:kiblnd_check_sends_locked()) LBUG
[  872.304699] Pid: 2362, comm: kiblnd_sd_00_01
[  872.304704]
Call Trace:
[  872.304805]  [<ffffffffc0a1e7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[  872.304901]  [<ffffffffc0a1e83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[  872.304967]  [<ffffffffc0ab352b>] kiblnd_check_sends_locked+0xd8b/0xd90 [ko2iblnd]
[  872.305008]  [<ffffffffc0524d53>] ? mlx5_ib_post_recv+0x1f3/0x240 [mlx5_ib]
[  872.305074]  [<ffffffffc0ab4e06>] kiblnd_post_rx+0x156/0x4e0 [ko2iblnd]
[  872.305138]  [<ffffffffc0ab536a>] kiblnd_recv+0x1da/0x7b0 [ko2iblnd]
[  872.305266]  [<ffffffffc08db1fc>] ? lnet_mt_match_md+0x8c/0x1b0 [lnet]
[  872.305392]  [<ffffffffc08e3573>] lnet_ni_recv+0xc3/0x320 [lnet]
[  872.305523]  [<ffffffffc08e3cc1>] lnet_recv_put+0x81/0xb0 [lnet]
[  872.305641]  [<ffffffffc08e5ee6>] lnet_parse_local+0x5a6/0xd40 [lnet]
[  872.305762]  [<ffffffffc08e6f4a>] lnet_parse+0x8ca/0xfc0 [lnet]
[  872.305825]  [<ffffffffc0ab3035>] ? kiblnd_check_sends_locked+0x895/0xd90 [ko2iblnd]
[  872.305860]  [<ffffffffc051c608>] ? mlx5_ib_poll_cq+0x418/0xf10 [mlx5_ib]
[  872.305925]  [<ffffffffc0ab5ce3>] kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
[  872.305990]  [<ffffffffc0abc90f>] kiblnd_scheduler+0xf0f/0x1150 [ko2iblnd]
[  872.306004]  [<ffffffff810c93f5>] ? sched_clock_cpu+0x85/0xc0
[  872.306016]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
[  872.306031]  [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
[  872.306096]  [<ffffffffc0abba00>] ? kiblnd_scheduler+0x0/0x1150 [ko2iblnd]
[  872.306109]  [<ffffffff810b252f>] kthread+0xcf/0xe0
[  872.306122]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
[  872.306135]  [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[  872.306147]  [<ffffffff810b2460>] ? kthread+0x0/0xe0

I think this panic is unrelated to your patch. Do you want me to open another ticket?

Thanks

Comment by Amir Shehata (Inactive) [ 11/Jan/18 ]

This has already been fixed. Please check out LU-10459.

Comment by Ian Ziemba [ 11/Jan/18 ]

Thanks. The use_fastreg_gaps parameter works as expected. When set to 0, 1M performance is 11 GB/s. When set to 1, 1M performance is 8.9 GB/s.

Comment by Amir Shehata (Inactive) [ 11/Jan/18 ]

OK, thanks. Good to know that SG_GAPS drops performance. I'll make sure to record that on the LNet wiki.

Comment by Gerrit Updater [ 09/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30749/
Subject: LU-10394 lnd: default to using MEM_REG
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 27116ee18c04057936837f6f0aef3f4c09c21d78

Comment by Peter Jones [ 09/Feb/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 12/Sep/18 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33149
Subject: LU-10394 lnd: default to using MEM_REG
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: ed0687e222c2714b4a1d6d12ff2bce6612c8bb82

Generated at Sat Feb 10 02:34:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.