[LU-14875] LNet multirail and interface binding Created: 21/Jul/21 Updated: 23/Sep/22 Resolved: 11/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Lustre Bull | Assignee: | Cyril Bordage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | RedHat 8.3 |
| Severity: | 3 |
| Description |
|
On a machine with 4 IB interfaces, I would like to create an LNet multi-rail configuration that takes into account the NUMA location of each interface, in order to get the highest LNet performance. I have tried several LNet configurations, but none of them binds each interface to its local CPT.
Here is the NUMA description of the machine. The IB devices ib0, ib1, ib2 and ib3 are located on NUMA nodes 1, 3, 5 and 7, respectively.
# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 48 49 50 51 52 53
node 0 size: 63832 MB
node 0 free: 60103 MB
node 1 cpus: 6 7 8 9 10 11 54 55 56 57 58 59
node 1 size: 64268 MB
node 1 free: 39220 MB
node 2 cpus: 12 13 14 15 16 17 60 61 62 63 64 65
node 2 size: 64317 MB
node 2 free: 61323 MB
node 3 cpus: 18 19 20 21 22 23 66 67 68 69 70 71
node 3 size: 64281 MB
node 3 free: 61558 MB
node 4 cpus: 24 25 26 27 28 29 72 73 74 75 76 77
node 4 size: 64269 MB
node 4 free: 60741 MB
node 5 cpus: 30 31 32 33 34 35 78 79 80 81 82 83
node 5 size: 64305 MB
node 5 free: 62450 MB
node 6 cpus: 36 37 38 39 40 41 84 85 86 87 88 89
node 6 size: 64275 MB
node 6 free: 63133 MB
node 7 cpus: 42 43 44 45 46 47 90 91 92 93 94 95
node 7 size: 64337 MB
node 7 free: 62429 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10
# grep . /sys/class/net/ib*/device/numa_node
/sys/class/net/ib0/device/numa_node:1
/sys/class/net/ib1/device/numa_node:3
/sys/class/net/ib2/device/numa_node:5
/sys/class/net/ib3/device/numa_node:7
By default, the libcfs module configures 8 CPTs, one per NUMA node (CPT n contains exactly the CPUs of node n):
# modprobe -v libcfs
insmod /lib/modules/4.18.0-240.10.1.el8_3.x86_64/weak-updates/lustre-client/net/libcfs.ko
# lctl get_param cpu_partition_table
cpu_partition_table=
0 : 0 1 2 3 4 5 48 49 50 51 52 53
1 : 6 7 8 9 10 11 54 55 56 57 58 59
2 : 12 13 14 15 16 17 60 61 62 63 64 65
3 : 18 19 20 21 22 23 66 67 68 69 70 71
4 : 24 25 26 27 28 29 72 73 74 75 76 77
5 : 30 31 32 33 34 35 78 79 80 81 82 83
6 : 36 37 38 39 40 41 84 85 86 87 88 89
7 : 42 43 44 45 46 47 90 91 92 93 94 95
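Because this CPT table has one CPT per NUMA node, the local CPT of each interface is simply the numa_node value that sysfs reports for it (ib0 on 1, ib1 on 3, ib2 on 5, ib3 on 7). A minimal sketch of reading that mapping, assuming the 1:1 CPT-to-node layout shown above; the function name `iface_local_cpt` and its optional sysfs-root argument are hypothetical and not part of Lustre.

```shell
# Hypothetical helper (not part of Lustre): print "<iface> <cpt>" for each
# ib* interface, using the NUMA node reported by sysfs as its local CPT.
# Assumes libcfs built one CPT per NUMA node (CPT n == node n), as in the
# cpu_partition_table above; this does not hold for every CPT layout.
# The optional argument is the sysfs mount point (default /sys).
iface_local_cpt() {
    sysroot=${1:-/sys}
    for f in "$sysroot"/class/net/ib*/device/numa_node; do
        [ -r "$f" ] || continue               # glob matched nothing
        iface=${f#"$sysroot"/class/net/}      # -> ib0/device/numa_node
        iface=${iface%%/*}                    # -> ib0
        node=$(cat "$f")
        if [ "$node" -lt 0 ]; then
            # -1 means the platform reports no NUMA affinity for the device
            echo "warning: $iface has no NUMA affinity, skipped" >&2
            continue
        fi
        echo "$iface $node"
    done
}
```

Interfaces without NUMA affinity are reported on stderr and skipped rather than guessed, so every line on stdout is a usable interface/CPT pair.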
With configuration 1, no CPT binding is specified, and each interface is bound to every CPT:
# modprobe -v lnet
insmod /lib/modules/4.18.0-240.10.1.el8_3.x86_64/weak-updates/lustre-client/net/lnet.ko networks=o2ib(ib0,ib1,ib2,ib3)
# lctl net up
LNET configured
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 14.128.0.45@o2ib
          status: up
          interfaces:
              0: ib0
        - nid: 14.128.0.46@o2ib
          status: up
          interfaces:
              0: ib1
        - nid: 14.128.0.47@o2ib
          status: up
          interfaces:
              0: ib2
        - nid: 14.128.0.48@o2ib
          status: up
          interfaces:
              0: ib3
# lnetctl net show --verbose | grep -E 'ib|CPT|dev'
          dev cpt: 0
          CPT: "[0,1,2,3,4,5,6,7]"
    - net type: o2ib
        - nid: 14.128.0.45@o2ib
              0: ib0
          dev cpt: 1
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.46@o2ib
              0: ib1
          dev cpt: 3
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.47@o2ib
              0: ib2
          dev cpt: 5
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.48@o2ib
              0: ib3
          dev cpt: 7
          CPT: "[0,1,2,3,4,5,6,7]"
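With configuration 2, the CPT list [1,3,5,7] is specified for the o2ib network, and each interface is bound to CPTs 1, 3, 5 and 7. This is better, but still not optimal for performance, because each interface is spread over all four CPTs instead of only the one local to its NUMA node: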
# modprobe -v lnet
insmod /lib/modules/4.18.0-240.10.1.el8_3.x86_64/weak-updates/lustre-client/net/lnet.ko networks=o2ib(ib0,ib1,ib2,ib3)[1,3,5,7]
# lctl net up
LNET configured
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 14.128.0.45@o2ib
          status: up
          interfaces:
              0: ib0
        - nid: 14.128.0.46@o2ib
          status: up
          interfaces:
              0: ib1
        - nid: 14.128.0.47@o2ib
          status: up
          interfaces:
              0: ib2
        - nid: 14.128.0.48@o2ib
          status: up
          interfaces:
              0: ib3
# lnetctl net show --verbose | grep -E 'ib|CPT|dev'
          dev cpt: 0
          CPT: "[0,1,2,3,4,5,6,7]"
    - net type: o2ib
        - nid: 14.128.0.45@o2ib
              0: ib0
          dev cpt: 1
          CPT: "[1,3,5,7]"
        - nid: 14.128.0.46@o2ib
              0: ib1
          dev cpt: 3
          CPT: "[1,3,5,7]"
        - nid: 14.128.0.47@o2ib
              0: ib2
          dev cpt: 5
          CPT: "[1,3,5,7]"
        - nid: 14.128.0.48@o2ib
              0: ib3
          dev cpt: 7
          CPT: "[1,3,5,7]"
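To compare the device CPT and the effective CPT list of each NI without reading the full verbose dump, the grep used above can be replaced by a small filter that prints one line per NI. A minimal sketch, assuming the verbose YAML layout shown in this ticket, where each NI's `dev cpt:` and `CPT:` lines follow its `nid:` and interface lines; `ni_cpt_summary` is a hypothetical name, not an lnetctl feature.

```shell
# Hypothetical filter: read "lnetctl net show --verbose" on stdin and print
# one line per NI: "<nid> <interface> dev_cpt=<n> CPT=<list>".
# NIs without an interface (such as 0@lo) are shown with "-".
ni_cpt_summary() {
    awk '
        $1 == "-" && $2 == "nid:"   { nid = $3; iface = "-"; dev = "?" }
        $1 == "0:"                  { iface = $2 }   # first interface of the NI
        $1 == "dev" && $2 == "cpt:" { dev = $3 }
        $1 == "CPT:" {
            cpt = $2
            gsub(/"/, "", cpt)                       # drop the YAML quotes
            print nid, iface, "dev_cpt=" dev, "CPT=" cpt
        }
    '
}
```

Usage: `lnetctl net show --verbose | ni_cpt_summary`. On configuration 2, the first o2ib line would read `14.128.0.45@o2ib ib0 dev_cpt=1 CPT=[1,3,5,7]`, which makes the gap between the device's local CPT and the CPT list visible at a glance.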
Finally, with configuration 3, a per-interface NUMA binding is specified through an lnetctl YAML import, but it does not seem to be taken into account:
# modprobe -v lnet
insmod /lib/modules/4.18.0-240.10.1.el8_3.x86_64/weak-updates/lustre-client/net/lnet.ko networks=""
# lctl net up
LNET configured
# lnetctl net del --net tcp
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
# cat lnetctl.config.txt
net:
    - net type: o2ib
      local NI(s):
        - nid: 14.128.0.45@o2ib
          interfaces:
              0: ib0
          CPT: "[1]"
        - nid: 14.128.0.46@o2ib
          interfaces:
              0: ib1
          CPT: "[3]"
        - nid: 14.128.0.47@o2ib
          interfaces:
              0: ib2
          CPT: "[5]"
        - nid: 14.128.0.48@o2ib
          interfaces:
              0: ib3
          CPT: "[7]"
# lnetctl import lnetctl.config.txt
# echo $?
0
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 14.128.0.45@o2ib
          status: up
          interfaces:
              0: ib0
        - nid: 14.128.0.46@o2ib
          status: up
          interfaces:
              0: ib1
        - nid: 14.128.0.47@o2ib
          status: up
          interfaces:
              0: ib2
        - nid: 14.128.0.48@o2ib
          status: up
          interfaces:
              0: ib3
# lnetctl net show --verbose
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          dev cpt: 0
          tcp bonding: 0
          CPT: "[0,1,2,3,4,5,6,7]"
    - net type: o2ib
      local NI(s):
        - nid: 14.128.0.45@o2ib
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 1
          tcp bonding: 0
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.46@o2ib
          status: up
          interfaces:
              0: ib1
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 3
          tcp bonding: 0
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.47@o2ib
          status: up
          interfaces:
              0: ib2
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 5
          tcp bonding: 0
          CPT: "[0,1,2,3,4,5,6,7]"
        - nid: 14.128.0.48@o2ib
          status: up
          interfaces:
              0: ib3
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          lnd tunables:
          dev cpt: 7
          tcp bonding: 0
          CPT: "[0,1,2,3,4,5,6,7]"
Why is the CPT specified for each interface of the multi-rail LNet network not taken into account?
|
| Comments |
| Comment by Peter Jones [ 21/Jul/21 ] |
|
Cyril, could you please advise here? Thanks, Peter |
| Comment by Andreas Dilger [ 22/Jul/21 ] |
|
Does the recently-landed " |
| Comment by Lustre Bull [ 23/Jul/21 ] |
|
I read through the UDSP design documentation mentioned above, and it does not seem to provide the requested functionality. As I understand it, the UDSP feature allows setting up policies that affect LNet interface selection at runtime, whereas the requested functionality concerns the CPT configuration and binding of LNet interfaces in a multi-rail LNet network. |
| Comment by Gregoire Pichon [ 23/Jul/21 ] |
|
I have just realized that I was using the impersonal "Lustre Bull" account to create the Jira ticket and post comments. I am sorry about that. From now on, I will use my own Jira account to continue the discussion. Grégoire. |
| Comment by Cyril Bordage [ 27/Jul/21 ] |
|
Hello Grégoire! You are right, it does not seem that you could achieve what you expect with UDSP. I will study how we can provide a solution for your goal. |
| Comment by Stephen Champion [ 15/Aug/21 ] |
|
Restricting the CPT of the LNDs used for an interface via YAML has never worked as expected: lnetctl shows the CPT correctly, but does not import it. Adding interfaces individually with lnetctl and using the --cpt option to set the binding works as expected. I think that fixing the YAML import function will address Grégoire's needs. |
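The per-interface workaround can be sketched as one `lnetctl net add` call per interface, each restricted to a single CPT with `--cpt`. This is a minimal, hypothetical helper (`lnet_add_cmds` is not part of Lustre): it assumes the interface-to-CPT pairs derived from the NUMA layout in the description (ib0 on CPT 1, ib1 on 3, ib2 on 5, ib3 on 7), the `o2ib` net name, and the `--cpt "[n]"` list syntax, and it only prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: read "<iface> <cpt>" pairs on stdin and print one
# "lnetctl net add" command per interface, each bound to a single CPT.
# The first argument is the LNet net name (default o2ib).
# Commands are only printed, not executed.
lnet_add_cmds() {
    net=${1:-o2ib}
    while read -r iface cpt; do
        [ -n "$iface" ] || continue      # skip blank lines
        # Quote the CPT list so the shell does not treat [n] as a glob
        echo "lnetctl net add --net $net --if $iface --cpt \"[$cpt]\""
    done
}
```

For example, `printf '%s\n' 'ib0 1' 'ib1 3' 'ib2 5' 'ib3 7' | lnet_add_cmds o2ib` prints four commands such as `lnetctl net add --net o2ib --if ib0 --cpt "[1]"`, which can then be piped to `sh` once checked.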
| Comment by Cyril Bordage [ 10/Sep/21 ] |
|
pichong, can you confirm that you can achieve your goal by adding the interfaces individually? |
| Comment by Gregoire Pichon [ 16/Sep/21 ] |
|
Yes, I confirm that adding interfaces individually with lnetctl and the --cpt option sets the binding, as expected. Thanks, Stephen, for the suggestion. Nevertheless, I would have expected the YAML import to do it as well. |
| Comment by Cyril Bordage [ 16/Sep/21 ] |
|
Sure, it will be fixed. I just wanted to be sure you had another working way to do that. |
| Comment by Gerrit Updater [ 17/Feb/22 ] |
|
"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46541 |
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46541/ |
| Comment by Peter Jones [ 11/Jun/22 ] |
|
Landed for 2.16 |