socklnd needs improved interface selection and configuration
(LU-14064)
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Shuichi Ihara | Assignee: | Serguei Smirnov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
A server (1 x IB-EDR) and a client (2 x IB-HDR100), with Multi-Rail (MR) enabled |
| Description |
|
If a server has more than one CPT, peer connections should be distributed across the different CPTs for load balancing. Here is an example.
server# cat /sys/kernel/debug/lnet/cpu_partition_table
0 : 0 1 2 3 4 5 6 7 8 9
1 : 10 11 12 13 14 15 16 17 18 19
server# lnetctl net show
net:
- net type: lo
local NI(s):
- nid: 0@lo
status: up
- net type: o2ib10
local NI(s):
- nid: 10.0.11.224@o2ib10
status: up
interfaces:
0: ib0
client # cat /sys/kernel/debug/lnet/cpu_partition_table
0 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 : 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
2 : 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
3 : 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
4 : 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
5 : 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
6 : 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
7 : 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
client # lnetctl net show -v
- net type: o2ib10
local NI(s):
- nid: 10.0.11.81@o2ib10
status: up
interfaces:
0: ib0
- snip -
lnd tunables:
dev cpt: 0
tcp bonding: 0
CPT: "[0,1,2,3]"
- nid: 10.4.11.71@o2ib10
status: up
interfaces:
0: ib4
- snip -
lnd tunables:
dev cpt: 4
tcp bonding: 0
CPT: "[4,5,6,7]"
On the client:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20263 root 20 0 0 0 0 R 98.3 0.0 0:29.85 kiblnd_sd_06_01
20264 root 20 0 0 0 0 R 98.3 0.0 0:29.85 kiblnd_sd_06_02
20265 root 20 0 0 0 0 R 98.3 0.0 0:29.85 kiblnd_sd_06_03
20262 root 20 0 0 0 0 R 98.0 0.0 0:29.84 kiblnd_sd_06_00
20247 root 20 0 0 0 0 R 89.1 0.0 1:19.11 kiblnd_sd_02_01
20248 root 20 0 0 0 0 R 88.7 0.0 1:19.20 kiblnd_sd_02_02
20249 root 20 0 0 0 0 R 88.7 0.0 1:19.15 kiblnd_sd_02_03
20246 root 20 0 0 0 0 R 87.7 0.0 1:19.24 kiblnd_sd_02_00
Two CPTs are busy because of the two interfaces.
On the server:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27651 root 20 0 0 0 0 R 86.0 0.0 2:22.27 kiblnd_sd_00_00
27652 root 20 0 0 0 0 R 86.0 0.0 2:22.30 kiblnd_sd_00_01
27653 root 20 0 0 0 0 R 86.0 0.0 2:22.27 kiblnd_sd_00_02
27654 root 20 0 0 0 0 R 85.4 0.0 2:22.28 kiblnd_sd_00_03
Only one CPT is busy even though two peers are connected to the server. Amir added a debug patch and confirmed that both peers went to the first CPT.
00000800:00000200:18.0:1591055201.186835:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.0.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
00000800:00000200:18.0:1591055201.189343:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.4.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
The problem is that the hash function returns the same value even when the client IP address changes as shown below, so both peers end up on the same CPT on the server if the server has only a single interface.
1407418001001297 nid1 of client, 64-bit representation
1407418001263431 nid2 of client, 64-bit representation |
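The two 64-bit values quoted in the description can be reproduced by packing the LND type, network number, and IPv4 address into a single integer. The sketch below assumes the usual LNet layout (network in the upper 32 bits as `type << 16 | number`, address in the lower 32 bits) and an LND type code of 5 for o2ib; `pack_nid` is a hypothetical helper, not a Lustre API. Under those assumptions the second value corresponds to 10.4.11.71, the address shown in the client's `lnetctl` output.

```python
import ipaddress

O2IB_LND_TYPE = 5  # assumption: LND type code for o2ib


def pack_nid(lnd_type: int, net_num: int, ip: str) -> int:
    """Pack (LND type, net number, IPv4 address) into a 64-bit NID."""
    net = (lnd_type << 16) | net_num          # 32-bit network part
    addr = int(ipaddress.IPv4Address(ip))     # 32-bit address part
    return (net << 32) | addr


nid1 = pack_nid(O2IB_LND_TYPE, 10, "10.0.11.81")
nid2 = pack_nid(O2IB_LND_TYPE, 10, "10.4.11.71")

print(nid1)              # 1407418001001297
print(nid2)              # 1407418001263431
print(hex(nid1 ^ nid2))  # 0x40016: the two NIDs differ in only four bits
```

Because the two keys differ in so few bits, a hash that takes only a small number of output bits can plausibly place both on the same CPT.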
| Comments |
| Comment by Gerrit Updater [ 19/Jun/20 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39113 |
| Comment by Amir Shehata (Inactive) [ 20/Jun/20 ] |
|
I added a command to print the CPT number (or the index of the CPT if the NI is bound to a set of CPTs). I think it would be useful to be able to pull this information out without having to dive into the kernel. This utility shows that varying the first two octets of the IP address, or the net name/number, does not change the CPT value the NID is hashed to. This is something to be aware of on existing installations. Depending on the addressing scheme a site uses, we could end up in a situation where all the NIDs are hashed into the same CPT. This creates a problem with CPT locking, and also at the LND, since we would be picking scheduler threads from the same CPT pool. |
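This kind of insensitivity can also be probed offline by varying one NID field at a time and recording which CPTs a mapping function produces. The sketch below is illustrative only: `weak_cpt_of` is a deliberately poor stand-in that looks at just the low 16 bits of the NID to mimic the reported symptom. It is not Lustre's hash, and the helper names and the o2ib type code of 5 are assumptions.

```python
import ipaddress


def pack_nid(lnd_type, net_num, ip):
    """Hypothetical helper: 64-bit NID as (type << 16 | num) << 32 | addr."""
    net = (lnd_type << 16) | net_num
    return (net << 32) | int(ipaddress.IPv4Address(ip))


def weak_cpt_of(nid, ncpts):
    """Stand-in mapping that ignores everything above the low 16 bits."""
    return (nid & 0xFFFF) % ncpts


def cpts_seen(cpt_of, nids, ncpts):
    """Return the set of CPT indices that a list of NIDs maps to."""
    return {cpt_of(nid, ncpts) for nid in nids}


NCPTS = 2
vary_second_octet = [pack_nid(5, 10, f"10.{b}.11.81") for b in range(8)]
vary_net_number = [pack_nid(5, n, "10.0.11.81") for n in range(8)]
vary_last_octet = [pack_nid(5, 10, f"10.0.11.{d}") for d in range(80, 88)]

for label, nids in [("second octet", vary_second_octet),
                    ("net number", vary_net_number),
                    ("last octet", vary_last_octet)]:
    print(label, sorted(cpts_seen(weak_cpt_of, nids, NCPTS)))
```

With this stand-in, only the last-octet sweep reaches more than one CPT. Swapping in a faithful model of the real hash for `weak_cpt_of` would show which fields a given site's addressing scheme actually exercises.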
| Comment by Andreas Dilger [ 21/Jun/21 ] |
|
Shuichi, is it true that the CPT hash function is imbalanced even if there are multiple CPTs and multiple clients connecting (e.g. 32 clients connecting to a server with 4 CPTs)? There are always going to be cases where two clients will map to a single CPT (in this case 10.4.11.71 and 10.4.11.81) no matter which mapping function is used. However, it is a much bigger problem if, say, 32 clients with sequential NIDs are not uniformly distributed across the CPTs on the server, or within 1 of an even split. |
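The balance criterion described here (with n clients and k CPTs, every CPT count is within 1 of an even split) is simple to test for any candidate mapping. A minimal sketch, using a plain modulo as a placeholder mapping rather than the real LNet hash; the function names and the starting address are illustrative assumptions.

```python
from collections import Counter
import ipaddress


def sequential_addrs(first_ip, count):
    """Return `count` consecutive IPv4 addresses as integers."""
    base = int(ipaddress.IPv4Address(first_ip))
    return [base + i for i in range(count)]


def cpt_histogram(cpt_of, keys, ncpts):
    """Count how many keys land on each CPT index 0..ncpts-1."""
    counts = Counter(cpt_of(k, ncpts) for k in keys)
    return [counts.get(c, 0) for c in range(ncpts)]


def is_balanced(hist):
    """True if the busiest and idlest CPT differ by at most one peer."""
    return max(hist) - min(hist) <= 1


def modulo_cpt(key, ncpts):
    """Placeholder mapping, NOT the LNet hash."""
    return key % ncpts


clients = sequential_addrs("10.0.11.81", 32)
hist = cpt_histogram(modulo_cpt, clients, 4)
print(hist, is_balanced(hist))  # [8, 8, 8, 8] True
```

Running the same check against a model of the actual hash, with the site's real NID list, would answer the question directly.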
| Comment by Gerrit Updater [ 31/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/39113/ |
| Comment by Peter Jones [ 05/May/22 ] |
|
Seems to be landed for 2.15 |
| Comment by Gerrit Updater [ 22/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50381 |