Details
-
Bug
-
Resolution: Unresolved
-
Minor
Description
A number of patches have improved the hash for better CPT distribution (LU-14676, LU-16797), but they are still not enough, and recent use cases still end up with unbalanced CPT RR.
For instance, the following three examples show 64 clients under different IP address assignment policies.
client        sequential order-1   sequential order-2   random order (by DHCP)
src01-c0-n0   10.0.46.1            10.0.46.1            10.0.46.107
src01-c0-n1   10.0.46.2            10.0.46.33           10.0.45.41
src01-c1-n0   10.0.46.3            10.0.46.2            10.0.45.140
src01-c1-n1   10.0.46.4            10.0.46.34           10.0.47.74
src02-c0-n0   10.0.46.5            10.0.46.3            10.0.45.40
src02-c0-n1   10.0.46.6            10.0.46.35           10.0.46.219
src02-c1-n0   10.0.46.7            10.0.46.4            10.0.47.205
src02-c1-n1   10.0.46.8            10.0.46.36           10.0.47.96
...           ...                  ...                  ...
src16-c0-n0   10.0.46.61           10.0.46.31           10.0.47.178
src16-c0-n1   10.0.46.62           10.0.46.63           10.0.47.34
src16-c1-n0   10.0.46.63           10.0.46.32           10.0.46.83
src16-c1-n1   10.0.46.64           10.0.46.64           10.0.45.226
If the NIDs of all 64 clients src[01-16]-c[0-1]-n[0-1] are distributed across 8 CPTs:
[root@src01-c0-n0 ~]# for i in `seq 1 64`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort | uniq -c
      8 value: 0
      8 value: 1
      8 value: 2
      8 value: 3
      8 value: 4
      8 value: 5
      8 value: 6
      8 value: 7
the distribution is well balanced. However, if only half of the clients are used, src[01-16]-c[0-1]-n0 (10.0.46.1, 10.0.46.3, 10.0.46.5, ...):
[root@src01-c0-n0 ~]# for i in `seq 1 2 64`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort | uniq -c
      8 value: 0
      8 value: 2
      8 value: 4
      8 value: 6
the NIDs are still spread evenly over the CPTs that are used, but only 4 of the 8 CPTs are used. That means half of the CPU cores are idle.
If there is another set of 32 clients, src[01-08]-c[0-1]-n[0-1], with sequential IP addresses (10.0.46.1 - 10.0.46.32), the distribution works well.
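To spot unused CPTs like this without reading the counts by eye, the `uniq -c` output can be tallied per CPT. The following is only a minimal sketch: `parse_cpt_counts` and `unused_cpts` are hypothetical helper names, not part of lnetctl, and the input is the output quoted in this ticket for the odd addresses.

```python
import re
from collections import Counter


def parse_cpt_counts(text, ncpt):
    """Parse 'COUNT value: CPT' entries into a list of NID counts per CPT."""
    counts = Counter()
    for count, cpt in re.findall(r"(\d+)\s+value:\s*(\d+)", text):
        counts[int(cpt)] += int(count)
    # CPTs that never appear in the output get an explicit count of 0.
    return [counts.get(cpt, 0) for cpt in range(ncpt)]


def unused_cpts(per_cpt):
    """Return the indices of CPTs that received no NID at all."""
    return [cpt for cpt, n in enumerate(per_cpt) if n == 0]


# Output reported in this ticket for 10.0.46.1, 10.0.46.3, 10.0.46.5, ...
half_clients = """
8 value: 0
8 value: 2
8 value: 4
8 value: 6
"""

per_cpt = parse_cpt_counts(half_clients, ncpt=8)
print(per_cpt)               # [8, 0, 8, 0, 8, 0, 8, 0]
print(unused_cpts(per_cpt))  # [1, 3, 5, 7]
```

The same helper accepts the output piped from the lnetctl loop above, since it only looks for `COUNT value: CPT` pairs.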
[root@src01-c0-n0 ~]# for i in `seq 1 32`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort | uniq -c
      4 value: 0
      4 value: 1
      4 value: 2
      4 value: 3
      4 value: 4
      4 value: 5
      4 value: 6
      4 value: 7
The CPT distribution for the random IP addresses assigned by DHCP:
      7 value: 0
      7 value: 1
      6 value: 2
      8 value: 3
      3 value: 4
     12 value: 5
     12 value: 6
      9 value: 7
This is an unbalanced CPT distribution that leaves some CPTs busy and others much less busy, which causes a number of problems:
- performance is not consistent across different groups of NIDs
- ONLY a good sequential IP address order can reach the maximum (hero) performance
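The imbalance in the DHCP case can be put into numbers by comparing the busiest and the least busy CPT against the mean load. This is only an illustrative sketch: the `imbalance` function and its metric names are assumptions made for this example, not something LNet reports, and the counts are the ones quoted in this ticket.

```python
def imbalance(per_cpt):
    """Compare the busiest and least busy CPT against the mean NID count."""
    mean = sum(per_cpt) / len(per_cpt)
    return {
        "mean": mean,
        "max_over_mean": max(per_cpt) / mean,
        "min_over_mean": min(per_cpt) / mean,
    }


# NIDs per CPT (CPT 0..7) as reported in this ticket.
sequential = [8, 8, 8, 8, 8, 8, 8, 8]  # 10.0.46.1 - 10.0.46.64
dhcp = [7, 7, 6, 8, 3, 12, 12, 9]      # random addresses assigned by DHCP

print(imbalance(sequential))
# {'mean': 8.0, 'max_over_mean': 1.0, 'min_over_mean': 1.0}
print(imbalance(dhcp))
# {'mean': 8.0, 'max_over_mean': 1.5, 'min_over_mean': 0.375}
```

With the DHCP addresses, the busiest CPTs serve 50% more NIDs than the mean while the least busy one serves well under half of it.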
Either the hash needs some rework, or some other mechanism is needed, to produce better CPT distributions.
A sequential IP order might work in general, but there are still many cases where a sequential order does not work properly.
e.g.
HPC: a few clients are picked from each rack for performance (balanced network distribution).
Cloud: client IP addresses are assigned by DHCP and managed by dynamic DNS.