[LU-8703] rework CPU partition code Created: 13/Oct/16  Updated: 11/Feb/18  Resolved: 19/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Dmitry Eremin (Inactive) Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-8710 libcfs fails to install when some CPU... Resolved
Related
is related to LU-9715 Crash in libcfs_init() Resolved
is related to LU-9448 Assert on an empty NUMA node Resolved
is related to LU-9859 libcfs simplification Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

KNL systems have two NUMA nodes, but only one of them has CPUs.

# numactl -H                                                                    
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
node 0 size: 98200 MB
node 0 free: 93543 MB
node 1 cpus:
node 1 size: 16384 MB
node 1 free: 15927 MB
node distances:
node   0   1 
  0:  10  31 
  1:  31  10 

As a result, libcfs fails to load with the error "LNetError: 288641:0:(linux-cpu.c:1102:cfs_cpu_init()) Failed to create cptab from pattern N".

int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
{
	int	i;

	if (cpumask_weight(mask) == 0 ||	/* <== *** HERE *** */
	    cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
		CDEBUG(D_INFO, "No online CPU is found in the CPU mask "
			       "for CPU partition %d\n", cpt);
		return 0;
	}

cpumask_weight(mask) is zero for node 1 because that node has no CPUs, so this check rejects it.
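
The failure mode can be illustrated with a small userspace sketch. It is not the libcfs code and not the patch that landed: the `toy_*` names and the 64-bit stand-in for a kernel cpumask are hypothetical, chosen only for illustration. It assumes, as the log above suggests, that the table setup treats a node without CPUs as a fatal error, and contrasts that with a variant that simply skips such nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a kernel cpumask: one bit per CPU. */
typedef uint64_t toy_cpumask_t;

/* Count the set bits in a mask, similar in spirit to cpumask_weight(). */
static int toy_mask_weight(toy_cpumask_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Strict variant: every NUMA node must contribute at least one online CPU.
 * Returns the number of partitions, or -1 if any node has none.
 */
static int toy_build_strict(const toy_cpumask_t *node_masks, size_t nnodes,
			    toy_cpumask_t online)
{
	size_t i;

	assert(node_masks != NULL);
	for (i = 0; i < nnodes; i++) {
		if (toy_mask_weight(node_masks[i] & online) == 0)
			return -1;	/* an empty node aborts the whole setup */
	}
	return (int)nnodes;
}

/*
 * Tolerant variant: nodes without online CPUs are skipped.
 * Returns the number of partitions, or -1 if no node is usable.
 */
static int toy_build_tolerant(const toy_cpumask_t *node_masks, size_t nnodes,
			      toy_cpumask_t online)
{
	size_t i;
	int created = 0;

	assert(node_masks != NULL);
	for (i = 0; i < nnodes; i++) {
		if (toy_mask_weight(node_masks[i] & online) == 0)
			continue;	/* memory-only node: nothing to bind */
		created++;
	}
	return created > 0 ? created : -1;
}
```

With a KNL-like layout (node 0 holding all CPUs, node 1 holding none), the strict variant fails, while the tolerant variant yields a single partition.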



 Comments   
Comment by Gerrit Updater [ 18/Oct/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/23222
Subject: LU-8703 libcfs: rework CPU partition code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c20768e9f7d61ab996476245845b97493a6a1c19

Comment by Dmitry Eremin (Inactive) [ 18/Oct/16 ]

The proposed patch can be treated as a temporary working solution. In general, the current CPU affinity and partition approach needs substantial further work.

Comment by Dmitry Eremin (Inactive) [ 21/Oct/16 ]

I will split this patch into a sequence of several smaller patches, so I am renaming the ticket to cover the follow-up work.

Comment by Gerrit Updater [ 21/Oct/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/23303
Subject: LU-8703 libcfs: remove usless CPU partition code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 749f12fd45789b7567d62354134383d6f8dbd67b

Comment by Gerrit Updater [ 21/Oct/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/23304
Subject: LU-8703 libcfs: use int type for CPU identification.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 64a25ef0d2fe4454996a98c1142ff8d60329a360

Comment by Gerrit Updater [ 21/Oct/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/23306
Subject: LU-8703 libcfs: rework CPU pattern parsing code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4c8bfd503e20eaa1b14993e8d09b49ef99ff664d

Comment by James A Simmons [ 21/Oct/16 ]

Thanks Dmitry for breaking it up.

Comment by Gerrit Updater [ 21/Oct/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: http://review.whamcloud.com/23307
Subject: LU-8703 libcfs: fix error messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: de8deb74fbe7eac4da0bef8e6ab4fea5a0ab0450

Comment by Gerrit Updater [ 12/Dec/16 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: https://review.whamcloud.com/24304
Subject: LU-8703 libcfs: change CPT estimate algorithm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5a928f213bfb24a6c7e5ae353bc7872528e32a1d

Comment by Gerrit Updater [ 17/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23307/
Subject: LU-8703 libcfs: fix error messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 20c748658000f5454f38576af506643e370bb1bc

Comment by Gerrit Updater [ 24/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23303/
Subject: LU-8703 libcfs: remove usless CPU partition code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c96c0043e9794c6e7e72c241d11908381b9fbbc3

Comment by Gerrit Updater [ 24/Jan/17 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: https://review.whamcloud.com/25048
Subject: LU-8703 libcfs: remove usless abstraction
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 204666bbd750a577500c2d9c9d2d31e0bc1c4544

Comment by Gerrit Updater [ 23/Feb/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25048/
Subject: LU-8703 libcfs: remove usless abstraction
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 40fe3cd7283dfd1cee5f989483c517601ac773f8

Comment by Gerrit Updater [ 19/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23304/
Subject: LU-8703 libcfs: use int type for CPT identification.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bcb737a19433e3e32df6a826f29d15a3666f54d8

Comment by Gerrit Updater [ 10/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23222/
Subject: LU-8703 libcfs: make tolerant to offline CPUs and empty NUMA nodes
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 42bf19a573a5c967e54302cc08c7b51effac3dd9

Comment by Peter Jones [ 10/Jun/17 ]

Landed for 2.10

Comment by James A Simmons [ 11/Jun/17 ]

Actually, there are two patches left.

Comment by Gerrit Updater [ 19/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23306/
Subject: LU-8703 libcfs: rework CPU pattern parsing code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 62bc3afea210eed59dd25fa4cf0fd5ecd083a7ae

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24304/
Subject: LU-8703 libcfs: change CPT estimate algorithm
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 02dea319b2ef21868b3fa3fad7b3f5cab7eb244e

Comment by Peter Jones [ 19/Jul/17 ]

Now everything has landed for 2.11

Comment by Gerrit Updater [ 19/Jul/17 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/28111
Subject: LU-8703 libcfs: change CPT estimate algorithm
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: f44a81459d9409f0572b120a186d933825dbb5f5

Comment by Gerrit Updater [ 03/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28111/
Subject: LU-8703 libcfs: change CPT estimate algorithm
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 3c5d093308bf7103cdd87cec1d7170b61482e9c9
