[LU-9448] Assert on an empty NUMA node Created: 04/May/17  Updated: 03/Jun/17  Resolved: 03/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8703 rework CPU partition code Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Testing on a system with 4 NUMA nodes, one of which has no CPUs assigned, triggered an assert in o2iblnd:

LASSERT(sched->ibs_nthreads > 0);

LU-6325 libcfs: shortcut to create CPT from NUMA topology
introduced a method whereby, if the module parameter cpu_pattern is set to "N" or "n", the CPTs are created from the NUMA topology. This can expose the assert: a scheduler's ibs_nthreads can be 0 when no CPUs are assigned to the CPT to which the scheduler is bound (i.e. cfs_cpt_weight() for the CPT in question returns 0).
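To make the failure mode concrete, here is a minimal, self-contained sketch; it is not the o2iblnd code. It assumes a hypothetical `sketch_cpt_weight()` standing in for `cfs_cpt_weight()`, and a hypothetical rule that caps a scheduler's thread count at the number of CPUs in its CPT. The real calculation in the LND may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology: CPUs per NUMA node, one CPT per node ("N" pattern).
 * Node 2 has no CPUs assigned, as on the reported system. */
static const int sketch_cpus_per_node[] = { 8, 8, 0, 8 };
#define SKETCH_NCPTS \
	(sizeof(sketch_cpus_per_node) / sizeof(sketch_cpus_per_node[0]))

/* Stand-in for cfs_cpt_weight(): number of CPUs in the given CPT. */
static int sketch_cpt_weight(size_t cpt)
{
	return cpt < SKETCH_NCPTS ? sketch_cpus_per_node[cpt] : 0;
}

/* Assumed rule: at most `wanted` threads, never more than the CPT has CPUs. */
static int sketch_sched_nthreads(size_t cpt, int wanted)
{
	int weight = sketch_cpt_weight(cpt);

	return wanted < weight ? wanted : weight;
}

/* Mirrors the check LASSERT(sched->ibs_nthreads > 0): aborts on an empty CPT. */
static void sketch_start_sched(size_t cpt, int wanted)
{
	int nthreads = sketch_sched_nthreads(cpt, wanted);

	assert(nthreads > 0);
}
```

Under these assumptions, `sketch_sched_nthreads(2, 2)` evaluates to 0 for the CPU-less node, so `sketch_start_sched(2, 2)` would trip the assertion, while the other three CPTs pass.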

LU-8703 libcfs: use int type for CPT identification.
is what in fact exposed this bug, when the default value of the module parameter cpu_pattern was set to "N".

We should be able to handle this case either in the LND, by creating schedulers only for non-empty CPTs, or in the libcfs code, by not creating an empty CPT in the first place.
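The LND-side option can be sketched as follows. This is an illustrative, self-contained example and not the patch that landed. The `sketch_sched` structure, the `sketch_create_scheds()` helper and the thread-count rule are all hypothetical; `weights[i]` stands for what `cfs_cpt_weight()` would report for CPT i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPT scheduler descriptor. */
struct sketch_sched {
	int cpt;	/* CPT this scheduler is bound to */
	int nthreads;	/* threads to start; > 0 for every created entry */
};

/* Fill `out` with one scheduler per non-empty CPT and return how many
 * were created. `out` must have room for `ncpts` entries. */
static size_t sketch_create_scheds(const int *weights, size_t ncpts,
				   int wanted, struct sketch_sched *out)
{
	size_t created = 0;

	for (size_t i = 0; i < ncpts; i++) {
		if (weights[i] <= 0)
			continue;	/* empty CPT: create no scheduler */

		out[created].cpt = (int)i;
		/* Assumed rule: cap the thread count at the CPT's CPU count. */
		out[created].nthreads = wanted < weights[i] ? wanted : weights[i];

		/* The invariant behind LASSERT(sched->ibs_nthreads > 0) now holds. */
		assert(out[created].nthreads > 0);
		created++;
	}
	return created;
}
```

With weights `{ 8, 8, 0, 8 }` and two threads requested, this creates three schedulers bound to CPTs 0, 1 and 3. The trade-off of this approach is that any code indexing schedulers by CPT number must tolerate the gap, which is what the libcfs-side alternative avoids.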



 Comments   
Comment by Dmitry Eremin (Inactive) [ 04/May/17 ]

The fix is still under review: https://review.whamcloud.com/23222/

Comment by Gerrit Updater [ 16/May/17 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/27145
Subject: LU-9448 lnet: handle empty CPTs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 94307f8eb7791d8d56e55b5db184b2006a59f9fa

Comment by Gerrit Updater [ 03/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27145/
Subject: LU-9448 lnet: handle empty CPTs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e711370e13dcbe059e2551aa575c41d62cbcfca9

Comment by Peter Jones [ 03/Jun/17 ]

Landed for 2.10
