[LU-6285] Assert fails in staging client module crashes kernel if CPUMASK_OFFSTACK set Created: 25/Feb/15 Updated: 10/Aug/16 Resolved: 27/Jul/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Tyson Whitehead | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Epic/Theme: | staging |
| Severity: | 3 |
| Rank (Obsolete): | 17617 |
| Description |
|
Enabling CONFIG_CPUMASK_OFFSTACK in the stock 3.18.0 kernel causes the staging ptlrpc module, upon loading, to emit the message "LustreError: 1203:0:(service.c:2796:ptlrpc_hr_init()) ASSERTION( hrp->hrp_nthrs > 0 ) failed:", followed by a backtrace and a kernel lockup. I'll attach my dmesg dump and the .config file I used. I picked version 2.4.0 above because there doesn't seem to be any way to indicate the staging client version. |
| Comments |
| Comment by Oleg Drokin [ 26/Feb/15 ] |
|
Thank you for the report. I submitted a bug report upstream with a couple of proposed patches, and hopefully that will take care of it: https://lkml.org/lkml/2015/2/26/29 |
| Comment by Tyson Whitehead [ 26/Feb/15 ] |
|
Wow. That's great! Thanks for the very quick turnaround. We are really looking forward to being able to use the latest Fedora and Ubuntu releases as Lustre clients. |
| Comment by Oleg Drokin [ 27/Feb/15 ] |
|
You can also use this as a workaround (and a minor performance optimization):

diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 635b12b..4a27c79 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -2752,7 +2752,6 @@ int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait)
 int ptlrpc_hr_init(void)
 {
-	cpumask_t mask;
 	struct ptlrpc_hr_partition *hrp;
 	struct ptlrpc_hr_thread *hrt;
 	int rc;
@@ -2770,8 +2769,7 @@ int ptlrpc_hr_init(void)
 	init_waitqueue_head(&ptlrpc_hr.hr_waitq);
-	cpumask_copy(&mask, topology_thread_cpumask(0));
-	weight = cpus_weight(mask);
+	weight = cpus_weight(*topology_thread_cpumask(0));
 	cfs_percpt_for_each(hrp, i, ptlrpc_hr.hr_partitions) {
 		hrp->hrp_cpt = i;

I'll send a separate patch for this to Greg, but who knows when it'll actually make it to Fedora. |
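The ticket does not spell out why the on-stack copy misbehaves. One plausible mechanism comes from our reading of the 3.18-era <linux/cpumask.h>; this is an assumption, not something stated in this ticket. With CONFIG_CPUMASK_OFFSTACK, cpumask_copy() copies only nr_cpu_ids bits, while the legacy cpus_weight() macro counts all NR_CPUS bits. Stale bits left in the uninitialized stack variable would then inflate weight, so hrp_nthrs / weight truncates to 0. The userspace sketch below models that sequence. It is not kernel code, and every model_* name and both size constants are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical model sizes: NR_CPUS_MODEL plays the compile-time maximum
 * (NR_CPUS), NR_CPU_IDS_MODEL the CPU count found at boot (nr_cpu_ids). */
#define NR_CPUS_MODEL    256
#define NR_CPU_IDS_MODEL 8
#define BITS_PER_ULONG   (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR(n)     (((n) + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* A fixed-size bitmap standing in for cpumask_t. */
struct model_cpumask {
	unsigned long bits[LONGS_FOR(NR_CPUS_MODEL)];
};

/* Count the set bits among the first nbits bits of m. */
static int model_weight(const struct model_cpumask *m, unsigned int nbits)
{
	int w = 0;

	for (unsigned int i = 0; i < nbits; i++)
		if (m->bits[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
			w++;
	return w;
}

/* Copy only the words covering the first nbits bits; the rest of dst keeps
 * whatever it held before. Assumed to mirror cpumask_copy() under
 * CONFIG_CPUMASK_OFFSTACK, where the copy is bounded by nr_cpu_ids. */
static void model_copy(struct model_cpumask *dst,
		       const struct model_cpumask *src, unsigned int nbits)
{
	assert(nbits <= NR_CPUS_MODEL);
	memcpy(dst->bits, src->bits, LONGS_FOR(nbits) * sizeof(unsigned long));
}

/* Suspected sequence: bounded copy into a stack mask holding stale bytes,
 * then a count over the full compile-time width. Returns that weight. */
static int model_stale_copy_weight(const struct model_cpumask *siblings)
{
	struct model_cpumask mask;

	memset(&mask, 0xff, sizeof(mask));	/* stand-in for stale stack bytes */
	model_copy(&mask, siblings, NR_CPU_IDS_MODEL);
	return model_weight(&mask, NR_CPUS_MODEL);
}
```

Take a sibling mask with bits 0 and 1 set. model_weight(&sib, NR_CPU_IDS_MODEL) gives 2, but model_stale_copy_weight(&sib) gives 194 on an LP64 build. A partition of 8 CPUs then gets 8 / 194 == 0 threads, which is exactly the condition the LASSERT rejects. Counting only within the valid width (as cpumask_weight() does) or avoiding the partial copy keeps the stale words out of the count.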
| Comment by Gerrit Updater [ 27/Feb/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/13904 |
| Comment by Gerrit Updater [ 27/Feb/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/13905 |
| Comment by Gerrit Updater [ 02/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/13925 |
| Comment by Gerrit Updater [ 02/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/13926 |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/13954 |
| Comment by Gerrit Updater [ 06/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13904/ |
| Comment by Gerrit Updater [ 06/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13905/ |
| Comment by Gerrit Updater [ 06/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13954/ |
| Comment by Gerrit Updater [ 06/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13925/ |
| Comment by Gerrit Updater [ 01/May/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13926/ |
| Comment by James A Simmons [ 24/Jun/15 ] |
|
All the patches have landed. We can close this ticket. |
| Comment by Tyson Whitehead [ 24/Jun/15 ] |
|
Excellent! Thanks everyone. |
| Comment by James A Simmons [ 26/Jun/15 ] |
|
Please close this ticket |
| Comment by Amir Shehata (Inactive) [ 23/Mar/16 ] |
|
There is still an issue that could trigger the assert: cpu_pattern can specify exactly one CPU in a partition. In that case
weight = cfs_cpu_ht_nsiblings(0);
hrp->hrp_nthrs = cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i);
hrp->hrp_nthrs /= weight;
evaluates to 0, where cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i) == 1 and weight == 2.
Therefore, only divide by weight if hrp->hrp_nthrs >= weight. This avoids the assert:
LASSERT(hrp->hrp_nthrs > 0); |
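The guard proposed above can be sketched as a small standalone C function. This is a sketch of the idea in the comment, not the code from http://review.whamcloud.com/19106. hr_partition_nthrs() is a hypothetical name: its cpt_weight parameter stands in for cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i), and ht_siblings stands in for cfs_cpu_ht_nsiblings(0).

```c
#include <assert.h>

/* Hypothetical helper modelling the hrp_nthrs computation:
 * cpt_weight  ~ cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i)
 * ht_siblings ~ cfs_cpu_ht_nsiblings(0) */
static int hr_partition_nthrs(int cpt_weight, int ht_siblings)
{
	int nthrs = cpt_weight;

	assert(cpt_weight > 0 && ht_siblings > 0);
	/* Only divide out the hyperthread siblings when the partition has at
	 * least that many CPUs; otherwise 1 / 2 would truncate to 0. */
	if (nthrs >= ht_siblings)
		nthrs /= ht_siblings;

	assert(nthrs > 0);	/* mirrors LASSERT(hrp->hrp_nthrs > 0) */
	return nthrs;
}
```

With one CPU in the partition and two siblings, this returns 1 instead of 0. With eight CPUs and two siblings, it returns 4, the same as before. A partition smaller than the sibling count keeps one thread per CPU rather than being clamped to a single thread.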
| Comment by Gerrit Updater [ 23/Mar/16 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/19106 |
| Comment by Peter Jones [ 27/Jul/16 ] |
|
The bulk of the work landed for 2.9. Amir, please open a new ticket to track the landing of http://review.whamcloud.com/#/c/19106/ |