[LU-12352] libcfs crashes with certain cpu_npartitions values Created: 29/May/19 Updated: 22/Oct/20 Resolved: 04/Jun/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andrew Perepechko | Assignee: | Andrew Perepechko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Due to a bug in the code, libcfs crashes if the number of online CPUs is not evenly divisible by the number of CPU partitions. Based on the checks in cfs_cpt_table_create(), it appears that the original intent was to push the remaining CPUs into the initial partitions.

A simple reproducer for a system whose CPU count is not a multiple of 3:

insmod libcfs.ko cpu_pattern="" cpu_npartitions=3

[112628.427628] LNetError: 14786:0:(libcfs_cpu.c:770:cfs_cpt_choose_ncpus()) ASSERTION( number > 0 ) failed:
[112628.427862] LNetError: 14786:0:(libcfs_cpu.c:770:cfs_cpt_choose_ncpus()) LBUG
[112628.428073] Pid: 14786, comm: insmod 3.10.0-693.21.1.x3.1.10.x86_64 #1 SMP Wed Nov 14 12:16:53 CST 2018
[112628.428082] Call Trace:
[112628.428180] [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
[112628.428198] [<ffffffffc067d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[112628.428231] [<ffffffffc067d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[112628.428261] [<ffffffffc069137a>] cfs_cpt_choose_ncpus+0x81a/0x820 [libcfs]
[112628.428294] [<ffffffffc06915ba>] cfs_cpt_table_create+0x23a/0x8d0 [libcfs]
[112628.428325] [<ffffffffc0691d4b>] cfs_cpu_init+0xbb/0xb70 [libcfs]
[112628.428356] [<ffffffffc06df031>] libcfs_init+0x31/0x1000 [libcfs]
[112628.428388] [<ffffffff810020ea>] do_one_initcall+0xba/0x240
[112628.428400] [<ffffffff81104424>] load_module+0x1f84/0x2a10
[112628.428413] [<ffffffff81105066>] SyS_finit_module+0xa6/0xd0
[112628.428423] [<ffffffff816c1715>] system_call_fastpath+0x1c/0x21
[112628.428436] [<ffffffffffffffff>] 0xffffffffffffffff
[112628.428469] Kernel panic - not syncing: LBUG
[112628.428572] CPU: 3 PID: 14786 Comm: insmod Tainted: G OE ------------ 3.10.0-693.21.1.x3.1.10.x86_64 #1
[112628.428782] Hardware name: /D525MWV, BIOS MWPNT10N.86A.0083.2011.0524.1600 05/24/2011
[112628.428970] Call Trace:
[112628.429046] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[112628.429049] [<ffffffff816a8634>] panic+0xe8/0x21f
[112628.429049] [<ffffffffc067d8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[112628.429049] [<ffffffffc069137a>] cfs_cpt_choose_ncpus+0x81a/0x820 [libcfs]
[112628.429049] [<ffffffffc06915ba>] cfs_cpt_table_create+0x23a/0x8d0 [libcfs]
[112628.429049] [<ffffffffc06df000>] ? 0xffffffffc06defff
[112628.429049] [<ffffffffc0691d4b>] cfs_cpu_init+0xbb/0xb70 [libcfs]
[112628.429049] [<ffffffffc06df000>] ? 0xffffffffc06defff
[112628.429049] [<ffffffffc06df031>] libcfs_init+0x31/0x1000 [libcfs]
[112628.429049] [<ffffffff810020ea>] do_one_initcall+0xba/0x240
[112628.429049] [<ffffffff81104424>] load_module+0x1f84/0x2a10
[112628.429049] [<ffffffff813523e0>] ? ddebug_proc_write+0xf0/0xf0
[112628.429049] [<ffffffff816c514a>] ? ftrace_graph_caller+0x5a/0x85
[112628.429049] [<ffffffff81100a83>] ? copy_module_from_fd.isra.42+0x53/0x150
[112628.429049] [<ffffffff81105066>] SyS_finit_module+0xa6/0xd0
[112628.429049] [<ffffffff816c1715>] system_call_fastpath+0x1c/0x21
[112628.429049] [<ffffffff816c1661>] ? system_call_after_swapgs+0xae/0x146

A fix will be uploaded shortly. |
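
The intended behavior described above (spreading the online CPUs across the partitions and pushing any remainder into the initial partitions) can be modeled roughly as follows. This is an illustrative Python sketch only, not the libcfs implementation: the function name `split_cpus` is hypothetical, and it assumes CPUs are numbered contiguously from 0 and that the partition count does not exceed the CPU count.

```python
def split_cpus(ncpus, npartitions):
    """Return one list of CPU ids per partition.

    Hypothetical model: every partition gets ncpus // npartitions CPUs,
    and the first (ncpus % npartitions) partitions get one extra CPU.
    """
    if npartitions <= 0 or npartitions > ncpus:
        raise ValueError("need 1 <= npartitions <= ncpus")

    base, extra = divmod(ncpus, npartitions)
    table = []
    next_cpu = 0
    for part in range(npartitions):
        # Initial partitions absorb the remainder, one CPU each.
        count = base + (1 if part < extra else 0)
        table.append(list(range(next_cpu, next_cpu + count)))
        next_cpu += count
    return table


if __name__ == "__main__":
    # 4 online CPUs, cpu_npartitions=3
    for part, cpus in enumerate(split_cpus(4, 3)):
        print(part, ":", " ".join(str(c) for c in cpus))
```

For 4 CPUs and 3 partitions this model gives partition 0 two CPUs and the other two partitions one CPU each, which is consistent with the partition table reported in the comments for the fixed module.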
| Comments |
| Comment by Gerrit Updater [ 29/May/19 ] |
|
Andrew Perepechko (c17827@cray.com) uploaded a new patch: https://review.whamcloud.com/34991 |
| Comment by Andrew Perepechko [ 29/May/19 ] |
|
With the fix:

[root@panda-testbox libcfs]# insmod libcfs.ko cpu_pattern="" cpu_npartitions=3
[root@panda-testbox libcfs]# cat /sys/kernel/debug/lnet/cpu_partition_table
0 : 0 1
1 : 2
2 : 3 |
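
A quick way to sanity-check output like the above is to parse it and confirm that every CPU lands in exactly one partition. The sketch below is a hedged example, not a Lustre tool: `parse_cpt_table` is a hypothetical helper, and it assumes the `partition : cpu ids` one-row-per-line layout shown in this comment.

```python
def parse_cpt_table(text):
    """Parse 'partition : cpu cpu ...' rows into {partition: [cpus]}.

    Assumes the layout shown in this ticket; other Lustre versions
    may print the table differently.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cpt, _, cpus = line.partition(":")
        table[int(cpt)] = [int(c) for c in cpus.split()]
    return table


# Sample copied from the comment above (4 CPUs, 3 partitions).
sample = "0 : 0 1\n1 : 2\n2 : 3\n"
table = parse_cpt_table(sample)

# Flatten and check that no CPU is missing or listed twice.
assigned = sorted(cpu for cpus in table.values() for cpu in cpus)
assert assigned == list(range(len(assigned)))
```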
| Comment by Gerrit Updater [ 04/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34991/ |
| Comment by Peter Jones [ 04/Jun/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 20/Mar/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37994 |
| Comment by Gerrit Updater [ 22/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37994/ |