[LU-8819] lprocfs_alloc_stats() Segmentation fault (core dumped) Created: 10/Nov/16  Updated: 14/Mar/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: ShijunDeng Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: lprocfs
Environment:

lustre 2.8.0 centos7 kernel-3.10.0_3.10.0_327.3.1.el7_lustre.x86_64-1.x86_64


Attachments: PNG File QQ截图20161110201410.png    
Epic/Theme: Lustre-2.8.0
Severity: 4
Epic: interoperability
Project: Test Infrastructure
Rank (Obsolete): 9223372036854775807

 Description   

In lustre/obdclass/lprocfs_status.c ,and in the function "lprocfs_alloc_stats",
when alloc percpu pointers for all possible cpu slots,the corresponding code:
LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));
In fact ,this code didn't alloc space for its member struct lprocfs_counter lp_cntr[0],
but in other operations like funcion:
lprocfs_counter_init which call lprocfs_stats_counter_get
the code in fuction lprocfs_stats_counter_get such as
stats->ls_percpu[cpuid]->lp_cntr[index]
may access memory that not belong to var stats itself,
it's not safe and may lead to Segmentation fault.especially when there is not enough memory.



 Comments   
Comment by Peter Jones [ 10/Nov/16 ]

Lai

Could you please advise on this issue?

Thanks

Peter

Comment by John Hammond [ 10/Nov/16 ]

lprocfs_stats_counter_get() is only used after successful calls to lprocfs_stats_lock() or checks that stats->ls_percpu[i] is not NULL.

Comment by ShijunDeng [ 11/Nov/16 ]

struct lprocfs_counter {
__s64 lc_count;
__s64 lc_min;
__s64 lc_max;
__s64 lc_sumsquare;
/*

  • Every counter has lc_array_sum[0], while lc_array_sum[1] is only
  • for irq context counter, i.e. stats with
  • LPROCFS_STATS_FLAG_IRQ_SAFE flag, its counter need
  • lc_array_sum[1]
    */
    __s64 lc_array_sum[1];
    };

struct lprocfs_percpu {
struct lprocfs_counter lp_cntr[0];
};

struct lprocfs_stats {
/* # of counters */
unsigned short ls_num;
/* 1 + the biggest cpu # whose ls_percpu slot has been allocated */
unsigned short ls_biggest_alloc_num;
enum lprocfs_stats_flags ls_flags;
/* Lock used when there are no percpu stats areas; For percpu stats,

  • it is used to protect ls_biggest_alloc_num change */
    spinlock_t ls_lock;

/* has ls_num of counter headers */
struct lprocfs_counter_header *ls_cnt_header;
struct lprocfs_percpu *ls_percpu[0];
};

the above code is corresponding definition,In lustre/obdclass/lprocfs_status.c ,and in the function "lprocfs_alloc_stats",
when alloc percpu pointers for all possible cpu slots,the corresponding code:
LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));

the access to stats->ls_percpu[i] is safe,but it's not safe to acccess lp_cntr[index],to simplify the analysis for the problem,I will justify my position by giving one example as follow:

#include<stdio.h>
struct en{
int a;
int b;
int c[0];
};

int main(){

struct en e;
e.c[0]=1000;
e.c[1]=2000;
e.c[2]=3000;
e.c[1000]=3000;
printf("e.c[0]=%d\n",e.c[0]);
printf("e.c[1]=%d\n",e.c[1]);
printf("e.c[2]=%d\n",e.c[2]);
printf("e.c[1000]=%d\n",e.c[1000]);
return 0;
}

the small demo can run correctly,sometimes.but it's not safe.For the same reson,
The code in lustre2.8.0 such as cntr = &stats->ls_percpu[cpuid]->lp_cntr[index] isn't safe,too.
Possibly because the value of index is small(it's cannot up to 100,not to 1000),the error "Segmentation fault (core dumped)" occurs rarely.

Comment by Lai Siyao [ 14/Nov/16 ]

Did you really meet segfault? If so, can you post logs or backtraces?

I verified the code

LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));

does allocate percpu data, and can be accessed successfully.

While your example code is different from above code, if you change to below:

#include<stdio.h>
struct en
{ int a; int b; int c[0]; }
;
int main()
{
 struct en *e = malloc(offset(struct en, c[1001]);

 e->c[0]=1000;
 e->c[1]=2000;
 e->c[2]=3000;
 e->c[1000]=3000;
 printf("e.c[0]=%d\n",e->c[0]);
 printf("e.c[1]=%d\n",e->c[1]);
 printf("e.c[2]=%d\n",e->c[2]);
 printf("e.c[1000]=%d\n",e->c[1000]);
 return 0;
}

It should always work, could you take a try?

Comment by Gerrit Updater [ 04/Dec/16 ]

please ignore.

Generated at Sat Feb 10 02:20:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.