Lustre / LU-8819

lprocfs_alloc_stats() Segmentation fault (core dumped)

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0
    • Environment: Lustre 2.8.0, CentOS 7, kernel-3.10.0_3.10.0_327.3.1.el7_lustre.x86_64-1.x86_64

    Description

      In lustre/obdclass/lprocfs_status.c, in the function lprocfs_alloc_stats(),
      the percpu pointers for all possible CPU slots are allocated with:

      LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));

      This call does not allocate any space for the flexible array member
      struct lprocfs_counter lp_cntr[0]. However, other operations, such as
      lprocfs_counter_init(), which calls lprocfs_stats_counter_get(), evaluate
      expressions like

      stats->ls_percpu[cpuid]->lp_cntr[index]

      and may therefore access memory that does not belong to stats itself.
      This is unsafe and may lead to a segmentation fault, especially when
      memory is tight.
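
      To make the sizing concrete, here is a small stand-in sketch (simplified
      names and types, not the Lustre definitions) showing what the offsetof()
      expression above actually covers: the header of the stats object plus
      num_entry pointers, and no counter storage at all.

      #include <stdio.h>
      #include <stddef.h>

      struct percpu_stub;                        /* opaque stand-in */

      struct stats_stub {
              unsigned short num;                /* stand-in header field */
              struct percpu_stub *ls_percpu[];   /* one pointer per CPU slot */
      };

      int main(void)
      {
              unsigned int num_entry = 4;

              /* Header (plus padding) and num_entry pointers; no space for
               * any lp_cntr[]-style counters is included. */
              printf("allocation size = %zu bytes\n",
                     offsetof(struct stats_stub, ls_percpu[num_entry]));
              return 0;
      }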


        Activity

          gerrit Gerrit Updater added a comment - - edited

          please ignore.

          laisiyao Lai Siyao added a comment - - edited

          Did you really hit a segfault? If so, can you post logs or backtraces?

          I verified that the code

          LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));

          does allocate the percpu data, and it can be accessed successfully.

          However, your example code is different from the code above; if you change it to the following:

          #include <stdio.h>
          #include <stdlib.h>
          #include <stddef.h>

          struct en {
                  int a;
                  int b;
                  int c[0];
          };

          int main(void)
          {
                  /* Allocate room for 1001 trailing ints, not just the
                   * struct header. */
                  struct en *e = malloc(offsetof(struct en, c[1001]));

                  e->c[0] = 1000;
                  e->c[1] = 2000;
                  e->c[2] = 3000;
                  e->c[1000] = 3000;
                  printf("e.c[0]=%d\n", e->c[0]);
                  printf("e.c[1]=%d\n", e->c[1]);
                  printf("e.c[2]=%d\n", e->c[2]);
                  printf("e.c[1000]=%d\n", e->c[1000]);
                  return 0;
          }
          

          It should always work; could you give it a try?


          邓仕军 ShijunDeng (Inactive) added a comment -

          struct lprocfs_counter {
                  __s64   lc_count;
                  __s64   lc_min;
                  __s64   lc_max;
                  __s64   lc_sumsquare;
                  /*
                   * Every counter has lc_array_sum[0], while lc_array_sum[1]
                   * is only for irq context counter, i.e. stats with
                   * LPROCFS_STATS_FLAG_IRQ_SAFE flag, its counter need
                   * lc_array_sum[1]
                   */
                  __s64   lc_array_sum[1];
          };

          struct lprocfs_percpu {
                  struct lprocfs_counter lp_cntr[0];
          };

          struct lprocfs_stats {
                  /* # of counters */
                  unsigned short                  ls_num;
                  /* 1 + the biggest cpu # whose ls_percpu slot has been
                   * allocated */
                  unsigned short                  ls_biggest_alloc_num;
                  enum lprocfs_stats_flags        ls_flags;
                  /* Lock used when there are no percpu stats areas; for
                   * percpu stats, it is used to protect
                   * ls_biggest_alloc_num change */
                  spinlock_t                      ls_lock;
                  /* has ls_num of counter headers */
                  struct lprocfs_counter_header   *ls_cnt_header;
                  struct lprocfs_percpu           *ls_percpu[0];
          };

          The above code shows the relevant definitions. In lustre/obdclass/lprocfs_status.c, in the function lprocfs_alloc_stats(), the percpu pointers for all possible CPU slots are allocated with:

          LIBCFS_ALLOC(stats, offsetof(typeof(*stats), ls_percpu[num_entry]));

          Accessing stats->ls_percpu[i] is safe, but it is not safe to access lp_cntr[index]. To simplify the analysis of the problem, I will justify my position with the following example:

          #include <stdio.h>

          struct en {
                  int a;
                  int b;
                  int c[0];       /* zero-length array: no storage of its own */
          };

          int main(void)
          {
                  /* 'e' is a stack variable, so no storage at all is reserved
                   * for e.c[]; every access below writes past the end of the
                   * object and is undefined behaviour. */
                  struct en e;

                  e.c[0] = 1000;
                  e.c[1] = 2000;
                  e.c[2] = 3000;
                  e.c[1000] = 3000;
                  printf("e.c[0]=%d\n", e.c[0]);
                  printf("e.c[1]=%d\n", e.c[1]);
                  printf("e.c[2]=%d\n", e.c[2]);
                  printf("e.c[1000]=%d\n", e.c[1000]);
                  return 0;
          }

          This small demo sometimes runs correctly, but it is not safe. For the same reason, code in Lustre 2.8.0 such as cntr = &stats->ls_percpu[cpuid]->lp_cntr[index] is not safe either. Possibly because the value of index stays small (it cannot reach 100, let alone 1000), the error "Segmentation fault (core dumped)" occurs only rarely.
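
          For context, below is a minimal stand-in sketch (hypothetical names,
          not the actual Lustre code) of the two-level layout this access
          pattern depends on: the outer offsetof()-sized allocation holds only
          the percpu pointers, and each percpu area is a separate allocation
          sized so that its trailing counter array has real backing storage.

          #include <stdio.h>
          #include <stdlib.h>
          #include <stddef.h>

          struct counter {
                  long long count;
          };

          struct percpu {
                  unsigned int inited;            /* stand-in header field */
                  struct counter cntr[];          /* counters live here */
          };

          struct stats {
                  unsigned short num;             /* # of counters */
                  struct percpu *percpu[];        /* one pointer per CPU slot */
          };

          int main(void)
          {
                  unsigned int ncpus = 2, cpu = 0;

                  /* First level: header plus ncpus pointers, nothing more. */
                  struct stats *s = calloc(1, offsetof(struct stats,
                                                       percpu[ncpus]));
                  if (s == NULL)
                          return 1;
                  s->num = 8;

                  /* Second level: the percpu area is its own allocation,
                   * sized so that cntr[0..num-1] is real storage. */
                  s->percpu[cpu] = calloc(1, offsetof(struct percpu,
                                                      cntr[s->num]));
                  if (s->percpu[cpu] == NULL)
                          return 1;

                  /* Safe: backed by the second allocation. */
                  s->percpu[cpu]->cntr[7].count = 42;
                  printf("count = %lld\n", s->percpu[cpu]->cntr[7].count);

                  free(s->percpu[cpu]);
                  free(s);
                  return 0;
          }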

          jhammond John Hammond added a comment -

          lprocfs_stats_counter_get() is only used after successful calls to lprocfs_stats_lock() or checks that stats->ls_percpu[i] is not NULL.
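
          A minimal sketch (hypothetical names, not the Lustre source) of the
          guard described above: the percpu slot is checked for NULL before
          its counter storage is dereferenced.

          #include <stdio.h>
          #include <stdlib.h>
          #include <stddef.h>

          struct counter {
                  long long count;
          };

          struct percpu {
                  unsigned int inited;
                  struct counter cntr[];
          };

          struct stats {
                  unsigned short num;
                  struct percpu *percpu[];
          };

          /* Never touch cntr[] unless the slot was actually allocated. */
          static struct counter *stats_counter_get(struct stats *s,
                                                   unsigned int cpu,
                                                   unsigned int idx)
          {
                  if (s->percpu[cpu] == NULL)
                          return NULL;
                  return &s->percpu[cpu]->cntr[idx];
          }

          int main(void)
          {
                  struct stats *s = calloc(1, offsetof(struct stats, percpu[2]));
                  if (s == NULL)
                          return 1;
                  s->num = 4;

                  /* Slot 0 allocated; slot 1 deliberately left NULL. */
                  s->percpu[0] = calloc(1, offsetof(struct percpu, cntr[s->num]));

                  printf("cpu0: %p\n", (void *)stats_counter_get(s, 0, 2));
                  printf("cpu1: %p\n", (void *)stats_counter_get(s, 1, 2));
                  return 0;
          }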

          pjones Peter Jones added a comment -

          Lai

          Could you please advise on this issue?

          Thanks

          Peter


          People

            Assignee: laisiyao Lai Siyao
            Reporter: 邓仕军 ShijunDeng (Inactive)
            Votes: 0
            Watchers: 6
