[LU-300] Oops in cl_page_put() during execve()/page-fault on a binary mapped from a Lustre-filesystem and executed by a parallel application

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.0.0
    • None
    • 3
    • 5034

    Description

      Hi,

      During execution of parallel applications whose binary, or some of whose dynamic libraries, are mapped from Lustre, CEA frequently encounters Lustre client crashes on Tera-100 with the following sample stack:

      =====================================
      crash_kexec()
      oops_end()
      no_context()
      __bad_area_nosemaphore()
      bad_area()
      do_page_fault()
      page_fault()
      [exception RIP: cl_page_put+29]
      vvp_io_fault_fini()
      cl_io_fini()
      ll_fault()
      __do_fault()
      handle_pte_fault()
      handle_mm_fault()
      do_page_fault()
      page_fault()
      =====================================

      Further crash dump analysis clearly indicates that, in the vvp_io_fault_fini() routine, io->u.ci_fault.ft_page is non-NULL and is therefore passed to cl_page_put(). The problem is that this pointer is not a valid address but a plain integer (possibly a timestamp), even though ci_type == CIT_FAULT.

      Note that the customer is already running with the fix from LU-122.
      This problem is quite disruptive: it prevents normal job launch and thus disturbs regular cluster production.

      Sebastien.


        Activity

          pjones Peter Jones added a comment -

          Thanks Sebastien. In that case I will mark this ticket as resolved for now. The patch landed on master almost two weeks ago with no observed side effects, and this issue was appearing daily at CEA before the patch was applied. If a problem with the patch is found at CEA, the ticket can simply be reopened.


          sebastien.buisson Sebastien Buisson (Inactive) added a comment -

          As far as I know, there has been no new occurrence of this bug since last Tuesday. We will have more news from CEA by the end of the week.

          Sebastien.
          pjones Peter Jones added a comment -

          Sebastien

          How has this patch fared running in production at CEA?

          Thanks

          Peter


          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » i686,server,el6,inkernel #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/llite/file.c
          • lustre/liblustre/rw.c
          • lustre/include/lclient.h
          • lustre/llite/llite_mmap.c
          • lustre/lclient/glimpse.c
          • lustre/lclient/lcommon_misc.c
          • lustre/lclient/lcommon_cl.c
          • lustre/llite/rw.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,server,el6,inkernel #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/liblustre/rw.c
          • lustre/include/lclient.h
          • lustre/llite/llite_mmap.c
          • lustre/llite/file.c
          • lustre/lclient/lcommon_misc.c
          • lustre/lclient/glimpse.c
          • lustre/llite/rw.c
          • lustre/lclient/lcommon_cl.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,server,el5,ofa #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/include/lclient.h
          • lustre/llite/llite_mmap.c
          • lustre/lclient/lcommon_cl.c
          • lustre/lclient/glimpse.c
          • lustre/llite/rw.c
          • lustre/llite/file.c
          • lustre/liblustre/rw.c
          • lustre/lclient/lcommon_misc.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » i686,server,el5,ofa #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/llite/llite_mmap.c
          • lustre/liblustre/rw.c
          • lustre/llite/file.c
          • lustre/lclient/glimpse.c
          • lustre/llite/rw.c
          • lustre/lclient/lcommon_cl.c
          • lustre/include/lclient.h
          • lustre/lclient/lcommon_misc.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » i686,server,el5,inkernel #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/lclient/glimpse.c
          • lustre/liblustre/rw.c
          • lustre/llite/rw.c
          • lustre/llite/llite_mmap.c
          • lustre/include/lclient.h
          • lustre/lclient/lcommon_misc.c
          • lustre/lclient/lcommon_cl.c
          • lustre/llite/file.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/llite/file.c
          • lustre/llite/rw.c
          • lustre/llite/llite_mmap.c
          • lustre/lclient/lcommon_misc.c
          • lustre/lclient/glimpse.c
          • lustre/lclient/lcommon_cl.c
          • lustre/include/lclient.h
          • lustre/liblustre/rw.c

          hudson Build Master (Inactive) added a comment -

          Integrated in lustre-master » i686,client,el6,inkernel #122
          LU-300: Oops in cl_page_put of page fault path

          Oleg Drokin: 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
          Files:

          • lustre/lclient/lcommon_misc.c
          • lustre/llite/file.c
          • lustre/lclient/glimpse.c
          • lustre/liblustre/rw.c
          • lustre/llite/rw.c
          • lustre/lclient/lcommon_cl.c
          • lustre/include/lclient.h
          • lustre/llite/llite_mmap.c

          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: Sebastien Buisson (Inactive)
            Votes: 0
            Watchers: 2
