[LU-300] Oops in cl_page_put() during execve()/page-fault on a binary mapped from a Lustre-filesystem and executed by a parallel application - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: Lustre 2.1.0
Affects Version/s: Lustre 2.0.0
Labels:
None

Severity:
3
Rank (Obsolete):
5034

Description

Hi,

When running parallel applications whose binary, or some of whose dynamic libraries, are mapped from Lustre, CEA frequently encounters Lustre client crashes on Tera-100, with the following sample stack:

=====================================
crash_kexec()
oops_end()
no_context()
__bad_area_nosemaphore()
bad_area()
do_page_fault()
page_fault()
[exception RIP: cl_page_put+29]
vvp_io_fault_fini()
cl_io_fini()
ll_fault()
__do_fault()
handle_pte_fault()
handle_mm_fault()
do_page_fault()
page_fault()
=====================================

Further crash dump analysis clearly indicates that, in the vvp_io_fault_fini() routine, io->u.ci_fault.ft_page is found non-NULL and is therefore passed to cl_page_put(). The problem is that this pointer is not a valid address; it looks like a plain integer instead (possibly a timestamp), even though ci_type == CIT_FAULT.
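To illustrate the symptom described above, here is a minimal user-space sketch of how a pointer field inside a union can end up holding an integer when the same storage was last written through another member. This is only one plausible mechanism, assumed for illustration; the ticket does not state the root cause. All `demo_*` names and struct layouts are hypothetical stand-ins, not the real Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre structures. */
enum demo_io_type { DEMO_CIT_READ, DEMO_CIT_FAULT };

struct demo_page { int refcount; };

struct demo_rw_io    { uint64_t pos;   uintptr_t stamp; };
struct demo_fault_io { uint64_t index; struct demo_page *ft_page; };

struct demo_io {
    enum demo_io_type ci_type;
    union {
        struct demo_rw_io    ci_rw;     /* shares storage with ci_fault */
        struct demo_fault_io ci_fault;
    } u;
};

/* Prepare the io for a fault, clearing the whole union first so that
 * no bytes left over from a previous use can appear as ft_page. */
static void demo_io_init_fault(struct demo_io *io, uint64_t index)
{
    memset(&io->u, 0, sizeof(io->u));
    io->ci_type = DEMO_CIT_FAULT;
    io->u.ci_fault.index = index;
}

/* Drop the page reference if one is held.
 * Returns 1 if a reference was dropped, 0 otherwise. */
static int demo_io_fault_fini(struct demo_io *io)
{
    assert(io->ci_type == DEMO_CIT_FAULT);
    if (io->u.ci_fault.ft_page == NULL)
        return 0;
    io->u.ci_fault.ft_page->refcount--;
    io->u.ci_fault.ft_page = NULL;
    return 1;
}

/* Reproduce the symptom without dereferencing anything: reuse the io
 * for a fault without clearing the union. Returns nonzero when the
 * stale integer is visible as a non-NULL ft_page. */
static int demo_stale_ft_page_is_nonnull(void)
{
    struct demo_io io;

    memset(&io, 0, sizeof(io));
    io.ci_type = DEMO_CIT_READ;
    io.u.ci_rw.pos = 4096;
    io.u.ci_rw.stamp = (uintptr_t)1304991840u; /* integer, not an address */

    io.ci_type = DEMO_CIT_FAULT;               /* type changed, union not cleared */
    return io.u.ci_fault.ft_page != NULL;
}
```

In this sketch, calling demo_io_fault_fini() on the stale io would dereference the integer as a pointer, which matches the kind of fault seen in the stack above. Going through demo_io_init_fault() first avoids that, because ft_page is NULL until a real page is stored.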

Note that the customer is already running with the fix from LU-122.
This problem is quite disruptive: it disturbs regular cluster production by preventing normal job launch.

Sebastien.

Activity

Peter Jones added a comment - 31/May/11 5:24 AM

Thanks Sebastien. In that case I will mark this ticket as resolved for now. The patch has been on master for almost two weeks now with no observed side effects, and this issue was appearing daily at CEA before the patch was applied. If a problem is found with the patch at CEA, the ticket can simply be reopened.

Sebastien Buisson (Inactive) added a comment - 30/May/11 11:50 PM

As far as I know, there has been no new occurrence of this bug since last Tuesday. We will have more news from CEA by the end of the week.

Sebastien.

Peter Jones added a comment - 30/May/11 3:53 PM

Sebastien

How has this patch fared running in production at CEA?

Thanks

Peter

Build Master (Inactive) added a comment - 18/May/11 5:13 PM

Integrated in lustre-master » i686,server,el6,inkernel #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/llite/file.c
lustre/liblustre/rw.c
lustre/include/lclient.h
lustre/llite/llite_mmap.c
lustre/lclient/glimpse.c
lustre/lclient/lcommon_misc.c
lustre/lclient/lcommon_cl.c
lustre/llite/rw.c

Build Master (Inactive) added a comment - 18/May/11 4:58 PM

Integrated in lustre-master » x86_64,server,el6,inkernel #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/liblustre/rw.c
lustre/include/lclient.h
lustre/llite/llite_mmap.c
lustre/llite/file.c
lustre/lclient/lcommon_misc.c
lustre/lclient/glimpse.c
lustre/llite/rw.c
lustre/lclient/lcommon_cl.c

Build Master (Inactive) added a comment - 18/May/11 4:58 PM

Integrated in lustre-master » x86_64,server,el5,ofa #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/include/lclient.h
lustre/llite/llite_mmap.c
lustre/lclient/lcommon_cl.c
lustre/lclient/glimpse.c
lustre/llite/rw.c
lustre/llite/file.c
lustre/liblustre/rw.c
lustre/lclient/lcommon_misc.c

Build Master (Inactive) added a comment - 18/May/11 4:53 PM

Integrated in lustre-master » i686,server,el5,ofa #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/llite/llite_mmap.c
lustre/liblustre/rw.c
lustre/llite/file.c
lustre/lclient/glimpse.c
lustre/llite/rw.c
lustre/lclient/lcommon_cl.c
lustre/include/lclient.h
lustre/lclient/lcommon_misc.c

Build Master (Inactive) added a comment - 18/May/11 4:52 PM

Integrated in lustre-master » i686,server,el5,inkernel #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/lclient/glimpse.c
lustre/liblustre/rw.c
lustre/llite/rw.c
lustre/llite/llite_mmap.c
lustre/include/lclient.h
lustre/lclient/lcommon_misc.c
lustre/lclient/lcommon_cl.c
lustre/llite/file.c

Build Master (Inactive) added a comment - 18/May/11 4:51 PM

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/llite/file.c
lustre/llite/rw.c
lustre/llite/llite_mmap.c
lustre/lclient/lcommon_misc.c
lustre/lclient/glimpse.c
lustre/lclient/lcommon_cl.c
lustre/include/lclient.h
lustre/liblustre/rw.c

Build Master (Inactive) added a comment - 18/May/11 4:49 PM

Integrated in lustre-master » i686,client,el6,inkernel #122
LU-300: Oops in cl_page_put of page fault path

Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
Files :

lustre/lclient/lcommon_misc.c
lustre/llite/file.c
lustre/lclient/glimpse.c
lustre/liblustre/rw.c
lustre/llite/rw.c
lustre/lclient/lcommon_cl.c
lustre/include/lclient.h
lustre/llite/llite_mmap.c

People

Assignee: Jinshan Xiong (Inactive)

Reporter: Sebastien Buisson (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 10/May/11 1:44 AM

Updated: 31/May/11 5:24 AM

Resolved: 31/May/11 5:24 AM