[LU-300] Oops in cl_page_put() during execve()/page-fault on a binary mapped from a Lustre-filesystem and executed by a parallel application Created: 10/May/11 Updated: 31/May/11 Resolved: 31/May/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sebastien Buisson (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 5034 |
| Description |
|
Hi, When running parallel applications whose binary or some of whose dynamic libraries are mapped from Lustre, CEA frequently encounters Lustre client crashes on Tera-100, with the following sample stack: ===================================== Further crash dump analysis clearly indicates that in the vvp_io_fault_fini() routine, io->u.ci_fault.ft_page is found non-NULL and is therefore passed to cl_page_put(). The problem is that this pointer is not a valid address but a plain integer (possibly a timestamp), even though ci_type == CIT_FAULT. Note also that the customer is running with the fix from Sebastien. |
| Comments |
| Comment by Peter Jones [ 10/May/11 ] |
|
Oleg Could you please look into this issue? Thanks Peter |
| Comment by Peter Jones [ 10/May/11 ] |
|
Jinshan will take a look at this one |
| Comment by Jinshan Xiong (Inactive) [ 10/May/11 ] |
|
This problem is caused by an uninitialized cl_io in the page-fault path. Please try this patch: http://review.whamcloud.com/530 |
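The failure mode described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the types below are simplified, hypothetical stand-ins for the Lustre structures (only the names ci_type, CIT_FAULT, u.ci_fault.ft_page and cl_io come from this ticket; CIT_READ, ci_rd, rd_stamp and all helper functions are made up), and the patch at the URL above is not reproduced here. The sketch assumes the stale value comes from reusing a cl_io without clearing it, and that clearing it before use avoids the bogus pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre definitions. */
enum cl_io_type { CIT_READ, CIT_FAULT };

struct cl_page { int cp_ref; };

struct cl_io {
    enum cl_io_type ci_type;
    union {
        struct { struct cl_page *ft_page; } ci_fault; /* page-fault state */
        struct { unsigned long rd_stamp; } ci_rd;     /* made-up read state */
    } u;
};

/* Mirrors the check described in the report: returns 1 when ft_page is
 * non-NULL for a fault io and would therefore be handed to a put routine. */
static int fault_fini_would_put(const struct cl_io *io)
{
    assert(io != NULL);
    return io->ci_type == CIT_FAULT && io->u.ci_fault.ft_page != NULL;
}

/* Prepares a fault io. With reset == 0 the union keeps whatever bytes the
 * previous user left behind; with reset != 0 the structure is zeroed first. */
static void fault_io_init(struct cl_io *io, int reset)
{
    if (reset)
        memset(io, 0, sizeof(*io));
    io->ci_type = CIT_FAULT;
}

/* Reuses one cl_io for a (made-up) read and then for a page fault.
 * Returns 1 if the fault cleanup would try to put a page. */
static int simulate_fault_after_read(int reset)
{
    struct cl_io io;

    memset(&io, 0, sizeof(io));
    io.ci_type = CIT_READ;
    io.u.ci_rd.rd_stamp = 1305014400UL; /* arbitrary integer, e.g. a timestamp */

    fault_io_init(&io, reset);
    return fault_fini_would_put(&io);
}
```

In this sketch, simulate_fault_after_read(0) reports that the cleanup would put a "page" that is really the leftover integer, which matches the symptom seen in the crash dump, while simulate_fault_after_read(1) reports that nothing would be put.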
| Comment by Sebastien Buisson (Inactive) [ 11/May/11 ] |
|
Hi Jinshan, Thanks for this quick answer. As the customer cluster is in production, I would need at the very least one positive inspection of your patch before I can deliver an emergency fix to CEA. TIA, |
| Comment by Build Master (Inactive) [ 18/May/11 ] |
|
Integrated in Oleg Drokin : 15ac26cb2fc0b9b4c6c4507d8cdab683b9b40b7e
|
| Comment by Peter Jones [ 30/May/11 ] |
|
Sebastien How has this patch fared running in production at CEA? Thanks Peter |
| Comment by Sebastien Buisson (Inactive) [ 30/May/11 ] |
|
As far as I know, there has been no new occurrence of this bug since last Tuesday. We will have more news from CEA by the end of the week. Sebastien. |
| Comment by Peter Jones [ 31/May/11 ] |
|
Thanks Sebastien. In that case I will mark this ticket as resolved for now. The patch has been on master for almost two weeks now with no observed side effects, and this issue was appearing daily at CEA before the patch was applied. If a problem is found with the patch at CEA, the ticket can simply be reopened. |