[LU-442] Client LBUG - (osc_request.c:3087:osc_set_lock_data_with_check()) ASSERTION(lock->l_ast_data == NULL || lock->l_ast_data == data) failed Created: 21/Jun/11 Updated: 19/Nov/12 Resolved: 28/Jun/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexandre Louvet | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4973 |
| Description |
|
This LBUG was triggered on a node running a parallel application in which at least two tasks were reading the same Lustre file.
The assertion failed because (lock->l_ast_data != data) in osc_set_lock_data_with_check(), even though the same condition had just been checked and found to hold in osc_enqueue_base(). PID: 24833 TASK: ffff88086c05f340 CPU: 8 COMMAND: "gonel_Bordelman"
The second task has nearly the same stack: PID: 24834 TASK: ffff88086c05eb20 CPU: 24 COMMAND: "gonel_Bordelman"
This indicates that the two tasks were executing the same code path in parallel. Both may have found the same ldlm_lock struct with (l_ast_data == NULL), then called osc_set_lock_data_with_check() to set l_ast_data. That function does take the l_lock/osc_ast_guard spin-locks, but too late: the same l_ast_data check is asserted there, so the second task to arrive hits the LBUG.
It therefore seems that the l_lock/osc_ast_guard spin-lock protection has to be taken at the osc_enqueue_base() level, around the "if (matched->l_ast_data == NULL || matched->l_ast_data == einfo->ei_cbdata)" statement, instead of in osc_set_lock_data_with_check(). |
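The race described above is a check-then-act problem: the test on l_ast_data and the later assignment are not covered by the same critical section. The following is a minimal userspace sketch of the kind of change suggested, not the patch that landed. struct demo_lock, its guard field and demo_set_lock_data() are hypothetical names, and a pthread mutex stands in for the kernel l_lock/osc_ast_guard spin-locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ldlm_lock: only the field discussed here. */
struct demo_lock {
    pthread_mutex_t guard;   /* plays the role of l_lock/osc_ast_guard */
    void *l_ast_data;        /* owner cookie, NULL while unclaimed */
};

/*
 * Test and set l_ast_data inside one critical section.
 * Returns 1 if the lock now belongs to `data` (it was unclaimed or already
 * held the same cookie), 0 if another owner claimed it first. The caller
 * handles the 0 case instead of asserting.
 */
static int demo_set_lock_data(struct demo_lock *lock, void *data)
{
    int set = 0;

    assert(lock != NULL && data != NULL);

    pthread_mutex_lock(&lock->guard);
    if (lock->l_ast_data == NULL)
        lock->l_ast_data = data;      /* first claimant wins */
    if (lock->l_ast_data == data)
        set = 1;
    pthread_mutex_unlock(&lock->guard);

    return set;
}
```

With this shape, two tasks that pass the same cookie both get 1, and a task that loses the race to a different cookie gets 0 and can fall back to a fresh enqueue rather than tripping an assertion.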
| Comments |
| Comment by Oleg Drokin [ 21/Jun/11 ] |
|
I totally agree with you that the locking is inadequate; for it to make sense, it should be extended to cover the l_ast_data check. |
| Comment by Peter Jones [ 21/Jun/11 ] |
|
Niu, could you please work on this one as your top priority? Thanks, Peter |
| Comment by Sebastien Buisson (Inactive) [ 21/Jun/11 ] |
|
Hi, I will suggest that CEA try a test package in which we move the spin-lock protection, following Oleg's comment and Bruno's initial suggestion. Sebastien. |
| Comment by Niu Yawei (Inactive) [ 22/Jun/11 ] |
|
Hi Sebastien, you can try the patch at http://review.whamcloud.com/993. Thanks. |
| Comment by Sebastien Buisson (Inactive) [ 22/Jun/11 ] |
|
Hi, unfortunately this problem does not currently show up often at CEA, and no reproducer has been identified for it, so it will not be feasible to try an experimental patch. Cheers, |
| Comment by Niu Yawei (Inactive) [ 23/Jun/11 ] |
|
Hi Sebastien, OK, let's wait for the patch inspection to finish. |
| Comment by Build Master (Inactive) [ 28/Jun/11 ] |
|
Integrated in Oleg Drokin : 50dd2cc62cf86f172f515480e7a6b1f0cdfc1768
|
| Comment by Peter Jones [ 28/Jun/11 ] |
|
Landed for 2.1 |