[LU-1055] recovery-mds osd_handler.c:1966:osd_declare_object_destroy()) ASSERTION(!lu_object_is_dying(dt->do_lu.lo_header)) failed Created: 30/Jan/12 Updated: 27/Mar/12 Resolved: 14/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Hyperion - Chaos5 clients and servers |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4736 |
| Description |
|
While running the MDS flavor of recovery-scale on Hyperion, the MDS failed after 3 failovers. The MDT shows as mounted in df output but as unmounted in /proc/mounts. mount -f appears to have cleared the mount. |
| Comments |
| Comment by Oleg Drokin [ 30/Jan/12 ] |
|
Alex, I think this is another one of the problems surfaced after adding object destroy prototyping in osd api, similar in nature to |
| Comment by Alex Zhuravlev [ 31/Jan/12 ] |
|
At the moment the theory is the following: once recovery is completed, the MDS starts to scan PENDING/ to clean up orphans; at the same time some clients unlink/close a file. So orph_index_iterate() races with mdd_close(). Can somebody confirm this race is possible? |
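The suspected interleaving can be replayed deterministically in a few lines. This is only a minimal sketch of the theory above, not Lustre code: `race_obj`, its fields, and all `race_*` functions are hypothetical stand-ins for the real object header, the PENDING/ entry, and the declare-time check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an orphan object; not the real lu_object_header. */
struct race_obj {
    bool in_pending;  /* still has an entry under PENDING/ */
    bool dying;       /* destruction has already started */
};

/* Orphan scan, step 1: the iterator finds the entry in PENDING/. */
static bool race_scan_sees_orphan(const struct race_obj *o)
{
    return o->in_pending;
}

/* Client close of the last reference: drop the entry and destroy the object. */
static void race_close_last_ref(struct race_obj *o)
{
    o->in_pending = false;
    o->dying = true;
}

/* Orphan scan, step 2: the condition the declare-time assertion encoded.
 * Returns true if the assertion would hold. */
static bool race_declare_check_holds(const struct race_obj *o)
{
    return !o->dying;
}

/* Replays scan -> close -> declare. Returns true if the declare-time
 * check would fail, i.e. the assertion would fire. */
static bool race_replay(void)
{
    struct race_obj o = { .in_pending = true, .dying = false };
    bool seen = race_scan_sees_orphan(&o);  /* scanner found the entry */

    assert(seen);                           /* precondition of the replay */
    race_close_last_ref(&o);                /* close wins the race */
    return !race_declare_check_holds(&o);   /* scanner now declares destroy */
}
```

In this model `race_replay()` returns true: the scanner acts on an entry it saw before the close path marked the object as dying, and nothing serializes the two steps.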
| Comment by Alex Zhuravlev [ 31/Jan/12 ] |
|
If the theory is right, then the original design relies on the result of direntry removal from PENDING/, and the following patch should be enough:
diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.
OSD_DECLARE_OP(oh, destroy);
OSD_EXEC_OP(th, destroy); |
| Comment by Peter Jones [ 02/Feb/12 ] |
|
Niu, could you please take care of this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 03/Feb/12 ] |
|
http://review.whamcloud.com/2083 I agree with Alex: the orphan cleanup after recovery could race with the client close. Moving the LASSERT into the real destroy function, which is protected by a lock, should be fine. |
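The idea of checking object state only in the locked execute phase can be sketched as follows. This is not the patch from review 2083. It is a simplified model under assumptions: `fix_obj` and the `fix_*` functions are hypothetical, the PENDING/ entry removal is reduced to a flag, and the two competing callers run one after the other so the result is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an orphan object; not a real Lustre structure. */
struct fix_obj {
    pthread_mutex_t lock;
    bool in_pending;     /* entry still present under PENDING/ */
    bool dying;          /* destruction has started */
    int destroy_count;   /* how many times the object was destroyed */
};

/* Declare phase: would only reserve transaction credits.
 * It runs unlocked, so it makes no claim about object state. */
static void fix_declare_destroy(struct fix_obj *o)
{
    (void)o;
}

/* Execute phase: the state check lives here, under the lock.
 * Only the caller that removes the PENDING/ entry destroys the object.
 * Returns true if this caller performed the destroy. */
static bool fix_execute_destroy(struct fix_obj *o)
{
    bool won = false;

    pthread_mutex_lock(&o->lock);
    if (o->in_pending) {
        o->in_pending = false;  /* models successful direntry removal */
        assert(!o->dying);      /* safe to assert: we hold the lock */
        o->dying = true;
        o->destroy_count++;
        won = true;
    }
    pthread_mutex_unlock(&o->lock);
    return won;
}

/* Two paths (orphan scan and client close) both try to destroy the
 * same object. Returns the number of destroys actually performed. */
static int fix_replay(void)
{
    struct fix_obj o = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .in_pending = true,
        .dying = false,
        .destroy_count = 0,
    };

    fix_declare_destroy(&o);        /* orphan scan declares */
    fix_declare_destroy(&o);        /* client close declares */
    (void)fix_execute_destroy(&o);  /* first caller destroys */
    (void)fix_execute_destroy(&o);  /* second caller finds nothing to do */
    return o.destroy_count;
}
```

Under these assumptions `fix_replay()` returns 1: both paths may declare the destroy, but the second caller loses the entry removal and skips the destroy, so the assertion is never evaluated on an object that is already dying.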
| Comment by Build Master (Inactive) [ 14/Feb/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Peter Jones [ 14/Feb/12 ] |
|
Landed for 2.2 |
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = FAILURE
|
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = ABORTED
|