Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.4.0
- 4
- 6174
Description
In http://review.whamcloud.com/5046, OSD_EXEC_OP() was changed to address problems in MDD operations that fail part-way through, where undoing the earlier operations causes the internal osd-ldiskfs declare/execute accounting to fail. For example, at the cleanup label in mdd_create(), a failed create calls __mdd_index_delete(), mdo_ref_del(), and mdo_destroy() to clean up the newly created object.
One proposal is to have some kind of OSD API call/flag/method to mark the transaction handle as being used for rollback, and to disable the ot_declare_op LASSERT() checking in OSD_EXEC_OP() for this case.
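A rough sketch of this proposal, as a simplified standalone model rather than the actual osd-ldiskfs code. The names struct sketch_thandle and sketch_exec_op_ok() are hypothetical, the th_rollback flag is assumed to be set by the caller before it starts undoing a failed operation, and the real OSD_EXEC_OP() would LASSERT() where this sketch returns false:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the osd-ldiskfs transaction handle. */
struct sketch_thandle {
	bool th_rollback;        /* assumed: set by the caller during rollback */
	int  ot_declare_ref_del; /* number of ref_del operations declared */
};

/*
 * Model of the execute-side check for a ref_del operation.
 * Returns true if the operation may proceed, false where the
 * real code would trip the ot_declare_op LASSERT().
 */
static bool sketch_exec_op_ok(struct sketch_thandle *oh)
{
	if (oh->th_rollback)
		return true;     /* rollback: skip declare/execute accounting */
	if (oh->ot_declare_ref_del <= 0)
		return false;    /* executed without a matching declaration */
	oh->ot_declare_ref_del--; /* consume one declaration */
	return true;
}
```

With one declared ref_del, the first execute passes and the second would fail, unless the handle has been marked for rollback, in which case the check is bypassed.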
Issue Links
- is duplicated by: LU-2668 mdd_create() failed at error path with osd_object_ref_del() ASSERTION( (oh)->ot_declare_ref_del > 0 ) failed (Resolved)
Do you think the current mechanism is worse than none at all? I think the current accounting can still find some code defects, even if it does not find as many as with th_rollback being set by the caller. In that case, I'd rather leave it in place during development instead of removing it entirely.
Maybe make this debugging conditional on (LUSTRE_PATCH >= 50 && LUSTRE_PATCH < 90)? That allows us to catch problems during development, but relies only on ldiskfs/jbd accounting during production.
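Something like the following could express that condition at compile time. This is only a sketch: SKETCH_DECLARE_CHECK and sketch_check_declared() are hypothetical names, and the fallback definition of LUSTRE_PATCH is there only so the fragment compiles on its own (in a real build the value would come from the Lustre version header).

```c
#include <assert.h>

/* Assumption: normally supplied by the Lustre build; defaulted for the sketch. */
#ifndef LUSTRE_PATCH
#define LUSTRE_PATCH 50
#endif

/* Enable declare/execute accounting only for development patch levels. */
#if LUSTRE_PATCH >= 50 && LUSTRE_PATCH < 90
#define SKETCH_DECLARE_CHECK 1
#else
#define SKETCH_DECLARE_CHECK 0
#endif

/*
 * Returns nonzero if the operation is allowed to execute.
 * In development builds an undeclared operation is rejected;
 * in production builds the check is compiled out and only the
 * ldiskfs/jbd accounting remains.
 */
static int sketch_check_declared(int declared)
{
#if SKETCH_DECLARE_CHECK
	return declared > 0;
#else
	(void)declared;
	return 1;
#endif
}
```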