Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2640

deactivate OSD_EXEC_OP() operation accounting if operation is being undone

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 4
    • 6174

    Description

      In http://review.whamcloud.com/5046 there was a change to OSD_EXEC_OP() to address problems in MDD operations that fail part-way through, where the undo of earlier operations causes internal osd-ldiskfs declare/execute accounting to fail. For example, in mdd_create()::cleanup, the failed create calls __mdd_index_delete(), mdo_ref_del(), mdo_destroy() in order to clean up the newly created object.

      One proposal is to have some kind of OSD API call/flag/method to mark the transaction handle as being used for rollback, and to disable the ot_declare_op LASSERT() checking in OSD_EXEC_OP() for this case.

      Attachments

        Issue Links

          Activity

            [LU-2640] deactivate OSD_EXEC_OP() operation accounting if operation is being undone

            Do you think the current mechanism is worse than none at all? I think the current accounting can still find some code defects, even if it does not find as many as with th_rollback being set by the caller. In that case, I'd rather leave it in place during development instead of removing it entirely.

            Maybe make this debugging conditional on (LUSTRE_PATCH >= 50 && LUSTRE_PATCH < 90)? That allows us to catch problems during development, but relies only on ldiskfs/jbd accounting during production.

            adilger Andreas Dilger added a comment - Do you think the current mechanism is worse than none at all? I think the current accounting can still find some code defects, even if it does not find as many as with th_rollback being set by the caller. In that case, I'd rather leave it in place during development instead of removing it entirely. Maybe make this debugging conditional on (LUSTRE_PATCH >= 50 && LUSTRE_PATCH < 90)? That allows us to catch problems during development, but relies only on ldiskfs/jbd accounting during production.

            given we do not hit this (except undo cases) so regular paths should be fine from this point of view, I'd suggest to disable the mechanism.
            we still have ldiskfs's internal checks.

            bzzz Alex Zhuravlev added a comment - given we do not hit this (except undo cases) so regular paths should be fine from this point of view, I'd suggest to disable the mechanism. we still have ldiskfs's internal checks.

            it doesn't need to be that detailed, imo. just an array of object and (depending on object type) optype with some argument (hash in case of index, range in case of regular file, xattr) would be enough.

            I am against th_rollback because we'll have to modify number of functions in MDD to implement this. and then this will stay there for long as no one seem to be interested enough.

            at the same time 2.x is doing pretty well w/o this mechanism at all (obviously). I'm tempted to say i regret for introducing this in the first place

            bzzz Alex Zhuravlev added a comment - it doesn't need to be that detailed, imo. just an array of object and (depending on object type) optype with some argument (hash in case of index, range in case of regular file, xattr) would be enough. I am against th_rollback because we'll have to modify number of functions in MDD to implement this. and then this will stay there for long as no one seem to be interested enough. at the same time 2.x is doing pretty well w/o this mechanism at all (obviously). I'm tempted to say i regret for introducing this in the first place

            Alex, while I agree that th_rollback is not strictly needed, I think you two are arguing in circles. Clearly we want to improve the ldiskfs transaction credit debugging in some way. The current mechanism is rather weak in detecting when rollback has started, while setting a flag in the handle is positively indicating that rollback has started.

            Unless you have some better mechanism that we could use in short order, I'm inclined to go with this method for now. The only other method I can think of is adding detailed accounting for every item being modified (e.g. tracking inode numbers, group descriptors, bitmaps, etc) so that the same item can be modified multiple times. However the complexity of that is high and I think it could be deferred until 2.5 if we want to implement it at that point.

            For now, I think th_rollback is an acceptable solution for the time we have left in 2.4, and it can be clearly commented that it is a workaround and reference this bug for a more complete solution in the future.

            adilger Andreas Dilger added a comment - Alex, while I agree that th_rollback is not strictly needed, I think you two are arguing in circles. Clearly we want to improve the ldiskfs transaction credit debugging in some way. The current mechanism is rather weak in detecting when rollback has started, while setting a flag in the handle is positively indicating that rollback has started. Unless you have some better mechanism that we could use in short order, I'm inclined to go with this method for now. The only other method I can think of is adding detailed accounting for every item being modified (e.g. tracking inode numbers, group descriptors, bitmaps, etc) so that the same item can be modified multiple times. However the complexity of that is high and I think it could be deferred until 2.5 if we want to implement it at that point. For now, I think th_rollback is an acceptable solution for the time we have left in 2.4, and it can be clearly commented that it is a workaround and reference this bug for a more complete solution in the future.

            this debugging is in turn needed only because low-level debugging in ldiskfs is rather poor. this is the only reason to have it in the first place.
            with better debugging (which we want to calculate credits better) we likely won't need this flag.

            bzzz Alex Zhuravlev added a comment - this debugging is in turn needed only because low-level debugging in ldiskfs is rather poor. this is the only reason to have it in the first place. with better debugging (which we want to calculate credits better) we likely won't need this flag.

            That depends on how "need" to be defined. From the view of transaction tracing and debugging, it is useful. We may implement such purposes through other ways, but the "thandle::th_rollback" is simple and clean. In fact, the "thandle::th_local" is also used for notification among layers, but not the functional requirement of ldiskfs or DUM. So...

            yong.fan nasf (Inactive) added a comment - That depends on how "need" to be defined. From the view of transaction tracing and debugging, it is useful. We may implement such purposes through other ways, but the "thandle::th_rollback" is simple and clean. In fact, the "thandle::th_local" is also used for notification among layers, but not the functional requirement of ldiskfs or DUM. So...

            the thing is that neither ldiskfs nor dmu need this..

            bzzz Alex Zhuravlev added a comment - the thing is that neither ldiskfs nor dmu need this..

            One idea about the "rollback" notification: introduce a new member thandle::th_rollback, which can be set by MDD and checked by OSD.

            How do you think?

            yong.fan nasf (Inactive) added a comment - One idea about the "rollback" notification: introduce a new member thandle::th_rollback, which can be set by MDD and checked by OSD. How do you think?

            if I remember DMU code correctly, it's fine to declare some range to be modified and then modify it few times - we should be passing internal sanity checks.
            the same applies for ldiskfs for sure.
            given so I think that we don't actually need to declare "rollback" updates as they are working on the same range/hash always.
            notice we probably don't need this mechanism with DMU as it does this internally (when debugging is enabled).

            bzzz Alex Zhuravlev added a comment - if I remember DMU code correctly, it's fine to declare some range to be modified and then modify it few times - we should be passing internal sanity checks. the same applies for ldiskfs for sure. given so I think that we don't actually need to declare "rollback" updates as they are working on the same range/hash always. notice we probably don't need this mechanism with DMU as it does this internally (when debugging is enabled).

            Alex, the 5138 patch was landed as-is to allow later LFSCK patch landings to move forward.

            Oleg wants a better comment explaining what the transaction declaration accounting is doing, so that it can be more easily understood by others.

            Other open questions that need to be discussed:

            • should we implement an OSD API to notify the transaction that there is a rollback in progress?
            • having more accurate transaction accounting is useful for both reducing the transaction credits, and improving the debugging. We do some limited aggregation of declarations for quotas, so that the common case of multiple updates for the same UID/GID do not need too many credits. We could change this to be a more generic mechanism (e.g. identified by FID?) that tracks a few different kinds of updates (e.g. setattr/dirty object, write/append/alloc/punch/truncate, insert/delete item).
            adilger Andreas Dilger added a comment - Alex, the 5138 patch was landed as-is to allow later LFSCK patch landings to move forward. Oleg wants a better comment explaining what the transaction declaration accounting is doing, so that it can be more easily understood by others. Other open questions that need to be discussed: should we implement an OSD API to notify the transaction that there is a rollback in progress? having more accurate transaction accounting is useful for both reducing the transaction credits, and improving the debugging. We do some limited aggregation of declarations for quotas, so that the common case of multiple updates for the same UID/GID do not need too many credits. We could change this to be a more generic mechanism (e.g. identified by FID?) that tracks a few different kinds of updates (e.g. setattr/dirty object, write/append/alloc/punch/truncate, insert/delete item).

            With patch set 4 test looks fine.

            aboyko Alexander Boyko added a comment - With patch set 4 test looks fine.

            People

              yong.fan nasf (Inactive)
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: