Details

    • Technical task
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0, Lustre 2.5.1
    • Lustre 2.5.0
    • 9919

    Description

      In the restore case of hsm_cdt_request_completed(), if the copytool returned success but the layout swap fails then we get an unreadable file with HS_RELEASED clear but LOV_PATTERN_F_RELEASED set.

      Perhaps the new HSM attributes should be applied to the volatile object before layout swap, and hsm_swap_layouts() should call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM set.

      Attachments

        Activity

          [LU-3834] hsm_cdt_request_completed() may clear HS_RELEASED on failed restore

          I am wondering if I should also add some error injection to simulate SWAP_LAYOUT failure during restore ??

          I will also push a new patch-set #8 to address John's last comment and convert to usual error handling style.

          bfaccini Bruno Faccini (Inactive) added a comment - I am wondering if I should also add some error injection to simulate SWAP_LAYOUT failure during restore ?? I will also push a new patch-set #8 to address John's last comment and convert to usual error handling style.

          I found a possible bug in my original patch version causing layout-lock not to be released when restore is canceled … Just submitted patch-set #8 to fix this, will see if is passes auto-tests (particularly sanity-hsm/test_33 which was timing-out due to md5sum process never ending!!…).

          bfaccini Bruno Faccini (Inactive) added a comment - I found a possible bug in my original patch version causing layout-lock not to be released when restore is canceled … Just submitted patch-set #8 to fix this, will see if is passes auto-tests (particularly sanity-hsm/test_33 which was timing-out due to md5sum process never ending!!…).

          Latest patch-set #6 auto-tests failures appear not related to this patch but to multiple issues already addressed in others tickets (LU-4093, LU-4086, …). Some of them have heir patches that landed until now, so I will rebase it again …

          bfaccini Bruno Faccini (Inactive) added a comment - Latest patch-set #6 auto-tests failures appear not related to this patch but to multiple issues already addressed in others tickets ( LU-4093 , LU-4086 , …). Some of them have heir patches that landed until now, so I will rebase it again …

          1st patch attempt is at http://review.whamcloud.com/7631.
          Unfortunately it failed in multiple auto-tests and needs at least to re-base.
          But also failed in several sub-tests of sanity-hsm, which is highly suspect ...
          Currently under investigations.

          bfaccini Bruno Faccini (Inactive) added a comment - 1st patch attempt is at http://review.whamcloud.com/7631 . Unfortunately it failed in multiple auto-tests and needs at least to re-base. But also failed in several sub-tests of sanity-hsm, which is highly suspect ... Currently under investigations.
          jhammond John Hammond added a comment -

          > _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op.

          This seems better to me.

          jhammond John Hammond added a comment - > _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op. This seems better to me.

          Ok, I see where this can be fixed now, thank's John. But now, and to save time before submit patch, what is the preferred way to do this ? :

          _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op.

          _ add a new flags to hsm_swap_layouts() to enable SWAP_LAYOUTS_MDS_HSM flag use or not during call to mo_swap_layouts().

          bfaccini Bruno Faccini (Inactive) added a comment - Ok, I see where this can be fixed now, thank's John. But now, and to save time before submit patch, what is the preferred way to do this ? : _ always call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM flag from hsm_swap_layouts(), since (actually?) hsm_swap_layouts() is only called from mdt_hsm_update_request_state() for a RESTORE op. _ add a new flags to hsm_swap_layouts() to enable SWAP_LAYOUTS_MDS_HSM flag use or not during call to mo_swap_layouts().

          If this is a simple fix, then we can work out a patch for this. Otherwise I'd like to put the resource on something else because it's unlikely for swap_layout to fail anyway.

          jay Jinshan Xiong (Inactive) added a comment - If this is a simple fix, then we can work out a patch for this. Otherwise I'd like to put the resource on something else because it's unlikely for swap_layout to fail anyway.
          jhammond John Hammond added a comment -

          Bruno,

          (I am just restating what I said on the call today.)

          In mdt_hsm_release() we call mdd_swap_layouts() with the SWAP_LAYOUTS_MDS_HSM flag, which causes the HSM xattrs to be handled along with the LOV xattrs, and rolls back both xattrs on failure.

          Contrast this with hsm_cdt_request_completed() which calls mdt_hsm_attr_set() and hsm_swap_layouts(). hsm_swap_layouts() then calls mdd_swap_layouts() with no flags. In this case if the layout swap fails then we do not restore the HSM xattr to its previous (released set) state.

          I have not checked but I suspect that may be possible to remodel the handing of restore after that of release and thereby avoid this inconsistency when layout swap fails.

          jhammond John Hammond added a comment - Bruno, (I am just restating what I said on the call today.) In mdt_hsm_release() we call mdd_swap_layouts() with the SWAP_LAYOUTS_MDS_HSM flag, which causes the HSM xattrs to be handled along with the LOV xattrs, and rolls back both xattrs on failure. Contrast this with hsm_cdt_request_completed() which calls mdt_hsm_attr_set() and hsm_swap_layouts(). hsm_swap_layouts() then calls mdd_swap_layouts() with no flags. In this case if the layout swap fails then we do not restore the HSM xattr to its previous (released set) state. I have not checked but I suspect that may be possible to remodel the handing of restore after that of release and thereby avoid this inconsistency when layout swap fails.

          John,
          Can you also detail me the scenario/conditions when you encountered such problem already ? As we discussed I may then be able to reproduce it or do some kind of error injection.

          bfaccini Bruno Faccini (Inactive) added a comment - John, Can you also detail me the scenario/conditions when you encountered such problem already ? As we discussed I may then be able to reproduce it or do some kind of error injection.

          John,
          Sorry, but HSM code is still new for me, so I need some clarifications here ...

          In this particular case of error handling, there is no other option than to return the file to its released state, is'nt it ?
          So is this what you mean by having "hsm_swap_layouts() call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM set", as during release ?

          bfaccini Bruno Faccini (Inactive) added a comment - John, Sorry, but HSM code is still new for me, so I need some clarifications here ... In this particular case of error handling, there is no other option than to return the file to its released state, is'nt it ? So is this what you mean by having "hsm_swap_layouts() call mo_swap_layouts() with SWAP_LAYOUTS_MDS_HSM set", as during release ?

          People

            bfaccini Bruno Faccini (Inactive)
            jhammond John Hammond
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: