Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4640

Last unlink should trigger HSM remove by default

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.5.0
    • 3
    • 12691

    Description

      In current code, when an archived file is removed from the file system, no HSM request is triggered. We rely on Changelogs and RobinHood reading them to have it sending corresponding hsm_remove requests to clean those orphans in HSM backend. This is done in purpose, to provide a way to implement "soft unlink". RobinHood will remove the file in the backend after a grace time. During this time, Admins could restore the file from the HSM if they want to (using import feature).

      Requiring a RobinHood setup to handle this cleaning is a too big limitation. We should consider modifying this behaviour.

      By default, Lustre should automatically add a HSM_REMOVE request for any last unlink. This way, no file will be leaked in the archive.
      A tunable should be added to disable this behaviour (should we add this to hsm_policy?) and go back to a mode where an external component is responsible for tracking UNLINK changelogs and add hsm_remove requests when needed (Robinhood) (current behaviour).

      Attachments

        Issue Links

          Activity

            [LU-4640] Last unlink should trigger HSM remove by default

            Well the problem comes from the time that this patch has waited to be landed, when in the meantime patch for LU-7136 was, and thus changed sanity-hsm framework causing this issue ...

            Will push a new version/patch-set of http://review.whamcloud.com/14384/ that will integrate my additional change for LU-7881 (http://review.whamcloud.com/#/c/18919/, "use new functions to kill and verify CT death" ).

            bfaccini Bruno Faccini (Inactive) added a comment - Well the problem comes from the time that this patch has waited to be landed, when in the meantime patch for LU-7136 was, and thus changed sanity-hsm framework causing this issue ... Will push a new version/patch-set of http://review.whamcloud.com/14384/ that will integrate my additional change for LU-7881 ( http://review.whamcloud.com/#/c/18919/ , "use new functions to kill and verify CT death" ).
            green Oleg Drokin added a comment -

            This patch was reverted due to testing problems discussed in LU-7881 causing big disruption to other patches.

            green Oleg Drokin added a comment - This patch was reverted due to testing problems discussed in LU-7881 causing big disruption to other patches.

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18925/
            Subject: Revert "LU-4640 mdt: implement Remove Archive on Last Unlink policy"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d9f95aa201341d972eeb610471e3c45f1ba12202

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18925/ Subject: Revert " LU-4640 mdt: implement Remove Archive on Last Unlink policy" Project: fs/lustre-release Branch: master Current Patch Set: Commit: d9f95aa201341d972eeb610471e3c45f1ba12202

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/18925
            Subject: Revert "LU-4640 mdt: implement Remove Archive on Last Unlink policy"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 97b0e7059a9755e918e0760fd21b33186395cf5b

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/18925 Subject: Revert " LU-4640 mdt: implement Remove Archive on Last Unlink policy" Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 97b0e7059a9755e918e0760fd21b33186395cf5b

            Hello Aurelien,
            Now my patch to implement RAoLU has finally landed, do you agree that we can close this long standing ticket ??

            bfaccini Bruno Faccini (Inactive) added a comment - Hello Aurelien, Now my patch to implement RAoLU has finally landed, do you agree that we can close this long standing ticket ??

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14384/
            Subject: LU-4640 mdt: implement Remove Archive on Last Unlink policy
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 74d92933108dc64b110a843352cf3336dca249d0

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14384/ Subject: LU-4640 mdt: implement Remove Archive on Last Unlink policy Project: fs/lustre-release Branch: master Current Patch Set: Commit: 74d92933108dc64b110a843352cf3336dca249d0

            I agree with you that we do not need to wake up the CDT for each of them. It will be more difficult to write tests however.

            adegremont Aurelien Degremont (Inactive) added a comment - I agree with you that we do not need to wake up the CDT for each of them. It will be more difficult to write tests however.

            Now using mdt_agent_record_add() directly raises a new question, should we wake-up (by calling mdt_hsm_cdt_wakeup()) CDT upon each of these hsm_remove requests being created or simply "batch" them in hsm_actions LLOG waiting for CDT to wake-up upon any other event or loop_period timer end ? My feeling is that batching them is a good idea since there is no emergency to handle them ...

            bfaccini Bruno Faccini (Inactive) added a comment - Now using mdt_agent_record_add() directly raises a new question, should we wake-up (by calling mdt_hsm_cdt_wakeup()) CDT upon each of these hsm_remove requests being created or simply "batch" them in hsm_actions LLOG waiting for CDT to wake-up upon any other event or loop_period timer end ? My feeling is that batching them is a good idea since there is no emergency to handle them ...
            bfaccini Bruno Faccini (Inactive) added a comment - - edited

            Patch-set #1 testing has shown that requesting HSM data was unnecessary due to being already done before for a fully unlinked file via

            mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_hsm_archive_exists()

            call sequence.
            Also mdt_hsm_add_actions() can not be used because it tries to lookup (via mdt_object_find()) object again but hangs/wait then (in lu_object_find[_at](), due to

            -EAGAIN

            returned for a dying object) since object seems already set as dying via

            mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_finish_unlink()->mdo_destroy()->...

            call sequence. I use mdt_agent_record_add() directly instead.

            bfaccini Bruno Faccini (Inactive) added a comment - - edited Patch-set #1 testing has shown that requesting HSM data was unnecessary due to being already done before for a fully unlinked file via mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_hsm_archive_exists() call sequence. Also mdt_hsm_add_actions() can not be used because it tries to lookup (via mdt_object_find()) object again but hangs/wait then (in lu_object_find [_at] (), due to -EAGAIN returned for a dying object) since object seems already set as dying via mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_finish_unlink()->mdo_destroy()->... call sequence. I use mdt_agent_record_add() directly instead.

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/14384
            Subject: LU-4640 mdt: implement Remove Archive on Last Unlink policy
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5c285dee336a29c2e1a55835004446fbd89c9c63

            gerrit Gerrit Updater added a comment - Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/14384 Subject: LU-4640 mdt: implement Remove Archive on Last Unlink policy Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 5c285dee336a29c2e1a55835004446fbd89c9c63

            After more discussions it appears that this new behavior should not be enable by default but upon a new tunable setting.

            bfaccini Bruno Faccini (Inactive) added a comment - After more discussions it appears that this new behavior should not be enable by default but upon a new tunable setting.

            People

              bfaccini Bruno Faccini (Inactive)
              adegremont Aurelien Degremont (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: