[LU-3362] HSM - Disaster Recovery Support - Master Landings

Details

    • New Feature
    • Resolution: Fixed
    • Critical
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 8325

    Description

      Need disaster recovery support for HSM in 2.5

          Activity

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to master. Additional patch moved to a separate ticket for 2.6
            bzzz Alex Zhuravlev added a comment - http://review.whamcloud.com/#/c/7027/

            johann Johann Lombardi (Inactive) added a comment - Yes, you can assign the reserved sequence range to MDT0. It is enough to avoid collision.

            bzzz Alex Zhuravlev added a comment - Just to clarify: if we don't need to support DNE, then we just insert (somehow) a <reserved sequences> -> MDT#0 mapping into the newly created FLDB. Otherwise it'll be more difficult.
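
            The non-DNE case described in this comment can be pictured as a table that maps sequence ranges to an MDT index, with a single reserved range pointing at MDT0. The sketch below is only an illustration of that idea: `RangeTable`, its methods, and the half-open range convention are hypothetical and are not the Lustre FLDB implementation or its on-disk format.

```python
from bisect import bisect_right


class RangeTable:
    """Toy sequence-range -> MDT index table (hypothetical, not the real FLDB)."""

    def __init__(self):
        self._starts = []   # sorted range starts, used for binary search
        self._entries = []  # (start, end, mdt_index), end is exclusive

    def insert(self, start, end, mdt_index):
        """Add the range [start, end) for mdt_index, rejecting overlaps."""
        if start >= end:
            raise ValueError("empty or inverted range")
        i = bisect_right(self._starts, start)
        # The previous range must end at or before our start.
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("range overlaps an existing entry")
        # The next range must start at or after our end.
        if i < len(self._entries) and self._entries[i][0] < end:
            raise ValueError("range overlaps an existing entry")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, mdt_index))

    def lookup(self, seq):
        """Return the MDT index owning seq, or None if seq is unassigned."""
        i = bisect_right(self._starts, seq) - 1
        if i >= 0:
            start, end, mdt_index = self._entries[i]
            if seq < end:
                return mdt_index
        return None


# Example values are made up: reserve everything seen in the old archive for MDT0.
table = RangeTable()
table.insert(0x200000400, 0x200000BD1, 0)
```

            With such a mapping in place, any old sequence resolves to MDT0, and new sequences have to be allocated outside the reserved range. With DNE, the old sequences could belong to several MDTs, which is why a single mapping would no longer be enough.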

            bzzz Alex Zhuravlev added a comment - Is this not supposed to be used with DNE?

            bzzz Alex Zhuravlev added a comment - I see... the issue is that it is not mkfs that creates the FLDB. I will try to figure out a solution.

            johann Johann Lombardi (Inactive) added a comment - I have also been lobbying for not storing FIDs in the archive. Actually, CEA and I came up with a solution that does exactly this and does not require preserving or remapping FIDs.

            That said, CEA thinks that this solution cannot be implemented by the feature freeze, so they want to go on with their current scheme, which stores FIDs in the "archive". As a consequence, we either have to preserve the original FIDs or remap FIDs in the archive. Given that the archive is the slowest component, preserving FIDs sounds like the most reasonable approach.

            bzzz Alex Zhuravlev added a comment - May I ask why FIDs need to be preserved?

            leibovici-cea Thomas LEIBOVICI - CEA (Inactive) added a comment - Implementation details:
            When recovering a filesystem from the archive, we must rebind archived entries to the newly created FIDs in Lustre.
            We must avoid collisions between old FIDs and new FIDs during this operation.

            Thus, to achieve disaster recovery, the following features are needed:
            1) Add a "rebind" feature to the copytool. This is implemented as part of change 4737 http://review.whamcloud.com/#/c/4737/ ("rebind" copytool option).
            2) Be able to get the max FID sequence referenced in the archive. This is also implemented as part of change 4737 http://review.whamcloud.com/#/c/4737/ ("max_sequence" copytool option).
            3) Ensure that the newly formatted Lustre filesystem will not allocate FIDs in the old ranges.
            To achieve this, we pass this max_sequence to mkfs, so that all the ranges up to max_sequence are reserved in the FLDB.
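
            The collision-avoidance reasoning in this comment can be illustrated with a small sketch: find the highest sequence among the archived FIDs, then treat every sequence up to and including it as reserved. This is not the copytool's "max_sequence" option or the mkfs change; the helper names and the sample FIDs are hypothetical, and the only assumption about Lustre is that a FID is printed as three hexadecimal fields, sequence:object id:version, optionally in square brackets.

```python
import re

# Matches "[0xSEQ:0xOID:0xVER]" with or without the surrounding brackets.
FID_RE = re.compile(r"\[?(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]?")


def parse_fid(text):
    """Return (sequence, object_id, version) as integers."""
    match = FID_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FID: {text!r}")
    return tuple(int(part, 16) for part in match.groups())


def max_sequence(fids):
    """Highest sequence number referenced by any archived FID."""
    return max(parse_fid(fid)[0] for fid in fids)


def is_reserved(seq, max_seq):
    """True if seq falls in the range that must not be reallocated."""
    return seq <= max_seq


# Made-up FIDs standing in for the entries found in an archive.
archived = ["[0x200000400:0x1:0x0]", "[0x200000bd0:0x5:0x0]", "0x200000401:0x2:0x0"]
reserved_up_to = max_sequence(archived)
first_free = reserved_up_to + 1
```

            Any sequence handed out by the new filesystem starting at `first_free` cannot collide with a FID stored in the archive, which is the property step 3 asks mkfs and the FLDB to guarantee.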

            leibovici-cea Thomas LEIBOVICI - CEA (Inactive) added a comment - This can be achieved by implementing a "rebind" operation in copytool (CEA). To avoid fid collisions when reimporting files to a new filesystem, we need a hack in FLDB to avoid re-using sequences of old fids (Intel).

            People

              bzzz Alex Zhuravlev
              jlevi Jodi Levi (Inactive)