[LU-3362] HSM - Disaster Recovery Support - Master Landings
Created: 20/May/13  Updated: 05/Sep/13  Resolved: 05/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: New Feature
Priority: Critical
Reporter: Jodi Levi (Inactive)
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: HSM

Issue Links:
Related
is related to LU-3608 HSM Master Ticket for 2.5 Landings Resolved
Sub-Tasks:
Key: LU-3565
Summary: HSM: Allow specifying minimum FID seq...
Type: Technical task
Status: Resolved
Assignee: Alex Zhuravlev
Rank (Obsolete): 8325

 Description   

Need disaster recovery support for HSM in 2.5



 Comments   
Comment by Jodi Levi (Inactive) [ 05/Jun/13 ]

Meeting 2013-06-05
Use Cases:
1. Need MDT backup and restore with what is in archive (import by FID)
2. Restore into same file system - import file that was deleted (import by FID)
3. Brand new file system formatted - have archive available and import into new file system (import by FID)
4. Import into a file system which already has data (requires FID rebinding)
Focus for 2.5 will be on the first 3 Use Cases
Technical details will be discussed in the 2013-06-06 meeting and added to this ticket
Changes to Lustre will be done by Intel
All other changes will be done by CEA

Comment by Thomas LEIBOVICI - CEA (Inactive) [ 25/Jun/13 ]

This can be achieved by implementing a "rebind" operation in the copytool (CEA).
To avoid FID collisions when reimporting files into a new filesystem, we need a hack in the FLDB so that the sequences of old FIDs are not reused (Intel).
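
As a rough illustration of what "rebind" means here, the sketch below re-keys one archived entry from its old FID to a newly created one. It is a toy model only: the archive is represented as an in-memory dictionary keyed by FID string, and the `rebind` function, the dictionary layout, and the sample FIDs are all hypothetical. It does not reflect the copytool option added in change 4737.

```python
from typing import Dict


def rebind(archive: Dict[str, str], old_fid: str, new_fid: str) -> None:
    """Re-key one archived entry from old_fid to new_fid (toy model)."""
    if old_fid not in archive:
        raise KeyError(f"no archived entry for {old_fid}")
    if new_fid in archive:
        # This is the collision the ticket wants to rule out: the new
        # filesystem handed out a FID that the archive already references.
        raise ValueError(f"{new_fid} collides with an archived entry")
    archive[new_fid] = archive.pop(old_fid)


# Hypothetical archive with a single entry, keyed by its original FID.
archive = {"[0x200000400:0x1:0x0]": "payload-a"}
rebind(archive, "[0x200000400:0x1:0x0]", "[0x200000bd0:0x1:0x0]")
```

The `ValueError` branch shows why the FLDB side matters: if the new filesystem could reuse an old sequence, a rebind target might already be in use by another archived entry.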

Comment by Thomas LEIBOVICI - CEA (Inactive) [ 04/Jul/13 ]

Implementation details:
When recovering a filesystem from the archive, we must rebind archived entries to the newly created FIDs in Lustre.
We must avoid collisions between old FIDs and new FIDs during this operation.

Thus, to achieve disaster recovery, the following features are needed:
1) Add a "rebind" feature to the copytool. This is implemented as part of change 4737 http://review.whamcloud.com/#/c/4737/ ("rebind" copytool option).
2) Be able to get the maximum FID sequence referenced in the archive. This is also implemented as part of change 4737 http://review.whamcloud.com/#/c/4737/ ("max_sequence" copytool option).
3) Ensure that the newly formatted Lustre filesystem will not allocate FIDs in the old ranges.
To achieve this, we pass this max_sequence to mkfs, so that all the ranges up to max_sequence are reserved in the FLDB.
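
The idea behind points 2) and 3) can be sketched as follows, assuming FIDs are available as strings in the usual `[seq:oid:ver]` hexadecimal form. The helper names (`parse_fid`, `max_sequence`, `collides`) and the sample FIDs are hypothetical and are not taken from change 4737; this only illustrates how a maximum sequence could be derived and then used as a reservation boundary.

```python
import re
from typing import Iterable, Tuple

# A FID printed as "[0x<seq>:0x<oid>:0x<ver>]".
FID_RE = re.compile(r"^\[0x([0-9a-fA-F]+):0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)\]$")


def parse_fid(text: str) -> Tuple[int, int, int]:
    """Split a FID string into (sequence, object id, version)."""
    match = FID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a FID: {text!r}")
    seq, oid, ver = (int(group, 16) for group in match.groups())
    return seq, oid, ver


def max_sequence(fids: Iterable[str]) -> int:
    """Highest sequence number referenced by any archived FID."""
    return max(parse_fid(fid)[0] for fid in fids)


def collides(new_fid: str, reserved_max: int) -> bool:
    """True if new_fid falls inside the reserved (old) sequence space."""
    return parse_fid(new_fid)[0] <= reserved_max


# Hypothetical FIDs found in an archive.
archived = ["[0x200000400:0x1:0x0]", "[0x200000bd0:0x2a:0x0]"]
reserved_max = max_sequence(archived)
```

With every sequence up to `reserved_max` reserved, any FID the new filesystem allocates has a strictly larger sequence, so `collides` returns `False` for it.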

Comment by Alex Zhuravlev [ 09/Jul/13 ]

May I ask why FIDs need to be preserved?

Comment by Johann Lombardi (Inactive) [ 10/Jul/13 ]

I have also been lobbying for not storing FIDs in the archive. Actually, CEA and I came up with a solution that does exactly this and does not require preserving or remapping FIDs.

That said, CEA thinks that this solution cannot be implemented by the feature freeze, so they want to go on with their current scheme, which stores FIDs in the "archive". As a consequence, we either have to preserve the original FIDs or remap FIDs in the archive. Given that the archive is the slowest component, preserving FIDs sounds like the most reasonable approach.

Comment by Alex Zhuravlev [ 10/Jul/13 ]

I see... the issue is that it is not mkfs that creates the FLDB. I will try to figure out a solution.

Comment by Alex Zhuravlev [ 11/Jul/13 ]

So this is not supposed to be used with DNE?

Comment by Alex Zhuravlev [ 11/Jul/13 ]

Just to clarify: if we don't need to support DNE, then we just insert (somehow) a <reserved sequences> -> MDT#0 mapping into the newly created FLDB.
Otherwise it will be more difficult.

Comment by Johann Lombardi (Inactive) [ 11/Jul/13 ]

Yes, you can assign the reserved sequence range to MDT0. That is enough to avoid collisions.
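
A minimal sketch of that idea, using a toy in-memory model of a sequence-range table rather than the real FLDB: the old sequence space is recorded as one range owned by MDT0, and later allocations start after it. The `ToyFldb` and `SeqRange` names, the starting sequence, and the range width are all assumptions made for illustration. The actual change is the one under review at http://review.whamcloud.com/#/c/7027/.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SeqRange:
    start: int      # first sequence in the range (inclusive)
    end: int        # one past the last sequence (exclusive)
    mdt_index: int  # MDT that owns the range


class ToyFldb:
    """Toy stand-in for a sequence-range to MDT mapping."""

    def __init__(self, first_seq: int) -> None:
        self.ranges: List[SeqRange] = []
        self.next_seq = first_seq

    def reserve_up_to(self, max_seq: int, mdt_index: int = 0) -> None:
        """Record [next_seq, max_seq] as already owned so it is never reused."""
        if max_seq < self.next_seq:
            return  # nothing to reserve
        self.ranges.append(SeqRange(self.next_seq, max_seq + 1, mdt_index))
        self.next_seq = max_seq + 1

    def allocate(self, width: int, mdt_index: int) -> SeqRange:
        """Hand out the next free range of `width` sequences."""
        new_range = SeqRange(self.next_seq, self.next_seq + width, mdt_index)
        self.ranges.append(new_range)
        self.next_seq = new_range.end
        return new_range


# Illustrative values only: starting sequence and archive maximum are assumed.
fldb = ToyFldb(first_seq=0x200000400)
fldb.reserve_up_to(0x200000BD0, mdt_index=0)
fresh = fldb.allocate(width=0x400, mdt_index=0)
```

Because the reserved range is just one more entry owned by MDT0, no remapping is needed. The sketch ignores DNE, where ranges would be spread over several MDTs.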

Comment by Alex Zhuravlev [ 18/Jul/13 ]

http://review.whamcloud.com/#/c/7027/

Comment by Jodi Levi (Inactive) [ 05/Sep/13 ]

Patch landed to master. An additional patch was moved to a separate ticket for 2.6.
