[LU-8900] Lustre snapshot based on ZFS backend Created: 03/Dec/16 Updated: 28/Jul/20 Resolved: 06/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | New Feature | Priority: | Blocker |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Description |
|
Snapshots are an important feature for Lustre. As a first step, we will use the snapshot functionality of the ZFS backend to implement Lustre snapshots. Snapshots provide fast recovery of files from a previous checkpoint without recourse to an offline backup. They are cheap online backups, provided the hardware itself is not compromised: snapshots do not improve storage reliability and are just as exposed to hardware failure as any other storage volume. Recovering lost files from a snapshot is usually considerably faster than recovering them from an offline backup or a remote replica.

Snapshots address the need to take a checkpoint of the file system and have two historic purposes: preparing a file system for backup, and quickly recovering files from a previous state without recourse to an offline backup. The latter is increasingly used in environments where any downtime is costly; consider the time required to restore a dataset from a tape library, which in many cases will exceed the SLA for operations. A common pattern is to checkpoint a file system every two hours. If an error occurs in the “live” data (accidental data loss, corruption, etc.), it is straightforward to revert to a previous snapshot, either wholesale or by copying back the original data.

Stabilising the file system for a backup is probably less relevant once the file system size reaches into petabytes. LTO drives, for example, can only record at a maximum rate of 576-900 GB/hour. As file system capacities increase, the ability to take a backup, and more importantly to restore from one within the conditions of an SLA, diminishes.

The utility of snapshots when planning maintenance should not be underestimated. Taking a snapshot immediately prior to a system upgrade is a sensible precaution, and making that mechanism accessible and reliable adds value to any system maintenance workflow. |
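The tape-throughput argument in the description can be made concrete with a back-of-envelope calculation. The sketch below uses a hypothetical helper, `restore_hours`, that is not part of Lustre; it assumes decimal units (1 PB = 1,000,000 GB) and drives streaming continuously at the quoted 576-900 GB/hour, ignoring tape changes, seeks, and verification, so real restores would take longer.

```python
# Back-of-envelope estimate of tape restore time for a petabyte-scale file system.
# Assumptions (not from the ticket): decimal units (1 PB = 1,000,000 GB) and
# drives streaming at their rated speed with no tape changes, seeks, or verification.

GB_PER_PB = 1_000_000


def restore_hours(capacity_pb: float, rate_gb_per_hour: float, drives: int = 1) -> float:
    """Hours needed to stream `capacity_pb` petabytes at `rate_gb_per_hour` per drive."""
    if rate_gb_per_hour <= 0 or drives < 1:
        raise ValueError("rate must be positive and at least one drive is required")
    return capacity_pb * GB_PER_PB / (rate_gb_per_hour * drives)


if __name__ == "__main__":
    # The two rates quoted in the description, for a single drive and 1 PB.
    for rate in (576, 900):
        hours = restore_hours(1.0, rate)
        print(f"1 PB at {rate} GB/h: {hours:,.0f} h (~{hours / 24:.0f} days)")
```

Under these assumptions a single drive needs roughly 1,111 to 1,736 hours (about 46 to 72 days) per petabyte; even ten drives in parallel at 900 GB/h still need about 111 hours, which illustrates why a restore from tape can exceed an operational SLA.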
| Comments |
| Comment by Andreas Dilger [ 03/Dec/16 ] |
|
The overall implementation approach is described in http://wiki.lustre.org/Lustre_Snapshots |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24270 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24262 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24263 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24264 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24265 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24266 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24267 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24268 |
| Comment by Gerrit Updater [ 09/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24269 |
| Comment by Gerrit Updater [ 19/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24415 |
| Comment by nasf (Inactive) [ 23/Dec/16 ] |
|
The write barrier has been reimplemented using a new mechanism. Patch set 7 of https://review.whamcloud.com/#/c/24265 (and its ancestor patches) gives excellent barrier performance: on an idle system (with 2, 4, 6, or 8 MDTs), "barrier_freeze" takes less than 0.05 seconds. The same tests with the old implementation (patch set 2) took about 15 to 17 seconds for "barrier_freeze". |
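To put the reported "barrier_freeze" timings in perspective, the improvement can be expressed as a speedup ratio. This is only arithmetic on the numbers in the comment above; the helper name `min_speedup` is hypothetical. Because the new timing is reported as an upper bound (under 0.05 seconds), the computed ratios are lower bounds on the actual speedup.

```python
# Speedup implied by the barrier_freeze timings reported for LU-8900:
#   old implementation (patch set 2): about 15-17 s on an idle system
#   new implementation (patch set 7): under 0.05 s with 2/4/6/8 MDTs
# The new time is only an upper bound, so each ratio is a lower bound.

OLD_FREEZE_SECONDS = (15.0, 17.0)
NEW_FREEZE_UPPER_BOUND = 0.05


def min_speedup(old_seconds: float, new_upper_bound: float) -> float:
    """Lower bound on speedup when the new time is only known to be below a bound."""
    if new_upper_bound <= 0:
        raise ValueError("upper bound must be positive")
    return old_seconds / new_upper_bound


for old in OLD_FREEZE_SECONDS:
    ratio = min_speedup(old, NEW_FREEZE_UPPER_BOUND)
    print(f"{old:.0f} s -> <{NEW_FREEZE_UPPER_BOUND} s: at least {ratio:.0f}x faster")
```

By this measure the reimplemented barrier is at least about 300 to 340 times faster at freezing an idle system than the original version.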
| Comment by Gerrit Updater [ 31/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24415/ |
| Comment by Gerrit Updater [ 09/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24262/ |
| Comment by Gerrit Updater [ 14/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24263/ |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24264/ |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24265/ |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24266/ |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24267/ |
| Comment by Gerrit Updater [ 23/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24268/ |
| Comment by Gerrit Updater [ 27/Mar/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26199 |
| Comment by Gerrit Updater [ 30/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24269/ |
| Comment by Gerrit Updater [ 30/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24270/ |
| Comment by Gerrit Updater [ 06/Apr/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26199/ |
| Comment by nasf (Inactive) [ 06/Apr/17 ] |
|
All patches have been landed to Lustre-2.10. |