[LU-8900] Lustre snapshot based on ZFS backend Created: 03/Dec/16  Updated: 28/Jul/20  Resolved: 06/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: New Feature Priority: Blocker
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: PDF File Lustre Snapshot Test Plan.pdf    
Issue Links:
Related
is related to LU-5553 Support "remount-ro" option of the ld... Resolved
is related to LU-5070 Utility to change filesystem name Resolved
is related to LUDOC-370 Lustre ZFS Snapshot Documentation Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Snapshots are an important feature for Lustre. As a first step, we will use the snapshot functionality of the ZFS backend to implement Lustre snapshots.

Snapshots provide fast recovery of files from a previous checkpoint (without recourse to an offline backup). Snapshots are cheap online backups, provided the hardware itself is not compromised. Recovery of lost files from a snapshot is usually considerably faster than from any offline backup or remote replica. Note that snapshots do not improve storage reliability and are just as exposed to hardware failure as any other storage volume.

Snapshots address the need to take a checkpoint of the file system, and have two historic purposes: preparing a file system for a backup, and fast recovery of files from a previous state without recourse to an offline backup. The latter is increasingly used in environments where the cost associated with any downtime is significant: consider the time required to restore a dataset from a tape library. In many cases, a restore from tape will exceed the SLA for operations.

A common pattern is for a file system to be checkpointed every two hours. If an error occurs in the “live” data (accidental data loss, corruption, etc.), then it is straightforward to revert to a previous snapshot, either wholesale or by copying back the original data. Snapshots do require that the underlying hardware is not compromised.

Stabilising the file system for a backup is probably less relevant when the file system size reaches into petabytes. LTO drives, for example, can only record at a maximum rate of 576-900 GB/hour. As file system capacities increase, the ability to take a backup and, more importantly, to restore from a backup within the conditions of an SLA diminishes.
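
To put the tape rates above in perspective, here is a rough back-of-the-envelope sketch. The 576-900 GB/hour range comes from the text; the 1 PB capacity, the decimal unit conversion (1 PB = 1,000,000 GB), and the helper name backup_hours are assumptions made for illustration only, and the estimate ignores tape changes, verification, and parallel streams beyond a simple drive count.

```python
# Rough backup-window estimate for streaming a file system to LTO tape.
# The 576-900 GB/hour range is taken from the description; the 1 PB
# capacity and decimal units are illustrative assumptions.

GB_PER_PB = 1_000_000  # decimal petabyte, assumed for simplicity


def backup_hours(capacity_pb: float, rate_gb_per_hour: float, drives: int = 1) -> float:
    """Hours needed to stream capacity_pb petabytes at the given per-drive rate."""
    if rate_gb_per_hour <= 0 or drives <= 0:
        raise ValueError("rate and drive count must be positive")
    return capacity_pb * GB_PER_PB / (rate_gb_per_hour * drives)


if __name__ == "__main__":
    for rate in (576, 900):
        hours = backup_hours(1.0, rate)
        print(f"{rate} GB/hour: {hours:.0f} hours (~{hours / 24:.0f} days)")
```

Under these assumptions a single drive needs roughly 46 to 72 days to write one petabyte, which is why restoring from tape within an SLA becomes unrealistic at this scale.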

Let us not underestimate the utility of snapshots when planning maintenance. Taking a snapshot immediately prior to a system upgrade is a sensible precaution, and making that mechanism accessible and reliable adds value to any system maintenance workflow.



 Comments   
Comment by Andreas Dilger [ 03/Dec/16 ]

The overall implementation approach is described in http://wiki.lustre.org/Lustre_Snapshots

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24270
Subject: LU-8900 doc: Lustre snapshot man page
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 62360a68ad735af8f92da0587fbbfd0f40509157

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24262
Subject: LU-8900 snapshot: new config for MDT write barrier
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e2c1ac186e2167fdc82aadc2016d25f98a6cffb6

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24263
Subject: LU-8900 snapshot: operate write barrier on MDT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b54b00456dc32c9f03dba0fa2bbdbe01173f467c

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24264
Subject: LU-8900 snapshot: check write barrier before modification
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 33dd23690d75f41f8a72baad1ab157fafd298c95

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24265
Subject: LU-8900 snapshot: user interface for write barrier on MDT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 79c9e1a406407a70520c5349cf065dbb0b8b3e02

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24266
Subject: LU-8900 snapshot: fork/erase configuration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2cf0091e2800fce4f8779d2d8049f04f7e58a62e

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24267
Subject: LU-8900 snapshot: simulate readonly device
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ec783e86aefe194289248194cabf0549403cef02

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24268
Subject: LU-8900 snapshot: rename filesysetem fsname
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 369e0b9f9fe610ec22ce4d0f6f9f610469fe2d03

Comment by Gerrit Updater [ 09/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24269
Subject: LU-8900 snapshot: user space snapshot tools
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f095db1cb0bf76216e24f8e0815626ea2eb46f83

Comment by Gerrit Updater [ 19/Dec/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/24415
Subject: LU-8900 mgs: use reference count for fs_db
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 78081f8938c9d619c6f3c07bd401c8f1581bae7e

Comment by nasf (Inactive) [ 23/Dec/16 ]

The write barrier has been reimplemented via a new mechanism. The patch https://review.whamcloud.com/#/c/24265 (and its ancestor patches), at patch set 7, gives excellent barrier performance. The time for "barrier_freeze" on an idle system (in the 2/4/6/8 MDT cases) is less than 0.05 seconds. The same tests with the old implementation (patch set 2) take about 15 to 17 seconds for "barrier_freeze".

Comment by Gerrit Updater [ 31/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24415/
Subject: LU-8900 mgs: use reference count for fs_db
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bfa1dbc969df6e9e10579fdb30ab653835463bd2

Comment by Gerrit Updater [ 09/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24262/
Subject: LU-8900 snapshot: new config for MDT write barrier
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 019a3b34c0f4d934266a185bcda048b1dab201ed

Comment by Gerrit Updater [ 14/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24263/
Subject: LU-8900 snapshot: operate write barrier on MDT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ef25ecdd8574a5932fe970f6b58e8d0c458d7e9e

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24264/
Subject: LU-8900 snapshot: check write barrier before modification
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 20d724103f4edfaf59fcc5914c8d6200d4a0bdc5

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24265/
Subject: LU-8900 snapshot: user interface for write barrier on MDT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3afede2b8186912a08acfa8b1881356c7e11c656

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24266/
Subject: LU-8900 snapshot: fork/erase configuration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 61718da8ba068fdc093da464fc1097c6771079eb

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24267/
Subject: LU-8900 snapshot: simulate readonly device
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4c90aef2f0712d8da720f6a66cd09b88df7d0573

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24268/
Subject: LU-8900 snapshot: rename filesysetem fsname
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d0c6e97fa53ae26dec458087e96dcbb0ed0d469a

Comment by Gerrit Updater [ 27/Mar/17 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26199
Subject: LU-8900 tests: add snapshot in racer tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 579f2152ec8eb18c318dfb18f56b5e5635be03bd

Comment by Gerrit Updater [ 30/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24269/
Subject: LU-8900 snapshot: user space snapshot tools
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d73849a05e3ed1baf07b1fd80ffa055299539b4a

Comment by Gerrit Updater [ 30/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24270/
Subject: LU-8900 doc: Lustre snapshot man page
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3f7c5d26e6c14e4fb7575ee0b983133adf5e9a05

Comment by Gerrit Updater [ 06/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26199/
Subject: LU-8900 tests: add snapshot in racer tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f7d395102ac17e49d2c7fa3ad2bfda483a3ddeda

Comment by nasf (Inactive) [ 06/Apr/17 ]

All patches have landed in Lustre 2.10.

Generated at Sat Feb 10 02:21:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.