Lustre snapshot based on ZFS backend

Details

    • New Feature
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None

    Description

      Snapshots are an important feature for Lustre. As a first step, we will use the snapshot functionality of the ZFS backend to implement Lustre snapshots.

      Snapshots provide fast recovery of files from a previous checkpoint (without recourse to offline backup). Snapshots are cheap online backups, provided the hardware itself is not compromised. Recovering lost files from a snapshot is usually considerably faster than from any offline backup or remote replica. Note that snapshots do not improve storage reliability and are just as exposed to hardware failure as any other storage volume.

      Snapshots address the need to checkpoint the file system, and have two historic purposes: preparing a file system for a backup, and fast recovery of files from a previous state without recourse to an offline backup. The latter is increasingly used in environments where the cost of any downtime is significant; consider the time required to restore a dataset from a tape library. In many cases, a restore from tape will exceed the SLA for operations.

      A common pattern is for a file system to be checkpointed every two hours. If an error occurs in the “live” data (accidental data loss, corruption, etc.), then it is straightforward to revert to a previous snapshot, either wholesale or by copying back the original data. Snapshots do require that the underlying hardware is not compromised.
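
      The two-hour checkpoint pattern can be sketched as a small schedule generator. This is only an illustration: the naming scheme, the "testfs" file system name and the start date are hypothetical, and the command form (lctl snapshot_create -F <fsname> -n <ssname>) is assumed from the Lustre manual rather than stated in this ticket. The sketch builds the commands and prints them; it does not run them.

```python
from datetime import datetime, timedelta
from typing import List

# "Every two hours", as in the description above.
CHECKPOINT_INTERVAL = timedelta(hours=2)


def snapshot_name(when: datetime, prefix: str = "ckpt") -> str:
    """Hypothetical naming scheme: a prefix plus a sortable timestamp."""
    return f"{prefix}-{when:%Y%m%d-%H%M}"


def create_command(fsname: str, when: datetime) -> List[str]:
    """Build (but do not run) the command for one checkpoint.

    Assumes the `lctl snapshot_create -F <fsname> -n <ssname>` form.
    """
    return ["lctl", "snapshot_create", "-F", fsname, "-n", snapshot_name(when)]


def schedule(start: datetime, count: int) -> List[datetime]:
    """Return `count` checkpoint times spaced two hours apart."""
    return [start + i * CHECKPOINT_INTERVAL for i in range(count)]


# Print the commands for the first three checkpoints of a hypothetical day.
for when in schedule(datetime(2017, 1, 1, 0, 0), 3):
    print(" ".join(create_command("testfs", when)))
```

      In practice the printed commands would be driven by cron or a similar scheduler on the MGS node, with a retention policy to destroy old snapshots; both are outside the scope of this sketch.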

      Stabilising the file system for a backup is probably less relevant once the file system size reaches into petabytes. LTO drives, for example, can only record at a maximum rate of 576-900 GB/hour. As file system capacities increase, the ability to take a backup and, more importantly, to restore from a backup within the conditions of an SLA diminishes.
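
      To make the scale concrete, the following back-of-the-envelope calculation estimates how long a single drive would need to stream one petabyte at the quoted rates. The 1 PB capacity (taken as a decimal 1,000,000 GB), the single-drive default and the function name are assumptions for illustration; real backups also involve tape changes, verification and parallel drives.

```python
PB_IN_GB = 1_000_000  # assumption: decimal petabyte expressed in GB


def backup_hours(capacity_gb: float, rate_gb_per_hour: float, drives: int = 1) -> float:
    """Hours needed to stream capacity_gb to tape at a fixed per-drive rate."""
    if rate_gb_per_hour <= 0 or drives < 1:
        raise ValueError("rate must be positive and drives must be at least 1")
    return capacity_gb / (rate_gb_per_hour * drives)


# The two ends of the 576-900 GB/hour range quoted above.
for rate in (576, 900):
    hours = backup_hours(PB_IN_GB, rate)
    print(f"{rate} GB/hour: {hours:,.0f} hours (~{hours / 24:.0f} days)")
# 576 GB/hour: 1,736 hours (~72 days)
# 900 GB/hour: 1,111 hours (~46 days)
```

      Even with ten drives running in parallel at the top rate, the same petabyte takes roughly 111 hours, which is why restoring from tape within an SLA becomes impractical at this scale.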

      Let us not underestimate the utility of snapshots when planning maintenance. Taking a snapshot immediately prior to a system upgrade is a sensible precaution, and making that mechanism accessible and reliable adds value to any system maintenance workflow.

          Activity

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24269/
            Subject: LU-8900 snapshot: user space snapshot tools
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d73849a05e3ed1baf07b1fd80ffa055299539b4a

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26199
            Subject: LU-8900 tests: add snapshot in racer tests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 579f2152ec8eb18c318dfb18f56b5e5635be03bd

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24268/
            Subject: LU-8900 snapshot: rename filesysetem fsname
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d0c6e97fa53ae26dec458087e96dcbb0ed0d469a

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24267/
            Subject: LU-8900 snapshot: simulate readonly device
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4c90aef2f0712d8da720f6a66cd09b88df7d0573

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24266/
            Subject: LU-8900 snapshot: fork/erase configuration
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 61718da8ba068fdc093da464fc1097c6771079eb

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24265/
            Subject: LU-8900 snapshot: user interface for write barrier on MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3afede2b8186912a08acfa8b1881356c7e11c656

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24264/
            Subject: LU-8900 snapshot: check write barrier before modification
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 20d724103f4edfaf59fcc5914c8d6200d4a0bdc5

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24263/
            Subject: LU-8900 snapshot: operate write barrier on MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ef25ecdd8574a5932fe970f6b58e8d0c458d7e9e

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24262/
            Subject: LU-8900 snapshot: new config for MDT write barrier
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 019a3b34c0f4d934266a185bcda048b1dab201ed

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24415/
            Subject: LU-8900 mgs: use reference count for fs_db
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bfa1dbc969df6e9e10579fdb30ab653835463bd2

            yong.fan nasf (Inactive) added a comment - The write barrier has been reimplemented with a new mechanism. Patch set 7 of https://review.whamcloud.com/#/c/24265 (together with its ancestor patches) gives excellent barrier performance: "barrier_freeze" on an idle system takes less than 0.05 seconds in each of the 2, 4, 6 and 8 MDT cases. The same tests with the old implementation (patch set 2) take about 15 to 17 seconds for "barrier_freeze".

            People

              yong.fan nasf (Inactive)
              yong.fan nasf (Inactive)