Here are my notes for a backup and restore test.
ZFS snapshots and send/receive for object backups.
This example is for a combined MGS/MDT object, but the same approach applies to an OST device-level backup. This example was run on Red Hat Enterprise Linux 6.2 and Lustre 2.4.0.
Servers and filesystems in the example
lustre2-8-25 - MGS/MDT server
lustre-meta/meta - Lustre ZFS MGS/MDT volume/filesystem on lustre2-8-25
lustre2-8-11 - OSS/OST server
lustre-ost0 - Lustre ZFS OST volume on lustre2-8-11
Backing up the object
Take a snapshot
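For example (the snapshot name "backup1" is only an illustration):
  zfs snapshot -r lustre-meta@backup1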
"-r" means do a recursive snapshot, so this will include both the volume and the filesystem.
List existing snapshots.
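For example:
  zfs list -t snapshot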
Send the snapshot and store it on a remote ZFS/Lustre server:
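A sketch, reusing the hypothetical snapshot name "backup1" and receiving into a backup dataset "meta-backup" on the remote pool (also a hypothetical name); "-u" on the receive just keeps the received datasets from being mounted on the backup host:
  zfs send -R lustre-meta@backup1 | ssh lustre2-8-11 zfs receive -u lustre-ost0/meta-backup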
Note: "-R" recursively sends the volume and filesystem and preserves all properties. It is critical to preserve the filesystem properties. If you are not using the "-R" flag, be sure to use "-p"; we will show that during recovery.
Examine on remote side
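For example, on lustre2-8-11 (with the dataset name assumed above):
  zfs list -r -t all lustre-ost0/meta-backup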
Recovery from failure.
In testing, I first corrupted the filesystem with 'dd'. You could also simply reformat it for testing.
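A sketch of the corruption step, destructive and for test systems only; the device path is hypothetical:
  # Overwrite the start of one of the pool's backing devices to simulate damage
  dd if=/dev/zero of=/dev/sdX bs=1M count=100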
Create a new ZFS Lustre volume/filesystem with the same name.
In my test case we have a RAID 10:
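A sketch of the reformat; the filesystem name, index, and device names are assumptions, arranged as striped mirrors (RAID 10). mkfs.lustre with --backfstype=zfs creates the pool and dataset in one step:
  mkfs.lustre --fsname=lustre --mgs --mdt --index=0 --backfstype=zfs \
      lustre-meta/meta mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde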
Mount with "service lustre start".
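For example (a sketch, assuming the Lustre init script and /etc/ldev.conf are configured on this node):
  service lustre start
  mount -t lustre    # verify: lustre-meta/meta should appear mounted with type "lustre"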
This makes a volume called "lustre-meta" and a filesystem called "meta".
The "mount" command should then show the lustre-meta/meta target mounted as type "lustre".
Log in to lustre2-8-11 (the remote target where you stored the snapshot) and send the filesystem back.
Now I will send back only the filesystem, not the whole volume (why send the whole volume back? Mainly convenient if you have multiple datasets).
"-p" preserves the properties, which is important for the Lustre filesystem to mount.
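A sketch, run on lustre2-8-11, reusing the hypothetical names from the backup step and receiving into a temporary dataset "meta-restore" so it can be renamed into place afterwards:
  zfs send -p lustre-ost0/meta-backup/meta@backup1 | \
      ssh lustre2-8-25 zfs receive -u lustre-meta/meta-restore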
Back on lustre2-8-25 (the failed metadata server), rename the filesystem to make the snapshot active.
Oops! That didn't work. You need to unmount the filesystem so it isn't busy.
Note: this doesn't mean stop the Lustre service; if you do, you can't access the ZFS volume.
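A sketch of the rename, with a hypothetical Lustre mountpoint and the temporary dataset name assumed above:
  umount /mnt/lustre/local/lustre-MDT0000      # unmount the Lustre target only; do not stop the service
  zfs rename lustre-meta/meta lustre-meta/meta-old
  zfs rename lustre-meta/meta-restore lustre-meta/meta
  mount -t lustre lustre-meta/meta /mnt/lustre/local/lustre-MDT0000    # mount the restored target again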
You should now be recovered.
The steps to restore a ZFS backend via the ZPL (ZFS POSIX Layer) using 'tar':
1) Create a new pool for the target if necessary, then reformat the new Lustre FS with the "--replace" parameter. For example:
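A sketch for an OST; the pool layout, dataset name, filesystem name, index, and MGS NID are all assumptions:
  zpool create lustre-ost0 mirror /dev/sdb /dev/sdc    # only if the pool does not already exist
  mkfs.lustre --fsname=lustre --ost --index=0 --replace \
      --mgsnode=lustre2-8-25@tcp --backfstype=zfs lustre-ost0/ost0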
2) Enable "canmount" property on the target FS. For example:
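For example, continuing with the hypothetical lustre-ost0/ost0 dataset from step 1:
  zfs set canmount=on lustre-ost0/ost0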
3) Mount the target as 'zfs'. For example:
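For example:
  zfs mount lustre-ost0/ost0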
4) Restore the data. For example:
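For example (the backup file path is hypothetical; this assumes a tar that can restore extended attributes, since the trusted.* xattrs carry the Lustre metadata):
  cd /lustre-ost0/ost0      # the dataset's ZPL mountpoint (default is /<pool>/<dataset>)
  tar xvpf /backup/ost0.tar --xattrs --xattrs-include="trusted.*"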
5) Remove stale OIs and index objects. For example:
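For example; the exact names of the stale OI and index objects vary by Lustre version, so treat this list as an assumption and check the manual for your release:
  cd /lustre-ost0/ost0
  rm -rf oi.* OI_* lfsck_* LFSCK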
6) Unmount the target. For example:
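For example:
  umount /lustre-ost0/ost0
  zfs set canmount=off lustre-ost0/ost0    # optional: keep the dataset from auto-mounting via ZPL again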
7) (Optional) If the restored system has a different NID than the backup system, change the NID. For details, refer to Lustre manual section 14.5. For example:
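A sketch: the MGS mountpoint, target name, and new NID are hypothetical; the MGS (here combined with the MDT) is mounted with "-o nosvc" so the configuration can be updated:
  mount -t lustre -o nosvc lustre-meta/meta /mnt/mdt
  lctl replace_nids lustre-OST0000 192.168.1.11@tcp
  umount /mnt/mdt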
8) Mount the target as "lustre". Usually, we will use the "-o abort_recov" option to skip unnecessary recovery. For example:
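For example (the mountpoint is hypothetical):
  mount -t lustre -o abort_recov lustre-ost0/ost0 /mnt/lustre/ost0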
The osd-zfs can detect the restore automatically when the target is mounted, and then triggers OI scrub to rebuild the OIs and index objects asynchronously in the background. You can check the OI scrub status. For example:
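For example (the parameter name assumes an OST called lustre-OST0000; adjust for your target and Lustre version):
  lctl get_param -n osd-zfs.lustre-OST0000.oi_scrub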
Or you can read the proc interface on the target directly:
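For example (same assumption about the target name):
  cat /proc/fs/lustre/osd-zfs/lustre-OST0000/oi_scrub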
If you want to restore the system from an ldiskfs-based backup, follow the same steps.