[LU-17135] Inquiry Regarding MDT Backup Methods and Snapshot Reliability Created: 19/Sep/23 Updated: 25/Sep/23 Resolved: 25/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | BNL Team | Assignee: | Andreas Dilger |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 2.15.2 on RHEL8 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi,

We use a device-level backup for our MDT every day, as below:

/usr/bin/dd if=/dev/sdb of=/somewhere/backup-$now bs=4096k

Additionally, we've considered the e2image command for similar device-level backups. My questions are: Is there a better way to do an MDT backup? Would snapshots taken with dd or e2image be reliable for future MDT recovery? Is there a risk of encountering inconsistencies within these snapshots? If inconsistencies do arise, how should we proceed with such a snapshot?

Thank you, Jane
|
| Comments |
| Comment by Peter Jones [ 20/Sep/23 ] |
|
Andreas, what are your thoughts here? Peter |
| Comment by Andreas Dilger [ 20/Sep/23 ] |
|
Having a "dd" backup of the MDT filesystem(s), regardless of inconsistencies, is still far better than not having any backup at all. The ext4/ldiskfs metadata is mostly static on disk, so even if parts of the filesystem are being modified, this will not affect the backup of the vast majority of the filesystem that is not being modified at that time.

If the MDT is being modified during a backup (as is usually the case), then the backup image would need to have "e2fsck -fy" run on it to correct any inconsistencies before it is used. The e2fsck could be run on the backup image immediately after the backup is completed, in order to judge how "inconsistent" the backup is, and to avoid the need to do this later if the backup is needed. There will always be some errors reported by e2fsck for a "live" backup: inconsistent block and inode bitmaps, directories and inodes with missing/added entries, etc. I would therefore judge the "inconsistency" of the backup based on the total line count of the e2fsck output and not on whether e2fsck returns an error.

Even if the backup is inconsistent, there is still high value in having it available to restore configuration files, or to extract other information if the MDT itself is partially corrupted. It is worthwhile to save the e2fsck output with the backup image in case there is any question about inconsistencies in the backup image itself.

If the MDT is stored in an LVM volume, then creating a short-term LV snapshot as the source of the backup will make the backup more consistent (in theory 100% consistent). This is known to be functional (I use MDT LVM snapshots for my home Lustre filesystem backups), but it is not formally tested as part of any release. I would not recommend keeping LVM snapshots for an extended period, as they will cause increased IO load on the MDT storage and performance degradation over time, but they should be fine to use for the few hours that a full "dd" backup of a large MDT takes.
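The workflow described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a tested or recommended procedure: the function names, the volume group and LV names, the snapshot name suffix, the 20G snapshot size, and the log file naming are all hypothetical and would need to be adapted to the actual site. The functions only define the steps; nothing runs until they are called.

```shell
#!/bin/bash
# Sketch only: every name, path and size below is a hypothetical placeholder.

# Copy a block device (live MDT or LV snapshot) to an image file,
# using the same dd options as the original backup command.
backup_device() {
    local src_dev="$1" image="$2"
    dd if="$src_dev" of="$image" bs=4096k
}

# Back up from a short-lived LVM snapshot, then remove the snapshot so it
# does not keep adding IO load to the MDT storage.
backup_from_snapshot() {
    local vg="$1" lv="$2" image="$3"
    local snap="${lv}_backup_snap"   # hypothetical snapshot name
    # 20G of copy-on-write space is an assumed value; size it for the
    # amount of change expected during the backup window.
    lvcreate --snapshot --size 20G --name "$snap" "/dev/$vg/$lv" || return 1
    backup_device "/dev/$vg/$snap" "$image"
    local rc=$?
    lvremove -f "/dev/$vg/$snap"
    return "$rc"
}

# Run a forced check on the backup image (never on the live MDT) and save
# the output next to the image. A nonzero exit status is expected for a
# backup taken from a live device, so it is deliberately not treated as fatal.
check_image() {
    local image="$1"
    local log="${image}.e2fsck.log"
    e2fsck -fy "$image" > "$log" 2>&1 || true
    echo "$log"
}

# Print the number of lines in a saved e2fsck log, as a rough measure of
# how inconsistent the backup was.
fsck_line_count() {
    wc -l < "$1" | tr -d ' '
}
```

A run might then look like `backup_from_snapshot vg_mdt mdt0 /somewhere/backup-$now` followed by `fsck_line_count "$(check_image /somewhere/backup-$now)"`, with the line counts compared from one day's backup to the next rather than against any fixed threshold.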
Using e2image for the backup may be somewhat less useful than having a full "dd" backup of the MDT for a few reasons:
On the flip side, the e2image backup will still be smaller than a full "dd" image, but size is not a concern if the backup target is HDD-based and can match the streaming performance of the source MDT device, even if the MDT has much higher IOPS. The only issue with an HDD backup image would be that running e2fsck on the image would be slower, but that can be done in the background after the backup is complete, so it is not performance critical. |
| Comment by BNL Team [ 25/Sep/23 ] |
|
Hi Andreas, Jane
|