[LU-15965] Backup-Restore when one of the hard drive on the MDT failed. Created: 23/Jun/22 Updated: 05/Jul/22 Resolved: 05/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Critical |
| Reporter: | Loan Thai | Assignee: | Andreas Dilger |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Dell EMC, Red Hat 7.4, IML version 4.0.3. Consists of an IML server, 1 head node, 72 compute nodes, 1 MDT, MDSs, OSSs, and OSTs. |
||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hello, One of the hard drives on the MDT will fail soon and we need to replace it. After replacing the failed hard drive, will the metadata be rebuilt automatically, or do we have to run something to recover it? We have IML version 4.0.3. Should we back up the DB before replacing the failed HD? Please advise. Thanks. |
| Comments |
| Comment by Andreas Dilger [ 23/Jun/22 ] |
|
Yes, definitely back up the MDT. You should do that now, before the drive fails, and probably make a second backup right before replacing the drive. Please see https://wiki.lustre.org/Backing_Up_a_Lustre_File_System for details. Whether the MDT volume is automatically rebuilt when the drive is replaced depends on the underlying RAID hardware on your system. Lustre itself does not provide any redundancy, so there must be some kind of RAID at the storage level to protect against individual drive failures. That is not something that we support; please contact your storage vendor. |
| Comment by Loan Thai [ 24/Jun/22 ] |
|
Hi Andreas, Thanks for working on my issue. Does the device-level backup require unmounting the target? I am afraid the drive will fail after the unmount, and we also cannot unmount the FS. Is there another method to back up the MDT without unmounting the Lustre FS? For the device-level backup, I need to use the dd command: dd if=/dev/{original} of=/dev/{newdev} bs=4M. We have 2 MDSs; should I run the dd command on both or just one MDS? The disk layout on the MDS is below. Should I back up all /dev/sdx or only /dev/sda?
Thank you. |
| Comment by Andreas Dilger [ 24/Jun/22 ] |
|
The "dd" mechanism can be used with a mounted MDT. Though it may not be a perfect backup (some recently created or deleted files may be missed), it will be much better than having no backup at all. The less activity on the filesystem, the better the backup will be. While you may have multiple MDSs, I suspect you only have one MDT on a system this old. I would recommend backing up at least the MDT and the MGS. It isn't clear from your comment which device is the MDT; it should be clearly listed:
mds# mount | grep lustre
/dev/nvme0n1p1 on /mnt/myth/mgs type lustre (ro,svname=MGS,nosvc,mgs,osd=osd-ldiskfs,user_xattr,errors=remount-ro,_netdev)
/dev/sda on /mnt/myth/ost0000 type lustre (ro,svname=myth-OST0000,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,extents,mballoc,_netdev)
/dev/sdb on /mnt/myth/ost0001 type lustre (ro,svname=myth-OST0001,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,extents,mballoc,_netdev)
/dev/sdc on /mnt/myth/ost0002 type lustre (ro,svname=myth-OST0002,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,extents,mballoc,_netdev)
/dev/sdd on /mnt/myth/ost0003 type lustre (ro,svname=myth-OST0003,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,_netdev)
/dev/sde on /mnt/myth/ost0004 type lustre (ro,svname=myth-OST0004,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,_netdev)
/dev/nvme0n1p2 on /mnt/myth/mdt0000 type lustre (ro,svname=myth-MDT0000,mgsnode=192.168.20.1@tcp,osd=osd-ldiskfs,errors=remount-ro,iopen_nopriv,user_xattr,_netdev)
In this example, the MDT is on /dev/nvme0n1p2. That said, having a backup of the Linux OS disk is also very useful, because it would take a lot of time and effort to reinstall and rebuild this system from scratch (if even possible, given the age), and since the OS disk doesn't change very often, even a single OS backup is probably enough.
The lowest-price 4TB drive here is about $100, so backing up 300GB is about $7 worth of storage (probably less with a larger drive), and it will definitely take much more time and effort to rebuild the OS drive than $7. |
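The device lookup described in the comment above could be scripted roughly as follows. This is only a sketch: the function name `lustre_device_for` is hypothetical, and it assumes the `mount` output has the same field layout as the example listing (device, "on", mount point, "type", "lustre", options).

```shell
#!/bin/sh
# Hypothetical helper: print the block device whose Lustre mount options
# contain an svname ending in the given target name (e.g. MDT0000 or MGS).
# Reads `mount`-style lines on stdin, so it can be tried on saved output.
lustre_device_for() {
    target="$1"
    awk -v t="$target" '
        # $1 = device, $5 = filesystem type, $6 = "(option,option,...)"
        $5 == "lustre" && $6 ~ ("svname=[^,]*" t "[,)]") { print $1 }
    '
}

# On a live MDS this would be used as (not executed here):
#   mount | lustre_device_for MDT0000
```

Running it against both MDS servers shows which node currently mounts MDT0000; a node where the target is not mounted prints nothing.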
| Comment by Loan Thai [ 24/Jun/22 ] |
|
Sorry for the confusion. Our system was built by a vendor, so I don't quite understand the setup.
Is the MDT on /dev/mapper/mpatha or /dev/mapper/mpathb?
MDS0-0# dd if=/dev/mapper/mpatha of=/mnt/my_NFS_mounted bs=4M #-- is it correct? --#
You are right about the HD for backup. The issue is not money but permission. Our system is standalone and isolated in a secured area, and it is very hard to get approval to add devices. But I will take your advice seriously and work on it. |
| Comment by Andreas Dilger [ 24/Jun/22 ] |
|
Sorry, it isn't clear at all that the /mnt/MGS device is the right one. Check the output from "mount | grep lustre" on both MDS servers to see which device is mounting MDT0000, as was shown in my example output above. |
| Comment by Andreas Dilger [ 24/Jun/22 ] |
|
PS: I'd assume you know which RAID device is holding the failed drive? If yes, that is the device that should be backed up. |
| Comment by Loan Thai [ 28/Jun/22 ] |
|
Good morning Andreas, I am sorry for the late response, as I was out on TDY yesterday. From MDS0-0, the output of "mount | grep lustre" is: From MDS0-1, the output of "mount | grep lustre" is: So I will log in to MDS0-0 and back up the MDT with this command:
and it is also still good to back up the MGS, right? dd if=/dev/mapper/mpatha of=/mnt/backupMGS_onNFSmounted bs=4M Thanks. |
| Comment by Andreas Dilger [ 28/Jun/22 ] |
|
Yes, making a backup of both is a good idea. |
| Comment by Loan Thai [ 28/Jun/22 ] |
|
Hi Andreas, I ran the dd command to back up the MDT to my NFS share but had to abort it because I don't have enough space there. Is there another way to back up only the used space on the MDT? Thanks. |
| Comment by Andreas Dilger [ 28/Jun/22 ] |
|
Using 'dd' is by far the fastest and most reliable way to do a backup and restore, and would be my strong recommendation. Yes, it does a full backup of the entire MDT device, but it also ensures that if it needs to be restored it will be exactly the same as before, and it can nominally be done while the system is in use (though I would recommend making a second backup when you are able to stop the MDT, or immediately before the drive is replaced).
You could consider piping the "dd" output through bzip2 to try to compress it, but I don't know how much compression you will get, as this depends heavily on how full the MDT is and the lifetime of the system. This is not difficult to try, with something like the following, but of course compressing and decompressing the backup will make the process take significantly longer:
dd if=/dev/mapper/mpathb bs=4M | bzip2 -9 > /mnt/backupMDT0000_onNFSmounted.img.bz2
That said, it makes sense to start such a backup now while other options are considered. There are measures that could be taken to improve the compression of the dd image, but that would involve unmounting the MDT and writing a lot of zeroes to it, and this is not advisable if a drive is already near failure.
The user manual also describes how to use "tar" to do a backup/restore, and this may take less space (depending on how full the MDT is), but it will take much longer and put a lot more load on the MDT drives. This would also require reformatting the MDT before restoring the backup, and will not produce an "exact" backup and restore process. It also isn't clear to me what state your configuration is in, how the MDT was initially formatted, the software versions, and your experience level in order to use that approach, and given the secure nature of the system it would be extremely difficult to assist you. This kind of system administration task is really outside the scope of the Lustre support contract.
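The compressed "dd" pipeline above could be wrapped as in the sketch below, together with an integrity check of the resulting archive. The function names `backup_image` and `image_ok` are hypothetical, the compressor is a parameter that defaults to bzip2 as in the comment, and the restore line is shown only as a comment because it overwrites the target device.

```shell
#!/bin/sh
# Hypothetical helper: stream a device (or file) through a compressor.
#   backup_image SRC DEST [COMPRESSOR]
# Note: in plain sh the pipeline status is that of the compressor only,
# so a read error reported by dd on stderr must be checked by eye.
backup_image() {
    src="$1"; dest="$2"; comp="${3:-bzip2}"
    dd if="$src" bs=4M | "$comp" -9 > "$dest"
}

# Hypothetical helper: test that the archive decompresses cleanly.
# This checks the archive itself, not that it matches a live MDT,
# which keeps changing while it is mounted.
image_ok() {
    img="$1"; comp="${2:-bzip2}"
    "$comp" -t "$img"
}

# Restore would be the reverse pipeline (destructive, not executed here):
#   bzip2 -dc /mnt/backupMDT0000_onNFSmounted.img.bz2 | dd of=/dev/mapper/mpathb bs=4M
```

For example, `backup_image /dev/mapper/mpathb /mnt/backupMDT0000_onNFSmounted.img.bz2` followed by `image_ok /mnt/backupMDT0000_onNFSmounted.img.bz2` would reproduce the command from the comment and then confirm the file is a readable bzip2 stream.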
You really need to reconsider the relatively low cost of installing a suitable backup drive, not just for the MDT, but also for the OS images. Given the lack of familiarity with the system, restoring a failed OS drive might prove to be very difficult and time consuming. This is very worthwhile given the risk of potentially losing all of the data in the filesystem if the MDT is lost. If there is some administration overhead to install a new drive there, then consider making it a large one so that it will last a long time and can hold multiple MDT backups. Most secure sites I've worked with have less objection to bringing in new equipment, and more objection to removing equipment, but even a few-hundred-dollar 14TB drive in a USB enclosure would be sufficient for this use and could be left onsite afterward, since it will be doing linear writes during the backup and could sustain full bandwidth, so would finish a full MDT backup in about 48h (I'm assuming based on the age of the system, USB2.0 ~= 25MB/s). |
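The duration estimate in the comment above is a simple size-over-bandwidth calculation, sketched below. The function name `estimate_hours` is hypothetical and the sizes are illustrative; the MDT size is not stated in this ticket. At the assumed 25 MB/s, 48 hours corresponds to roughly 4.3 TB of data.

```shell
#!/bin/sh
# Hypothetical helper: hours needed to stream SIZE_GB (decimal gigabytes)
# at RATE_MB_PER_S (decimal megabytes per second).
#   hours = size_gb * 1000 / rate_mb_per_s / 3600
estimate_hours() {
    awk -v gb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}

# On a live server the device size in bytes could be read with
# `blockdev --getsize64 /dev/mapper/mpathb` (not executed here).

estimate_hours 4320 25   # 4.32 TB at 25 MB/s
estimate_hours 300 25    # the 300 GB OS disk mentioned earlier
```

With the real device size substituted, the same calculation indicates whether a backup fits in a maintenance window at USB 2.0 speeds or whether a faster link is worth arranging.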
| Comment by Loan Thai [ 29/Jun/22 ] |
|
I will do the dd backup and compress the data as you suggested. I have a couple of big drives, 10 TB and 14 TB. I will figure out how to get approval to add them. I will post an update. Thanks for your assistance. |
| Comment by Loan Thai [ 05/Jul/22 ] |
|
I followed your instructions to back up the MDT and MGS (with compression). I used the Dell MDSM tool to manage the MDT and replace the failed disk. The data is rebuilding now. Please close this ticket as resolved. Thanks so much for your help, Andreas. |