TCU: Trash Can/Undelete for Lustre

    Description

      Introduction

      If files are accidentally deleted from a file system, an application may be interrupted and the user data may be permanently lost. The trash can (also called "undelete" or "recycle bin") is a recommended feature in file systems that acts as a virtual trash can, allowing users to store deleted files temporarily before permanently deleting them. It provides a way to restore or retrieve deleted files if needed.

      Once the trash can feature is enabled, a file that a user deletes from the file system is not actually deleted but moved to the trash can, where deleted files and directories are stored temporarily. The trash can may be emptied manually; once it is full, the oldest files are removed first. Additionally, items in the trash can may be restored or retrieved as long as they are still there.

      Trash Can/Undelete Functionalities

      The trash can should include the following functionalities:

      • List "undeleted" files in the trash can;
      • After a file is deleted and moved into the trash can, the quota for this file should be accounted and updated (reduced) accordingly;
      • The trash can and all files therein are not visible in the namespace of the file system;
      • Restore a file in the trash can. This will restore the file to its original path. The corresponding quota account should also be updated;
      • Delete a file in the trash can. This will finally remove the file from the file system and free the used space. The file is then unrecoverable;
      • Empty the trash can. This will remove all files in the trash can;
      • A user can restore files from the trash can within the specified retention period. In this way, a file remains recoverable for a predefined, configurable grace period;
      • Enable/disable the trash can feature on an entire file system;
      • An administrator can enable/disable the trash can feature on a specified directory.

      Deleted files can no longer be restored from the trash can when:

      • A file (or directory) is deleted again from the trash can. In other words, it has been deleted twice. The first deletion only moves the file to the trash can. The second deletion actually removes the file from the file system.
      • The trash can is emptied of all of its contents.
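
The behaviour listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyFileSystem`, `TrashEntry`, and all method names are hypothetical and are not taken from the Lustre patches, the explicit `now` argument stands in for a clock, and quota accounting is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class TrashEntry:
    original_path: str   # where the file lived before deletion
    data: bytes
    deleted_at: float    # time of the first deletion, in seconds


@dataclass
class ToyFileSystem:
    """Hypothetical stand-in: a visible namespace plus a hidden trash can."""
    retention_seconds: float = 86400.0
    trash_enabled: bool = True
    files: dict = field(default_factory=dict)   # visible namespace: path -> data
    trash: dict = field(default_factory=dict)   # hidden: trash id -> TrashEntry
    next_id: int = 1

    def unlink(self, path, now):
        """First deletion: move the file into the trash can if it is enabled."""
        data = self.files.pop(path)
        if not self.trash_enabled:
            return None                          # deleted for good
        trash_id = self.next_id
        self.next_id += 1
        self.trash[trash_id] = TrashEntry(path, data, now)
        return trash_id

    def list_trash(self):
        return {tid: e.original_path for tid, e in self.trash.items()}

    def restore(self, trash_id, now):
        """Put a file back at its original path within the retention period."""
        entry = self.trash[trash_id]
        if now - entry.deleted_at > self.retention_seconds:
            raise TimeoutError("retention period has passed")
        if entry.original_path in self.files:
            raise FileExistsError(entry.original_path)
        del self.trash[trash_id]
        self.files[entry.original_path] = entry.data
        return entry.original_path

    def purge(self, trash_id):
        """Second deletion: the file becomes unrecoverable."""
        del self.trash[trash_id]

    def empty_trash(self):
        self.trash.clear()
```

In this model, a path removed with `unlink()` disappears from `files` but can be brought back with `restore()` until it is purged, the trash is emptied, or the retention period passes.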

      The Trash Can/Undelete HLD contains details of the design and implementation of this feature.

          Activity

            [LU-18456] TCU: Trash Can/Undelete for Lustre

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58137
            Subject: LU-18456 mdd: replicate XATTRs for a dir moving into trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6f15f8fac9bf57cc2f8713c15e9629b4c5870274

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57872
            Subject: LU-18456 mdd: move tree with multiple levels into trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 34361ee78dd5d87b4df830b01405d9eddb7a1d71

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57809
            Subject: LU-18456 mdd: add option to enable/disable trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8338c5cdb1b1d1bba8679df1b5cc90c9ba151636

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57748
            Subject: LU-18456 mdd: move regular files into trash upon last unlink
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3689792654be30a54ee2252372d1035e726edf57

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57612
            Subject: LU-18456 mdt: create trash dir for MDT after MDT stack setup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 27513a29efb3ff369c927d0380d1cdf32d2e6304

            adilger Andreas Dilger added a comment

            The current patch already has a time-based cleanup feature, which is good, but I think it is also critical to have a demand-based cleanup feature for when the filesystem is too full. Otherwise, users will try to delete their files when the filesystem is full, then not see any space being freed, and their jobs will fail due to errors.

            I think showing the space being freed is at least as important as the files actually being deleted. We already get bug reports from users saying "we deleted 10TB of files but the space was not freed", when the cleanup is just slow because the MDS does a two-phase cleanup of deleted files to ensure that the objects are not lost in case of a crash.
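
A demand-based cleanup of the kind requested here could look roughly like the following. It is a minimal sketch, not the patch's logic: the function name, the tuple layout, and the 90%/80% watermarks are all assumptions chosen for illustration.

```python
def demand_cleanup(trash, used_bytes, capacity_bytes, high_pct=90, low_pct=80):
    """Purge the oldest trashed files once usage exceeds a high watermark.

    trash is a list of (deleted_at, size_bytes, name) tuples (hypothetical
    layout). Purging continues until usage is at or below the low watermark
    or the trash is empty. Returns (purged_names, remaining, used_bytes).
    """
    # Integer percentages avoid floating-point comparisons.
    if used_bytes * 100 <= high_pct * capacity_bytes:
        return [], list(trash), used_bytes

    remaining = sorted(trash)            # oldest deletion time first
    purged = []
    while remaining and used_bytes * 100 > low_pct * capacity_bytes:
        _deleted_at, size, name = remaining.pop(0)
        used_bytes -= size               # space becomes visible as freed
        purged.append(name)
    return purged, remaining, used_bytes
```

Using two watermarks rather than one keeps the cleanup from running on every deletion when usage hovers around the threshold.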

            sihara Shuichi Ihara added a comment

            qian_wc adilger Can we also add automatic deletion for the recycle bin, i.e. a time-based automatic cleanup of the recycle bin? The interval would be a configurable value (e.g. auto_cleanup=86400 for one day). This still allows deleted files to be rescued after a user mistake, but keeps the recycle bin from consuming space for a long time. It is like a time-policy-based delayed deletion feature.
            Also, I would like excluding the recycle bin from quota accounting to be available as an option. It would still depend on whether the option is enabled or disabled, but I would at least like to have that option.
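
As an illustration of the two requests above, the sketch below expires entries older than `auto_cleanup` seconds and makes quota accounting of the recycle bin optional. Only the `auto_cleanup=86400` example value comes from the comment; `TrashedFile`, `expire_trash`, `quota_usage`, and the `count_trash` flag are hypothetical names, and treating a non-positive interval as "disabled" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrashedFile:
    name: str
    owner_uid: int
    size_bytes: int
    deleted_at: int      # seconds since the epoch


def expire_trash(entries, now, auto_cleanup=86400):
    """Split entries into (kept, expired) by age in seconds."""
    if auto_cleanup <= 0:                # assumption: non-positive disables expiry
        return list(entries), []
    kept = [e for e in entries if now - e.deleted_at < auto_cleanup]
    expired = [e for e in entries if now - e.deleted_at >= auto_cleanup]
    return kept, expired


def quota_usage(live_bytes, entries, uid, count_trash=True):
    """Bytes charged to uid, optionally excluding recycle-bin contents."""
    if not count_trash:
        return live_bytes
    return live_bytes + sum(e.size_bytes for e in entries if e.owner_uid == uid)
```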
            qian_wc Qian Yingjin added a comment (edited)

            We need to have some better mechanism to preserve the pathname. I don't think changelog is the right answer, because the size of the changelog is limited, and we don't want files to become unrecoverable because the changelog was purged.

            I think the changelog may still be needed for a Flashback feature. Please note that Flashback in the Oracle database also cannot recover from all accidental failures. It relies on undo log records to reverse transient mis-operations, and if the undo log is cleared, it cannot recover either.

            One option could be that "last unlink" into RECYCLE would create a directory named after the parent FID, and move the file into there. Then, if/when the parent directory is also removed, the FID-named directory in RECYCLE inherits the actual FID (and other xattrs like crypt, selinux, etc) from the now-deleted parent. This would avoid having to move all of the deleted files over to the "real" deleted parent, or having to scan RECYCLE/UID for files that belong to the parent. This would also need an OI update.

            So the suggestion is to also store directories in the recycle bin?

            IIUC, it should work as follows:

            Lustre mount point: /mnt/lustre
            File:
            /mnt/lustre/d1/d2/d3/tf1
            /mnt/lustre/d1/d2/d3/tf2
            All files are on MDT0001
            # lfs path2fid /mnt/lustre/d1/d2/d3
            [0x200034021:0x3:0x0]
            # lfs path2fid /mnt/lustre/d1/d2
            [0x200034021:0x2:0x0]
            # lfs path2fid /mnt/lustre/d1
            [0x200034021:0x2:0x0]
            # unlink /mnt/lustre/d1/d2/d3/tf1
            === RECYCLE BIN ====
            RECYCLE/MDT0001/UID/0x200034021:0x3:0x0/tf1
            # unlink /mnt/lustre/d1/d2/d3/tf2
            === RECYCLE BIN ====
            RECYCLE/MDT0001/UID/0x200034021:0x3:0x0/tf1
            RECYCLE/MDT0001/UID/0x200034021:0x3:0x0/tf2
            Please note that "RECYCLE/MDT0001/UID/0x200034021:0x3:0x0" is a directory on MDT0001, but its own FID is not 0x200034021:0x3:0x0.
            # rmdir /mnt/lustre/d1/d2/d3
            === RECYCLE BIN ====
            RECYCLE/MDT0001/UID/0x200034021:0x2:0x0/d3/tf1
            RECYCLE/MDT0001/UID/0x200034021:0x2:0x0/d3/tf2

            In the last step, to avoid moving the files under the directory "d3" that is being deleted, we need to rename "RECYCLE/MDT0001/UID/0x200034021:0x3:0x0" to "RECYCLE/MDT0001/UID/0x200034021:0x2:0x0/d3".

            However, this is not a standard "rename/mv" operation.

            Suppose that FID(file) is the file's FID and INODE(file) is its OSD inode, and that a directory entry link is a triple [name, FID, INODE].

            Then we should unlink the entry [name "0x200034021:0x3:0x0", FID(RECYCLE/MDT0001/UID/0x200034021:0x3:0x0), INODE(RECYCLE/MDT0001/UID/0x200034021:0x3:0x0)] from "RECYCLE/MDT0001/UID/", add an entry link [name "d3", FID(d3), INODE(RECYCLE/MDT0001/UID/0x200034021:0x3:0x0)] under "RECYCLE/MDT0001/UID/0x200034021:0x2:0x0", and sync the other attributes/XATTRs from INODE("d3"), the directory being deleted, to INODE("RECYCLE/MDT0001/UID/0x200034021:0x3:0x0").

            In summary, during the move we keep the underlying inode of the FID-named directory in the recycle bin, but replace its FID with the FID of the directory being deleted.

            After that, delete "d3" from the file system.
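
The non-standard rename described above can be modelled with plain Python objects. This is only a sketch of the [name, FID, INODE] bookkeeping: `Inode`, `relink_on_rmdir`, and the dictionary layout are hypothetical, FIDs are plain strings, and the OI update, DNE handling, and quota accounting are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Toy OSD inode; directory entries map name -> (fid, Inode)."""
    ino: int
    xattrs: dict = field(default_factory=dict)
    entries: dict = field(default_factory=dict)


def relink_on_rmdir(uid_dir, new_parent_dir, name, fid, deleted_inode):
    """Re-attach the FID-named recycle directory under its parent's directory.

    uid_dir        -- toy RECYCLE/<MDT>/<UID> directory
    new_parent_dir -- FID-named recycle directory of the deleted dir's parent
    name, fid      -- name and FID of the directory being removed (e.g. "d3")
    deleted_inode  -- inode of the directory being removed
    """
    # 1. Unlink [name "<fid>", FID(recycle dir), INODE(recycle dir)].
    _old_fid, recycle_inode = uid_dir.entries.pop(fid)
    # 2. Link [name, FID(deleted dir), INODE(recycle dir)] under the parent.
    new_parent_dir.entries[name] = (fid, recycle_inode)
    # 3. Copy attributes/xattrs from the directory being deleted.
    recycle_inode.xattrs = dict(deleted_inode.xattrs)
    return recycle_inode
```

The files already sitting in the FID-named directory are never touched: only one entry is unlinked and one is added, which is the point of keeping the underlying inode.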

             

            One area where MacOS handles this better is that "delete this folder" from the Finder will rename the whole directory into the .Trash directory rather than a bottom-up "rm -r" that deletes every file separately. That makes me wonder if we should add "lfs rmdir" (or similar) that directly moves the directory into RECYCLE rather than deleting all of the files separately?

            Yes, Metadata WBC has already implemented subtree deletion with a similar idea.

            However, in all of the above cases, we need to handle the DNE case carefully and also need to consider quota accounting.

             

             


            adilger Andreas Dilger added a comment

            Storing the full pathname into every deleted file is not a usable solution, since it would store up to a 4KB xattr for every file that is deleted. This would block deleting files when the MDT is full. We need to have some better mechanism to preserve the pathname. I don't think changelog is the right answer, because the size of the changelog is limited, and we don't want files to become unrecoverable because the changelog was purged.

            Instead, I think this should check that the last parent FID+filename is listed in trusted.link in the inode, or create one if it is missing. This xattr can normally fit into the inode, and almost always would already exist (unless the file had hundreds of hard links at one time).

            Then, there needs to be a way of preserving the directory hierarchy in the RECYCLE folder, so that there are not millions of files in the same RECYCLE/UID directory, even if it is per-UID.

            One option could be that "last unlink" into RECYCLE would create a directory named after the parent FID, and move the file into there. Then, if/when the parent directory is also removed, the FID-named directory in RECYCLE inherits the actual FID (and other xattrs like crypt, selinux, etc) from the now-deleted parent. This would avoid having to move all of the deleted files over to the "real" deleted parent, or having to scan RECYCLE/UID for files that belong to the parent. This would also need an OI update.

            One area where MacOS handles this better is that "delete this folder" from the Finder will rename the whole directory into the .Trash directory rather than a bottom-up "rm -r" that deletes every file separately. That makes me wonder if we should add "lfs rmdir" (or similar) that directly moves the directory into RECYCLE rather than deleting all of the files separately?
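
The idea of recording only the last parent FID and filename, and grouping deleted files by parent FID, can be sketched as follows. Everything here is illustrative: the real trusted.link xattr is a binary structure, whereas this model keeps a plain Python list under a hypothetical `link_records` key, and the helper names and the `RECYCLE/<uid>/<parent FID>/<name>` layout are assumptions.

```python
def ensure_link_record(inode, parent_fid, name):
    """Record (parent_fid, name) on the toy inode unless it is already there."""
    records = inode.setdefault("link_records", [])
    record = (parent_fid, name)
    if record not in records:
        records.append(record)
    return record


def recycle_path(uid, parent_fid, name):
    """Place the file in a directory named after its parent FID (assumed layout)."""
    return f"RECYCLE/{uid}/{parent_fid}/{name}"


def restore_target(inode, fid_to_path):
    """Rebuild the original pathname from the last link record.

    fid_to_path is a dict standing in for a FID-to-path lookup.
    """
    parent_fid, name = inode["link_records"][-1]
    return f"{fid_to_path[parent_fid]}/{name}"
```

A short record per file replaces a full pathname of up to 4KB, and the per-parent directories keep any single recycle directory from collecting every deleted file of a user.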

            adilger Andreas Dilger added a comment

            I think this feature should be enabled by default, because it doesn't help users to tell them about it after they have deleted their files. However, there needs to be a way to disable it (possibly on a per-nodemap basis) and return to the existing behavior that deleting a file actually deletes it, in case there is some problem with the feature and/or for security/privacy reasons.

            There also needs to be a simple interface for permanently deleting the files in the recycle bin (i.e. "Empty Trash").
            adilger Andreas Dilger added a comment (edited)

            I had a good idea for this feature - that we should assign a project ID (e.g. -2U) to the RECYCLE directories with PROJID_INHERIT, and assign it to the files and directories added there, so that it is easy to track the number of files and the space used there.

            That will make it easy to subtract the recycle bin usage from "df" and "lfs df" by default (so that the filesystem does not always show 90% full), and it would also be a way for the administrator to put a limit on how much space is used by the recycle bin. Then, the MDS can check the project quota as well as the total filesystem space to delete old files from the RECYCLE directory. There should be an option like "lfs df --trash" and/or "--recycle" that shows the actual space usage without subtracting the RECYCLE PROJID.

            The assigned PROJID needs to be configurable on a per-nodemap basis, so that multiple tenants can have some space for undelete even if the other tenants are deleting a lot of files. As such, it probably makes sense to automatically map PROJID -1 and -2 (etc.) into the top of the ID offset range for each tenant, instead of requiring an explicit mapping.

            For "df" or "lfs df" in a nodemap projid directory, it would subtract the per-nodemap "-2" PROJID from the totals, rather than the global "-2" ID.
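
The accounting proposed in this comment can be sketched with a few lines of arithmetic. The names below are hypothetical, "-2U" is read here as the unsigned 32-bit value of -2 (an assumption), and the nodemap mapping formula is only one possible reading of "the top of the ID offset range".

```python
U32_MASK = 0xFFFFFFFF


def as_u32(value):
    """Interpret a signed integer as an unsigned 32-bit ID."""
    return value & U32_MASK


RECYCLE_PROJID = as_u32(-2)   # 4294967294


def df_view(total_bytes, used_bytes, usage_by_projid,
            recycle_projid=RECYCLE_PROJID, include_trash=False):
    """Report usage with recycle-bin space subtracted unless include_trash is set."""
    trash_bytes = usage_by_projid.get(recycle_projid, 0)
    shown_used = used_bytes if include_trash else used_bytes - trash_bytes
    return {"total": total_bytes, "used": shown_used,
            "available": total_bytes - shown_used}


def nodemap_recycle_projid(offset_start, offset_limit, reserved=-2):
    """Map a reserved negative ID to the top of a tenant's ID range.

    With a range of offset_limit IDs starting at offset_start, -1 maps to
    the last ID in the range and -2 to the one before it.
    """
    return offset_start + offset_limit + reserved
```

The `include_trash` flag plays the role of the proposed "lfs df --trash" view: the default report hides recycle-bin usage, while the flagged report shows the actual space consumed.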


            People

              qian_wc Qian Yingjin
              qian_wc Qian Yingjin