LU-18456: TCU: Trash Can/Undelete for Lustre


    Description

      Introduction

      If files are accidentally deleted from a file system, an application may be interrupted and user data may be permanently lost. A trash can (also called "undelete" or "recycle bin") is a recommended file system feature that temporarily stores deleted files before they are permanently deleted, so that they can be restored or retrieved if needed.

      Once the trash can feature is enabled, a file that a user deletes is not actually removed from the file system. Instead, it is moved to the trash can, where deleted files and directories are stored temporarily. The trash can may be emptied manually; once it is full, the oldest files are removed first. Items in the trash can may be restored or retrieved as long as they are still there.
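      The "oldest files first" policy for a full trash can might look like the following minimal sketch. It assumes that "full" means a byte-capacity limit and that each trashed entry records its size and deletion time; struct trash_entry and trash_evict_oldest() are hypothetical names, not part of Lustre.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory view of one trashed file. */
struct trash_entry {
	uint64_t size;    /* bytes still held by the trashed file */
	uint64_t dtime;   /* time the file was moved into the trash can */
	int evicted;      /* set to 1 once selected for permanent removal */
};

/* qsort() comparator: oldest deletion time first. */
static int cmp_dtime(const void *a, const void *b)
{
	const struct trash_entry *x = a, *y = b;

	return (x->dtime > y->dtime) - (x->dtime < y->dtime);
}

/*
 * Mark the oldest entries as evicted until the total size of the remaining
 * entries is at or below 'capacity'. Sorts 'ents' in place by dtime and
 * returns the number of bytes freed.
 */
uint64_t trash_evict_oldest(struct trash_entry *ents, size_t n,
			    uint64_t capacity)
{
	uint64_t used = 0, freed = 0;
	size_t i;

	for (i = 0; i < n; i++)
		used += ents[i].size;

	qsort(ents, n, sizeof(*ents), cmp_dtime);

	for (i = 0; i < n && used > capacity; i++) {
		ents[i].evicted = 1;
		used -= ents[i].size;
		freed += ents[i].size;
	}
	return freed;
}
```

      Sorting once by deletion time keeps the selection to a single pass over the entries.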

      Trash Can/Undelete Functionalities

      The trash can should include the following functionality:

      • List "undeleted" files in the trash can;
      • After a file is deleted and moved into the trash can, the quota usage for this file should be updated (reduced) accordingly;
      • The trash can and all files in it are not visible in the namespace of the file system;
      • Restore a file in the trash can. This restores the file to its original path. The corresponding quota accounting should also be updated;
      • Delete a file in the trash can. This finally removes the file from the file system and frees the space it used. The file is then unrecoverable;
      • Empty the trash can. This removes all files in the trash can;
      • A user can restore files from the trash can within the specified retention period. In this way, a file can be kept "undeleted" for a pre-defined, configurable grace period;
      • Enable/disable the trash can feature on an entire file system;
      • An administrator can enable/disable the trash can feature on a specified directory.

      Deleted files can no longer be restored from the trash can when:

      • A file (or directory) is deleted again from the trash can; in other words, it has been deleted twice. The first deletion only moves the file to the trash can. The second deletion actually removes the file from the file system.
      • The trash can is emptied of all of its contents.

      The Trash Can/Undelete HLD contains details of the design and implementation of this feature.

          Activity


            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58997
            Subject: LU-18456 tcu: update LinkEA when move file into Trash Can
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 09a8bbf75d550b5f48282a1e31c323e5b4f8a90a

            qian_wc Qian Yingjin made changes -
            Link New: This issue is related to LU-18917

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58746
            Subject: LU-18456 tcu: add option to set trash can type
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 746a695241e7887d59fcfb4dbe7d7b92a5dfa60e

            adilger Andreas Dilger made changes -
            Epic Link Original: EX-428 [ 55037 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is cloned by LU-18914
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18913
            adilger Andreas Dilger made changes -
            Description: HLD link markup corrected to [Trash Can/Undelete HLD|https://wiki.whamcloud.com/pages/viewpage.action?pageId=351437962]; the rest of the text (Original and New) is identical to the Description above.
            adilger Andreas Dilger made changes -
            Description Original: h2. Introduction and h2. Trash Can/Undelete Functionalities sections identical to the Description above, followed by:
            h2. Design and Implementation for Trash Can in Lustre

            The design for the trash can feature in Lustre is straightforward.

            On the server side, it just implements the basic functionality, such as moving the "undeleted" files into the trash can and an interface to traverse them. On the client side, it implements basic utility tools to interact with the trash can ({{lfs trash list|rm|unrm FILE}}), including:
             - Set or clear the recycle flag on a given file or directory;
             - List "undeleted" files;
             - Permanently delete a file within the trash can;
             - Empty the trash can, or a subdirectory of it;
             - Restore a file or directory in the trash can.

            *Our mechanism only moves regular files into the trash can upon their last unlink; by default, it does not preserve hard links to a file.*

            It borrows many ideas from orphan and volatile files in Lustre (which are stored in the "{{ROOT/PENDING}}" directory on each MDT). During format and setup, each MDT creates a "{{ROOT/TRASH}}" directory as a trash can to store "undeleted" files.

            The POSIX API is used to traverse the files in the trash can on a given MDT. First, a client gets the FID of the trash can directory "{{ROOT/TRASH}}" on the MDT. Then the client obtains a file descriptor via {{dir_fd=llapi_open_by_fid_at()}}; after that, the "undeleted" files within the trash can can be traversed via {{readdir()}}. Each entry can be opened with {{openat(dir_fd, dent)}}, and its "undeleted" xattr, which contains the information needed for restore, can be read via {{fgetxattr(fd, "trusted.unrm")}}. For restore, the client can even read the data of, or swap layouts with, an "undeleted" file in the trash can: {{opendir()/readdir()/openat()/fgetxattr("trusted.unrm")/close()/closedir()}}.

            The workflow for the trash can is as follows:
             - An administrator can enable/disable the trash can feature on a specified MDT via {{mdd.*.enable_trash_can}};
             - An administrator can enable/disable the trash can feature on a specified directory or file via the Lustre-specific file flag {{LUSTRE_TRASH_FL}} (similar to {{LUSTRE_ENCRYPT_FL}}). All files under a directory flagged with {{LUSTRE_TRASH_FL}} can inherit this flag.

            {code:java}
             # lctl recycle set_flag $file|$dir
             # lctl recycle clear_flag $file|$dir
            {code}
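            Combined with the rule that only regular files are moved, and only on their last unlink, the enable checks above amount to a simple predicate. This is a hedged sketch: the real bit value of {{LUSTRE_TRASH_FL}} is not given in this ticket, so {{LUSTRE_TRASH_FL_PLACEHOLDER}} stands in for it, and {{tcu_should_trash()}} and {{tcu_inherit_flags()}} are hypothetical helpers rather than MDD code.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Placeholder value: the real LUSTRE_TRASH_FL bit is not given here. */
#define LUSTRE_TRASH_FL_PLACEHOLDER 0x00008000u

/*
 * Decide whether unlinking this name should move the inode into ROOT/TRASH
 * instead of destroying it. 'nlink' is the link count *before* the unlink.
 */
bool tcu_should_trash(bool mdt_trash_enabled, mode_t mode, uint32_t lflags,
		      unsigned int nlink)
{
	if (!mdt_trash_enabled)
		return false;   /* mdd.*.enable_trash_can is off */
	if (!S_ISREG(mode))
		return false;   /* only regular files are moved */
	if (!(lflags & LUSTRE_TRASH_FL_PLACEHOLDER))
		return false;   /* file not flagged for the trash can */
	return nlink == 1;      /* only on the last unlink */
}

/* Flags for a newly created child: inherit the trash flag from the parent. */
uint32_t tcu_inherit_flags(uint32_t parent_lflags, uint32_t child_lflags)
{
	return child_lflags | (parent_lflags & LUSTRE_TRASH_FL_PLACEHOLDER);
}
```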
             - *Moving a deleted file into the trash can.* When a regular file marked with {{LUSTRE_TRASH_FL}} is deleted upon its last unlink, it is first moved into the trash can directory "{{ROOT/TRASH}}" with its FID as its name. Then a "{{trusted.unrm}}" xattr is set on the "undeleted" file in the trash can. The xattr contains the following information:

            {code:java}
            struct lustre_unrm_xattr {
                    __u32 lurm_uid; /* uid of the deleting file, used for quota accounting */
                    __u32 lurm_gid; /* gid of the deleting file, used for quota accounting */
                    __u32 lurm_projid; /* projid of the deleting file, used for quota accounting */
                    __u32 lurm_unused; /* unused, for field alignment/future use */
                    __u64 lurm_dtime; /* Timestamp that the file moved into the trash can */
            };
            {code}
            Where {{lurm_uid/gid/projid}} are the original uid/gid/projid of the deleted file, mainly used for quota accounting in the restore operation; {{lurm_dtime}} is the time that the file was moved into the trash can. It is used to determine whether the file has outlived the specified retention period and thus should finally be removed from the trash can.
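            The retention check that {{lurm_dtime}} enables can be sketched with a userspace mirror of the proposed xattr using fixed-width types. Assumptions: {{lurm_dtime}} holds seconds since the Unix epoch (the unit is not stated above), and a file counts as expired once at least the retention period has elapsed; {{struct unrm_xattr}} and {{unrm_xattr_expired()}} are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the proposed struct lustre_unrm_xattr. */
struct unrm_xattr {
	uint32_t lurm_uid;
	uint32_t lurm_gid;
	uint32_t lurm_projid;
	uint32_t lurm_unused;   /* explicit padding: lurm_dtime stays 8-byte aligned */
	uint64_t lurm_dtime;    /* assumed: seconds since the epoch */
};

_Static_assert(sizeof(struct unrm_xattr) == 24, "unexpected xattr size");

/*
 * Return true if the trashed file has been in the trash for at least
 * 'retention' seconds at time 'now'. Written as a subtraction so that
 * lurm_dtime + retention cannot overflow.
 */
bool unrm_xattr_expired(const struct unrm_xattr *x, uint64_t now,
			uint64_t retention)
{
	if (now < x->lurm_dtime)
		return false;   /* clock skew: deletion time is in the future */
	return now - x->lurm_dtime >= retention;
}
```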
             - List "undeleted" files within a trash can on a given MDT:
            {code:java}
             # lfs trash [ls|list] [-i|--id UID] [DIR]
            uid gid size delete time FID Fullpath
            0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0] DIR/f1
            0 0 32104 Nov 14 08:07 [0x200034021:0x2:0x0] DIR/f2
            ...
            {code}
            Where {{DIR}} is an optional directory in a Lustre filesystem, or the current working directory if unspecified. This will list directories under {{MOUNT/.lustre/trash/_UID_/_DIRFID_}}.

            The pseudo code:
            {code:java}
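            /* llapi_trash_fid_get() is the proposed new helper */
            rbin_fid = llapi_trash_fid_get(MNTPT, mdt);
            dir_fd = llapi_open_by_fid(MNTPT, &rbin_fid, O_RDONLY | O_DIRECTORY);
            dir = fdopendir(dir_fd);        /* readdir() needs a DIR *, not an fd */
            while ((ent = readdir(dir)) != NULL) {
                if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
                    continue;
                fd = openat(dir_fd, ent->d_name, O_RDONLY);
                if (fd < 0)
                    continue;
                if (fgetxattr(fd, "trusted.unrm", &xattr_buf, sizeof(xattr_buf)) > 0)
                    print_one(ent->d_name, &xattr_buf);
                close(fd);
            }
            closedir(dir);                  /* also closes dir_fd */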
            {code}
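            Entries in {{ROOT/TRASH}} are named by FID, and the listing prints FIDs as {{[seq:oid:ver]}}, while the {{.lustre/trash}} example at the end of this section shows them without brackets. A small sketch of formatting and parsing both forms, so a tool can map a listed FID back to an entry name; it mirrors the {{struct lu_fid}} layout (64-bit sequence, 32-bit object ID, 32-bit version) locally instead of including Lustre headers, and {{fid_format()}}/{{fid_parse()}} are hypothetical helpers.

```c
#include <inttypes.h>
#include <stdio.h>

/* Local mirror of the lu_fid layout used by Lustre. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Format as "[0xSEQ:0xOID:0xVER]"; returns snprintf()'s result. */
int fid_format(const struct fid *f, char *buf, size_t len)
{
	return snprintf(buf, len, "[0x%" PRIx64 ":0x%" PRIx32 ":0x%" PRIx32 "]",
			f->f_seq, f->f_oid, f->f_ver);
}

/* Parse "[0x..:0x..:0x..]" or the bare "0x..:0x..:0x.." form; 0 on success. */
int fid_parse(const char *s, struct fid *f)
{
	uint64_t seq;
	uint32_t oid, ver;
	int used = 0;
	int bracket = (*s == '[');

	if (bracket)
		s++;
	if (sscanf(s, "0x%" SCNx64 ":0x%" SCNx32 ":0x%" SCNx32 "%n",
		   &seq, &oid, &ver, &used) != 3)
		return -1;
	s += used;
	if (bracket && *s++ != ']')
		return -1;      /* unbalanced bracket */
	if (*s != '\0')
		return -1;      /* trailing garbage */
	f->f_seq = seq;
	f->f_oid = oid;
	f->f_ver = ver;
	return 0;
}
```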
             - Deleting a file in the trash can removes the temporary file under "{{ROOT/TRASH}}" and permanently frees its data space on the Lustre OSTs.
            {code:java}
             # lfs trash delete [-i|--id UID] DIR/FILE
            {code}
            The pseudo code:
            {code:java}
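            rbin_fid = llapi_trash_fid_get(MNTPT, mdt);
            dir_fd = llapi_open_by_fid(MNTPT, &rbin_fid, O_RDONLY | O_DIRECTORY);
            /* fid_name: FID-based entry name matching DIR/FILE, found as in "list" */
            unlinkat(dir_fd, fid_name, 0);
            close(dir_fd);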
            {code}

             - Empty a trash can (recursively delete all files/directories under {{_DIR_}}):
            {code:java}
             # lfs trash empty [-i|--id UID] DIR
            {code}
            The pseudo code:
            {code:java}
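            rbin_fid = llapi_trash_fid_get(MNTPT, mdt);
            dir_fd = llapi_open_by_fid(MNTPT, &rbin_fid, O_RDONLY | O_DIRECTORY);
            dir = fdopendir(dir_fd);
            while ((ent = readdir(dir)) != NULL) {
                if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
                    continue;
                /* directories need AT_REMOVEDIR and must be emptied first */
                unlinkat(dir_fd, ent->d_name,
                         ent->d_type == DT_DIR ? AT_REMOVEDIR : 0);
            }
            closedir(dir);                  /* also closes dir_fd */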
            {code}
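            The pseudo code above leaves the recursion into directories implicit. A runnable POSIX sketch of emptying a directory tree through a directory file descriptor follows; on Lustre the initial descriptor would be the one returned by {{llapi_open_by_fid()}} for the trash can, but any directory works here. {{empty_dir_at()}} is a hypothetical name, and it stops at the first error rather than continuing.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Remove everything below the directory open on 'dir_fd', leaving the
 * directory itself in place. 'dir_fd' is always closed before returning.
 * Returns 0 on success or -errno on the first failure.
 */
int empty_dir_at(int dir_fd)
{
	DIR *dir = fdopendir(dir_fd);
	struct dirent *ent;
	int rc = 0;

	if (!dir) {
		rc = -errno;
		close(dir_fd);
		return rc;
	}
	while ((ent = readdir(dir)) != NULL) {
		struct stat st;
		int flags = 0;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		/* d_type may be DT_UNKNOWN, so ask the filesystem */
		if (fstatat(dirfd(dir), ent->d_name, &st, AT_SYMLINK_NOFOLLOW)) {
			rc = -errno;
			break;
		}
		if (S_ISDIR(st.st_mode)) {
			int sub = openat(dirfd(dir), ent->d_name,
					 O_RDONLY | O_DIRECTORY | O_NOFOLLOW);

			if (sub < 0) {
				rc = -errno;
				break;
			}
			rc = empty_dir_at(sub);   /* closes 'sub' */
			if (rc)
				break;
			flags = AT_REMOVEDIR;     /* subdirectory is now empty */
		}
		if (unlinkat(dirfd(dir), ent->d_name, flags)) {
			rc = -errno;
			break;
		}
	}
	closedir(dir);                            /* also closes dir_fd */
	return rc;
}
```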

             - Restore a file in the trash can on a given MDT. This restores the file and its content according to the saved full path and then deletes the stub in the trash can.

            {code:java}
             # lfs trash [unrm|restore] DIR/FILE
            {code}
            The pseudo code:
            {code:java}
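            rbin_fid = llapi_trash_fid_get(MNTPT, mdt);
            dir_fd = llapi_open_by_fid(MNTPT, &rbin_fid, O_RDONLY | O_DIRECTORY);
            fd = openat(dir_fd, fid_name, O_RDONLY);   /* entry named by its FID */
            fgetxattr(fd, "trusted.unrm", &xattr_buf, sizeof(xattr_buf));
            fstat(fd, &st);                            /* keep the original mode */
            /* dirname() may modify its argument, so work on a copy */
            parent = dirname(strcpy(pbuf, xattr_buf.path));
            mkdir_p(parent);                           /* i.e. "mkdir -p" */

            /* Way 1: copy the data into a new file, then drop the trash entry */
            dst_fd = open(xattr_buf.path, O_WRONLY | O_CREAT | O_EXCL,
                          st.st_mode & 07777);
            copy_data(dst_fd, fd);                     /* read()/write() loop */
            close(dst_fd);
            unlinkat(dir_fd, fid_name, 0);

            /* Way 2: create an empty file and swap layouts with the trashed file */
            mknod(xattr_buf.path, S_IFREG | (st.st_mode & 07777), 0);
            llapi_path2fid(xattr_buf.path, &dst_fid);
            swap_layouts(&dst_fid, &file_fid);
            unlinkat(dir_fd, fid_name, 0);

            /* Way 3: let the MDT move the file back into its original parent */
            llapi_path2fid(parent, &parent_fid);
            args.parent_fid = parent_fid;
            args.fid = file_fid;
            ioctl(dir_fd, IOCTL_TRASH_RESTORE, &args);
            /* in the ioctl(), the MDT moves FID into parent_fid */

            close(fd);
            close(dir_fd);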
            {code}
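            Way 1 relies on a {{copy_data()}} helper that is not defined above. A minimal sketch using plain {{read()}}/{{write()}}, as the pseudo code comment suggests; it retries on {{EINTR}} and loops on short writes. The signature is an assumption. Note that such a byte copy writes new data to the OSTs and does not preserve holes in sparse files.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy all remaining data from 'src_fd' to 'dst_fd' with read()/write().
 * Returns the number of bytes copied, or -errno on error.
 */
long long copy_data(int dst_fd, int src_fd)
{
	char buf[65536];
	long long total = 0;

	for (;;) {
		ssize_t n = read(src_fd, buf, sizeof(buf));

		if (n == 0)
			return total;           /* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		for (ssize_t off = 0; off < n; ) {  /* handle short writes */
			ssize_t w = write(dst_fd, buf + off, n - off);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -errno;
			}
			off += w;
		}
		total += n;
	}
}
```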
             - LFSCK periodically scans the files under the trash can directory "{{ROOT/TRASH}}" and deletes files whose grace time has expired.

             - Provide a way to manually scan the "undeleted" files on all MDTs for expired grace time and delete all of them (essentially just a "{{find}}" over the files in {{_DIR_}} in the trash for that user).
            {code:java}
            # lfs trash check [--expire_time|-E time] [-i|--id UID] [DIR]
            {code}

             - Provide a way to restore/delete all files within a given directory.
            This can be achieved by combining "{{lfs trash list}}" with "{{lfs trash unrm}}" or "{{lfs trash delete}}", filtering the listed files by their full path attribute within the given directory.
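            Filtering the listed full paths by directory needs a component-aware prefix test: a naive {{strncmp()}} would treat {{/mnt/lustre/bob/project2/f1}} as lying under {{/mnt/lustre/bob/project}}. A sketch, assuming the saved full paths are absolute and normalized (no {{..}}, no repeated slashes); {{path_is_under()}} is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/*
 * True if 'path' equals 'dir' or lies somewhere below it. Both are assumed
 * to be absolute, normalized paths; trailing '/' on 'dir' is tolerated.
 */
bool path_is_under(const char *path, const char *dir)
{
	size_t len = strlen(dir);

	while (len > 1 && dir[len - 1] == '/')
		len--;                      /* ignore trailing slashes on dir */
	if (len == 1 && dir[0] == '/')
		return path[0] == '/';      /* everything is under "/" */
	if (strncmp(path, dir, len) != 0)
		return false;
	/* the match must end at a path component boundary */
	return path[len] == '\0' || path[len] == '/';
}
```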
             - Provide a "{{.lustre/trash/MDTxxxx}}" (where xxxx is the MDT index) filesystem namespace. This way, users can list the "undeleted" files in the trash can directory of a given {{MDTxxxx}} with normal userspace tools via the POSIX file system API. However, users cannot read these files while they are in the trash, to prevent abuse of quota limits and to prevent applications from using them. The following commands can be run on a client from a Lustre namespace mounted at "{{/mnt/lustre}}":

            {code:java}
            # ls /mnt/lustre/.lustre/trash/MDT0002/UID
            0x200034021:0x1:0x0
            0x200034021:0x2:0x0
            ...

            # lfs trash ls /mnt/lustre/jsmith/project
            UID GID size delete_time FID Fullpath
            0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/jsmith/project/f1
            # ls -R /mnt/lustre/.lustre/trash/MDT0002
            /mnt/lustre/.lustre/trash/MDT0002/1000/[0x200034021:0x1:0x0]/f1
            /mnt/lustre/.lustre/trash/MDT0002/1005/[0x200032140:0x44:0x0]/subdir/file
             ...
            {code}
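            For tools that walk the proposed {{.lustre/trash}} namespace directly, the per-user directory path could be built as below. This assumes the {{MDTxxxx/UID}} layout from the {{ls -R}} example (the {{list}} description above instead mentions {{.lustre/trash/UID/DIRFID}}, so the final layout is still open) and a 4-digit hex MDT index as in Lustre target names such as {{MDT000a}}; {{trash_uid_dir()}} is hypothetical.

```c
#include <stdio.h>

/*
 * Build "<mnt>/.lustre/trash/MDTxxxx/<uid>", with the MDT index printed
 * as 4-digit hex (MDT0002, MDT000a). Returns the path length, or -1 on an
 * encoding error or if 'len' is too small.
 */
int trash_uid_dir(char *buf, size_t len, const char *mnt,
		  unsigned int mdt_idx, unsigned int uid)
{
	int n = snprintf(buf, len, "%s/.lustre/trash/MDT%04x/%u",
			 mnt, mdt_idx, uid);

	if (n < 0 || (size_t)n >= len)
		return -1;      /* encoding error or truncated */
	return n;
}
```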
            New: h2. Introduction and h2. Trash Can/Undelete Functionalities sections identical to the Description above, with the design section replaced by a link to the [Trash Can/Undelete HLD|https://wiki.whamcloud.com/pages/viewpage.action?pageId=351437962].

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58568
            Subject: LU-18456 trash: I/O operative limiting for a file in Trash Can
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f48a7b5444db06fee8919175498763fe07f8c003


            Ticket LU-17648 is tracking an enhancement to store the JobID of the process deleting a file in an xattr, so that it is possible to determine after the fact why the file was deleted. For interactive nodes, it is common to use jobid_name=%e.%u (procname.uid), so that a JobID like "rm.12344" shows that user 12344 ran "rm" to delete the file.

            For regular file deletion, this is of marginal use since it would only be available for forensic analysis (e.g. debugfs on the underlying MDT filesystem inodes to see what xattr was stored in the inode). With TCU, having the process name, UID, and timestamp of the deletion event would make it much easier to understand what happened, and of course to recover the files afterward.
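            If the deleting process's JobID is stored in the %e.%u form, a recovery tool could split it back into process name and UID as sketched below. Splitting at the last dot is an assumption chosen so that process names which themselves contain dots (e.g. "python3.11") are not mis-parsed; jobid_split() is a hypothetical helper, not a Lustre API.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a "%e.%u" JobID such as "rm.12344" into process name and UID.
 * The split is at the LAST '.', since process names may contain dots.
 * 'name' gets at most name_len - 1 bytes (longer names are truncated).
 * Returns 0 on success or -EINVAL if 'jobid' is not in that form.
 */
int jobid_split(const char *jobid, char *name, size_t name_len,
		unsigned long *uid)
{
	const char *dot = strrchr(jobid, '.');
	char *end;
	size_t n;

	if (!dot || dot == jobid || name_len == 0)
		return -EINVAL;
	if (dot[1] < '0' || dot[1] > '9')
		return -EINVAL;         /* UID must start with a digit */
	errno = 0;
	*uid = strtoul(dot + 1, &end, 10);
	if (errno || *end != '\0')
		return -EINVAL;
	n = (size_t)(dot - jobid);
	if (n >= name_len)
		n = name_len - 1;
	memcpy(name, jobid, n);
	name[n] = '\0';
	return 0;
}
```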


            People

              qian_wc Qian Yingjin
              qian_wc Qian Yingjin