TCU: Trash Can/Undelete for Lustre

    Description

      Introduction

      If files are accidentally deleted from a file system, an application may be interrupted and the user data may be permanently lost. The trash can (also called "undelete" or "recycle bin") is a file system feature that holds deleted files temporarily before they are permanently removed, so that they can be restored or retrieved if needed.

      Once the trash can feature is enabled, a file that a user deletes from the file system is not actually deleted but moved to the trash can, where deleted files and directories are stored temporarily. The trash can may be emptied manually; once it is full, the oldest files are removed first. Items in the trash can may be restored or retrieved as long as they are still there.

      Trash Can/Undelete Functionalities

      The trash can should include the following functionalities:

      • List "undeleted" files in the trash can;
      • After a file is deleted and moved into the trash can, the quota usage for this file should be accounted for and updated (reduced) accordingly;
      • The trash can and all files therein are not visible in the namespace of the file system;
      • Restore a file in the trash can. This restores the file to its original path. The corresponding quota accounting should also be updated;
      • Delete a file in the trash can. This finally removes the file from the file system and frees the used space. The file is then unrecoverable;
      • Empty the trash can. This removes all files in the trash can;
      • A user can restore files from the trash can within the specified retention period. In this way, a file is kept "undeleted" for a pre-defined, configurable grace period;
      • Enable/disable the trash can feature on an entire file system;
      • An administrator can enable/disable the trash can feature on a specified directory.

      Deleted files can no longer be restored from the trash can when:

      • A file (or directory) is deleted again from the trash can. In other words, it has been deleted twice: the first deletion only moves the file to the trash can, and the second deletion actually removes the file from the file system.
      • The trash can is emptied of all of its contents.
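
The operations listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the design from the HLD or the patches: the class and method names (TrashCan, TrashEntry, unlink, restore, purge_expired) are hypothetical, quota is tracked as a plain per-owner byte count, and the per-directory enable/disable switch is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrashEntry:
    owner: str
    size: int          # bytes
    deleted_at: float  # time of the first deletion, in seconds


@dataclass
class TrashCan:
    """Toy in-memory model of the listed operations (all names hypothetical)."""
    retention: float      # grace period, in seconds
    enabled: bool = True  # file-system-wide switch
    files: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # visible namespace
    trash: Dict[str, TrashEntry] = field(default_factory=dict)       # hidden from namespace
    quota: Dict[str, int] = field(default_factory=dict)              # owner -> bytes charged

    def create(self, path: str, owner: str, size: int) -> None:
        self.files[path] = (owner, size)
        self.quota[owner] = self.quota.get(owner, 0) + size

    def unlink(self, path: str, now: float) -> None:
        """First deletion: move to trash (if enabled) and reduce quota usage."""
        owner, size = self.files.pop(path)
        self.quota[owner] -= size
        if self.enabled:
            # Keyed by original path for brevity; a real design must handle
            # two deleted files that shared the same path.
            self.trash[path] = TrashEntry(owner, size, now)

    def list_trash(self) -> List[str]:
        return sorted(self.trash)

    def restore(self, path: str) -> None:
        """Put the file back at its original path and charge quota again."""
        entry = self.trash.pop(path)  # raises KeyError if no longer in trash
        self.files[path] = (entry.owner, entry.size)
        self.quota[entry.owner] = self.quota.get(entry.owner, 0) + entry.size

    def delete(self, path: str) -> None:
        """Second deletion: the file is now unrecoverable."""
        del self.trash[path]

    def empty(self) -> None:
        """Remove everything in the trash can."""
        self.trash.clear()

    def purge_expired(self, now: float) -> None:
        """Drop entries whose retention period has elapsed."""
        expired = [p for p, e in self.trash.items()
                   if now - e.deleted_at > self.retention]
        for p in expired:
            del self.trash[p]
```

In this model a restore fails with KeyError in exactly the two cases named above (second deletion, emptied trash), plus the case where the retention period has elapsed and purge_expired has run.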

      The Trash Can/Undelete HLD contains details of the design and implementation of this feature.

          Activity

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58568
            Subject: LU-18456 trash: I/O operative limiting for a file in Trash Can
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f48a7b5444db06fee8919175498763fe07f8c003

            gerrit Gerrit Updater added a comment

            Ticket LU-17648 is tracking an enhancement to store the JobID of the process deleting a file into an xattr, so that it is possible to debug after the fact why the file was deleted.  For interactive nodes, it is common to use jobid_name=%e.%u (procname_uid) so that it can be seen from a JobID like "rm.12344" that user 12344 ran "rm" to delete the file.

            For regular file deletion, this is of marginal use since it would only be available for forensic analysis (e.g. debugfs on the underlying MDT filesystem inodes to see what xattr was stored in the inode). With TCU, having the process name, UID, and timestamp of the deletion event would make it much easier to understand what happened, and of course to recover the files afterward.

            adilger Andreas Dilger added a comment
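
To illustrate how a JobID produced with jobid_name=%e.%u could be interpreted during later analysis, here is a minimal sketch. The helper name parse_procname_uid is hypothetical and is not part of Lustre or of LU-17648; it only assumes the "procname.uid" layout described in the comment above.

```python
from typing import Tuple


def parse_procname_uid(jobid: str) -> Tuple[str, int]:
    """Split a 'procname.uid' JobID such as 'rm.12344' into its two parts.

    The split is taken at the LAST dot, because a process name may itself
    contain dots while the UID is always the trailing numeric field.
    """
    procname, sep, uid = jobid.rpartition(".")
    if not sep or not procname or not uid.isdecimal():
        raise ValueError(f"not a procname.uid JobID: {jobid!r}")
    return procname, int(uid)
```

For the example from the comment, parse_procname_uid("rm.12344") yields the process name "rm" and the UID 12344.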

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58492
            Subject: LU-18456 trash: mark LUSTRE_TRASH_FL for a file moving to trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fee3c92955b069c6ae2d8f0b494ff93ba077bd23

            gerrit Gerrit Updater added a comment

            "Artem Blagodarenko <ablagodarenko@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58323
            Subject: LU-18456 lustre: Implement recycle bin for deleted files
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 24951367d7282068e938cae244aaefe9a38c762b

            gerrit Gerrit Updater added a comment (edited)

            Yingjin, can you please attach your presentation to this ticket?

            adilger Andreas Dilger added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58137
            Subject: LU-18456 mdd: replicate XATTRs for a dir moving into trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6f15f8fac9bf57cc2f8713c15e9629b4c5870274

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57872
            Subject: LU-18456 mdd: move tree with multiple levels into trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 34361ee78dd5d87b4df830b01405d9eddb7a1d71

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57809
            Subject: LU-18456 mdd: add option to enable/disable trash
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8338c5cdb1b1d1bba8679df1b5cc90c9ba151636

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57748
            Subject: LU-18456 mdd: move regular files into trash upon last unlink
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3689792654be30a54ee2252372d1035e726edf57

            gerrit Gerrit Updater added a comment

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57612
            Subject: LU-18456 mdt: create trash dir for MDT after MDT stack setup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 27513a29efb3ff369c927d0380d1cdf32d2e6304

            gerrit Gerrit Updater added a comment

            The current patch already has a time-based cleanup feature, which is good, but I think it is also critical that this have a demand-based cleanup feature if the filesystem is too full. Otherwise, users will try to delete their files when the filesystem is full, then not see any space being freed and their jobs will fail due to errors.

            I think showing the space being freed is at least as important as the files actually being deleted. We already get bug reports from users along the lines of "we deleted 10TB of files but the space was not freed", when in fact the space is just being freed slowly because the MDS does a two-phase cleanup of deleted files to ensure that the objects are not lost in case of a crash.

            adilger Andreas Dilger added a comment
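
One way to picture the demand-based cleanup requested above is a low-watermark policy: when free space drops below a threshold, the oldest trash entries are purged until the threshold is met again. The sketch below is only an illustration under that assumption. The function name pick_purge_victims and the min_free_pct parameter are hypothetical, not existing Lustre tunables, and the sketch ignores the fact that space is actually released asynchronously by the two-phase cleanup on the MDS.

```python
from typing import List, Tuple


def pick_purge_victims(entries: List[Tuple[float, int, str]],
                       free_bytes: int,
                       total_bytes: int,
                       min_free_pct: int) -> List[str]:
    """Choose trash entries to purge, oldest first, until free space is adequate.

    entries: (deleted_at, size_bytes, name) for each file in the trash can.
    Returns the names to purge; an empty list if there is already enough space.
    """
    target_free = total_bytes * min_free_pct // 100
    victims = []
    # Sorting tuples orders by deletion time first, so the oldest come first.
    for deleted_at, size, name in sorted(entries):
        if free_bytes >= target_free:
            break
        victims.append(name)
        free_bytes += size  # space this entry would give back once purged
    return victims
```

A time-based cleanup and a policy like this one are complementary: the retention period bounds how long files stay recoverable, while the watermark ensures that deleting files on a nearly full file system still frees space.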

            People

              qian_wc Qian Yingjin
              qian_wc Qian Yingjin