  1. Lustre
  2. LU-10938 Metadata writeback cache support
  3. LU-13044

WBC3: remove the whole subtree on MDT already deleted in the client WBC cache

Details

    • Technical task
    • Resolution: Unresolved
    • Minor

    Description

      In LU-13021, we designed three flush modes for WBC.

      In the WBC_FLUSH_AGE_LOCK_HOLD flush mode, unlink() can be optimized by deferring the unlink to the background flush via ->write_inode(). The design could be as follows.
      When unlinking 'f' under the directory 'dir' (both 'f' and 'dir' are protected by a root WBC EX lock):

      • Each directory dentry keeps a list L of its child files or directories that have already been flushed to the MDT but were later removed from the WBC cache (MemFS);
      • If 'f' has not been flushed to the MDT (it is not in the Sync(S) state), remove it directly from the cache (currently MemFS);
      • Otherwise, add an item containing the name of the file being unlinked to the list L of 'dir', remove the file from the cache, and mark the 'dir' inode dirty so that it is flushed later;
      • When the Linux kernel flushes an inode via ->write_inode() and finds that the directory 'dir' has child files or directories that were already synced to the MDT but unlinked locally in the cache, it must unlink these files or directories on the MDT;
      • When the MDT receives an unlink request for a directory that is not empty (nlink > 0?), it must remove the whole subtree. This can be done asynchronously:
        • move the directory into lost+found (?), and then reply to the client;
        • launch a daemon thread on the MDT dedicated to unlinking such directories under lost+found.
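
The client-side part of the steps above can be modelled in a few lines. This is only a user-space sketch of the bookkeeping, not Lustre code: the `Dentry` class, its fields, and the `send_unlink` callback are hypothetical names chosen for illustration, and the real implementation would live in the kernel client in C.

```python
from dataclasses import dataclass, field


@dataclass
class Dentry:
    """Hypothetical in-memory model of a WBC (MemFS) directory entry."""
    name: str
    synced: bool = False   # True once the entry has been flushed to the MDT
    dirty: bool = False    # True when the inode needs a background flush
    children: dict = field(default_factory=dict)
    removed_synced: list = field(default_factory=list)  # the list "L"


def unlink(parent, name):
    """Remove `name` from the cache, deferring the MDT unlink if needed."""
    child = parent.children.pop(name)
    if child.synced:
        # Already on the MDT: remember the name and let writeback handle it.
        parent.removed_synced.append(name)
        parent.dirty = True
    # Otherwise the entry only ever existed in the cache; nothing to send.


def write_inode(parent, send_unlink):
    """Background flush: issue the deferred unlinks recorded for `parent`."""
    pending, parent.removed_synced = parent.removed_synced, []
    for name in pending:
        send_unlink(parent.name, name)
    # Simplification: this model tracks no other reason for being dirty.
    parent.dirty = False


def demo():
    """Unlink one synced and one unsynced file, then flush the directory."""
    d = Dentry("dir", synced=True)
    d.children["f"] = Dentry("f", synced=True)
    d.children["g"] = Dentry("g", synced=False)
    sent = []
    unlink(d, "f")
    unlink(d, "g")
    write_inode(d, lambda parent, name: sent.append((parent, name)))
    return sent
```

In `demo()`, only 'f' produces an unlink request at flush time; 'g' was never flushed, so it disappears from the cache without any RPC. The MDT-side subtree removal is not modelled here.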

        Activity


          gerrit Gerrit Updater added a comment:

          "Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45643
          Subject: LU-13044 wbc: async subtree removal for keep flush mode
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 17ea03301e87f3192e51786a0ae5464c0bab00ab

          gerrit Gerrit Updater added a comment:

          "Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45642
          Subject: LU-13044 wbc: subtree removal for keep flush mode
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 67060cc94134971e37dce2ca82c404cb76aad209

          gerrit Gerrit Updater added a comment:

          "Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45640
          Subject: LU-13044 wbc: delay file removal for keep flush mode
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8e3f2c7731f10903359acb1140d282c5f37b395f

          adilger Andreas Dilger added a comment:

          I think that having a dedicated "rm -r" optimization is a much lower priority than the batched RPCs support in LU-13045. Once we have batched RPCs then we will already be able to unlink files very quickly over the network. It also isn't clear that having the EX lock on an existing directory necessarily means that this client is the only one accessing the subtree.

          adilger Andreas Dilger added a comment:

          The reason I suggest it needs to start at the bottom is that, when removing a directory, it would be best if the directory is already empty. Otherwise, there is a real danger if RMFID is allowed to remove a whole directory of entries. In general, the ability to remove a whole directory recursively is dangerous and should be restricted as much as possible.

          qian_wc Qian Yingjin added a comment (edited):

          Could MDS_RMFID remove a whole subtree on the MDT?

          For example, when flushing the dirty directory /mnt/lustre/wbc/batch:

          /mnt/lustre/wbc/batch/dir1/a

          /mnt/lustre/wbc/batch/dir1/b

          /mnt/lustre/wbc/batch/dir1/c

          /mnt/lustre/wbc/batch/dir2/a

          /mnt/lustre/wbc/batch/dir1/dir3/dir4/bb

          ...

          /mnt/lustre/wbc/batch/f1

          /mnt/lustre/wbc/batch/f3

          these files have all been removed from the client WBC cache, but were already flushed to the MDT.

          At this point, the client only needs to send unlink requests for the directory's direct children (files or directories):

          /mnt/lustre/wbc/batch/dir1

          /mnt/lustre/wbc/batch/dir2

          /mnt/lustre/wbc/batch/f1

          /mnt/lustre/wbc/batch/f3

          For the best optimization, the removal does not need to start at the bottom.
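
The reduction described in this comment, from every removed path to the direct children of the flushed directory, could be sketched as follows. The function name `unlink_targets` is hypothetical, and the sketch assumes that each listed subtree was removed from the cache in its entirety and that no other client is using it, which the earlier comment points out is not guaranteed by the EX lock alone.

```python
from pathlib import PurePosixPath


def unlink_targets(flush_dir, removed_paths):
    """Return the direct children of flush_dir that cover all removed paths."""
    root = PurePosixPath(flush_dir)
    targets = []
    for path in removed_paths:
        # relative_to() raises ValueError for a path outside flush_dir.
        rel = PurePosixPath(path).relative_to(root)
        top = str(root / rel.parts[0])
        if top not in targets:
            targets.append(top)
    return targets


removed = [
    "/mnt/lustre/wbc/batch/dir1/a",
    "/mnt/lustre/wbc/batch/dir1/b",
    "/mnt/lustre/wbc/batch/dir1/c",
    "/mnt/lustre/wbc/batch/dir2/a",
    "/mnt/lustre/wbc/batch/dir1/dir3/dir4/bb",
    "/mnt/lustre/wbc/batch/f1",
    "/mnt/lustre/wbc/batch/f3",
]
targets = unlink_targets("/mnt/lustre/wbc/batch", removed)
```

For the paths listed in the comment, `targets` holds the four entries dir1, dir2, f1 and f3 under /mnt/lustre/wbc/batch, so four unlink requests replace seven.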

          adilger Andreas Dilger added a comment:

          Another option would be to send one or more MDS_RMFID RPC with the FIDs of the files in the tree, starting at the bottom. One thing to be careful of is that RMFID will delete all links to the file, so we would need a slight modification to allow removing just some links.
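
For comparison, the bottom-up ordering suggested here can be sketched as a sort by path depth, so that every directory is already empty by the time its own removal is sent. Paths stand in for FIDs in this sketch, and `bottom_up` is a hypothetical helper, not part of any Lustre interface.

```python
from pathlib import PurePosixPath


def bottom_up(paths):
    """Order entries so each directory comes after everything beneath it."""
    # Deeper paths sort first; entries of equal depth keep their input order.
    return sorted(paths, key=lambda p: len(PurePosixPath(p).parts), reverse=True)


tree = [
    "batch/dir1",
    "batch/dir1/a",
    "batch/dir1/dir3",
    "batch/dir1/dir3/dir4",
    "batch/dir1/dir3/dir4/bb",
]
order = bottom_up(tree)
```

Here `order` starts with `batch/dir1/dir3/dir4/bb` and ends with `batch/dir1`. Batching consecutive entries of this list into RMFID requests keeps the "directory is already empty" property as long as the batches are processed in order.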

          People

            wc-triage WC Triage
            qian_wc Qian Yingjin