[LU-12125] Allow parallel rename of regular files Created: 28/Aug/15  Updated: 16/Jan/24  Resolved: 21/Apr/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12037 Possible DNE issue leading to hung fi... Resolved
is related to LU-6864 DNE3: Support multiple modify RPCs in... Resolved
is related to LU-15913 rename stress test leads to REMOTE_PA... Reopened
is related to LU-15285 same dir rename deadlock Resolved
is related to LU-17426 parallel cross-directory rename of re... Open
Rank (Obsolete): 9223372036854775807

 Description   

In theory it should be possible to relax renaming rules and allow parallel renaming of regular (or non-directory) files.

  • the easiest and most common case would be rename of files within the same parent directory, like "mv foo foo.old", where we would only need to lock the DLM resources of the source and target name hashes
  • the next case would be to allow rename of files between directories

The easiest approach is probably to order such locks by DLM resource (FID+hash) to avoid deadlocks between multiple threads doing renames. Directory renames still need to be serialized by the BFL, and must also take the name hash locks in the source and target directories to avoid conflicts with regular file renames. In theory it would also be possible to allow concurrent renames of directories within the same parent, but that could be done in a separate step.
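The FID+hash ordering idea can be sketched as a comparison function plus a helper that sorts the two resources a rename needs before they are locked. This is a minimal illustration under stated assumptions: `struct fid_hash_res`, `res_cmp()` and `rename_lock_order()` are hypothetical names, and the struct is a simplified stand-in for a FID plus name hash, not the real MDT or DLM structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock resource: a parent FID (fields modeled loosely on
 * struct lu_fid) plus a name hash. Not the real MDT/DLM structures. */
struct fid_hash_res {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
	uint64_t name_hash;
};

/* Total order over (seq, oid, ver, name_hash). */
static int res_cmp(const struct fid_hash_res *a, const struct fid_hash_res *b)
{
	if (a->f_seq != b->f_seq)
		return a->f_seq < b->f_seq ? -1 : 1;
	if (a->f_oid != b->f_oid)
		return a->f_oid < b->f_oid ? -1 : 1;
	if (a->f_ver != b->f_ver)
		return a->f_ver < b->f_ver ? -1 : 1;
	if (a->name_hash != b->name_hash)
		return a->name_hash < b->name_hash ? -1 : 1;
	return 0;
}

/* Fill out[] with the resources to lock, lowest first, so that every
 * thread acquires them in the same order regardless of which one is the
 * source and which is the target. Returns the number of distinct
 * resources (1 when source and target map to the same resource). */
static int rename_lock_order(const struct fid_hash_res *src,
			     const struct fid_hash_res *tgt,
			     const struct fid_hash_res *out[2])
{
	int c;

	assert(src != NULL && tgt != NULL && out != NULL);
	c = res_cmp(src, tgt);
	if (c == 0) {
		out[0] = src;
		out[1] = NULL;
		return 1;
	}
	out[0] = c < 0 ? src : tgt;
	out[1] = c < 0 ? tgt : src;
	return 2;
}
```

With this ordering, a thread doing "mv foo bar" and another doing "mv bar foo" in the same directory both lock the lower resource first, so neither can hold one lock while waiting for the other.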

Renames across shards of a single DNE striped directory would be treated as renames across directories.



 Comments   
Comment by Andreas Dilger [ 28/Aug/15 ]

Di, Lai, what do you think of this?

How hard would this be to implement? A few lines of code to fix up the locking for regular files? A bunch of code cleanup required first? A major feature to rewrite large parts of the MDS code?

In some sense, the rename of a regular file could be treated the same as the creation of a file in the target directory plus the deletion of the file from the source directory, so pdirop locking within the directory would be enough. This is unlike the rename of directories, which needs full tree locks to avoid creating loops or orphaned trees.

Comment by Di Wang [ 28/Aug/15 ]

Yes, file rename under the same parent should be easy to implement (we only need to fix mdt_rename_lock() and order the source and target locks). Even renaming files between directories might be easy to fix: we probably only need to order the locks of the source and target parents. However, with the current implementation we need to check the parent/child relationship to order the locks, which will probably still require the BFL lock unless we order the locks by FID + hash. But then we would need to change the lock order everywhere (i.e. from parent-child order to FID+hash order), which will probably take some time to implement.

As you said, directory rename might not be easy unless we have a subtree lock or something similar.

There are still a few simple fixes we can do:

  • the rename lock can be released early in some cases, perhaps after we get the LDLM locks, though I am not sure if we can do that for directory renames.
  • For DNE, there can be BFL locks on each MDT; we probably only need to enqueue BFL locks on the MDTs where the source and target parents are located.
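The second suggestion amounts to choosing which MDTs need a BFL enqueue. A minimal sketch, assuming the MDT index of each parent is already known; `bfl_mdt_set()` is a hypothetical helper, not a Lustre function. Returning the indices in ascending order keeps multi-MDT enqueues in a consistent order across threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the MDT indices on which a rename would
 * enqueue a BFL lock under this proposal, i.e. only the MDTs holding the
 * source and target parents, deduplicated and in ascending order. */
static int bfl_mdt_set(uint32_t src_parent_mdt, uint32_t tgt_parent_mdt,
		       uint32_t out[2])
{
	assert(out != NULL);
	if (src_parent_mdt == tgt_parent_mdt) {
		out[0] = src_parent_mdt;	/* same MDT: a single enqueue */
		return 1;
	}
	out[0] = src_parent_mdt < tgt_parent_mdt ? src_parent_mdt : tgt_parent_mdt;
	out[1] = src_parent_mdt < tgt_parent_mdt ? tgt_parent_mdt : src_parent_mdt;
	return 2;
}
```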
Comment by Andreas Dilger [ 05/Sep/19 ]

This might also need the patch https://review.whamcloud.com/14375 "LU-6864 osp: manage number of modify RPCs in flight" to allow multiple OUT RPCs in flight at one time when renaming between shards of striped directories or across MDTs.

Comment by Gerrit Updater [ 09/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41184
Subject: LU-12125 llite: send file mode with rename RPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d7eb3ea257299e5942a8aac77d98ddb6132f2014

Comment by Andreas Dilger [ 09/Jan/21 ]

Lai, do you have a chance to look into how hard it would be to handle same-dir renames (for striped directories, only if the source and target stripes have the same parent FID) with just a lock on the parent FID? Given that we don't really support interop between different MDS versions, protocol interoperability isn't much of a concern.

We might consider an OBD_CONNECT2_RENAME flag to ensure that all of the MDTs have support for this, but even that may not be necessary. At worst, if any of the other MDTs don't support same-dir renames they will still request the LUSTRE_BFL_FID lock from MDT0000 before getting the parent FID locks, which is sub-optimal but is not problematic. Since the same-dir rename will also lock the parent FID, if other MDSes allow this but MDT0000 does not, then at worst the same-dir rename will only be blocked on the directory lock and not LUSTRE_BFL_FID, but this does not affect correctness since there is no lock ordering problem when only renaming regular files.

It may be that all that is needed is to skip the mdt_rename_lock() call if msrcdir == mtgtdir and the renamed object is a regular file. If the client sends the file type with the rename RPC, then this is relatively easy to do. Unfortunately, that is not the case today, since ll_rename() passes a mode of 0:

static int ll_rename(struct inode *src, struct dentry *src_dchild,
                     struct inode *tgt, struct dentry *tgt_dchild)
{
        /* ... */
        op_data = ll_prep_md_op_data(NULL, src, tgt, NULL, 0, 0 /* mode */,
                                     LUSTRE_OPC_ANY, NULL);
        /* ... */
}

struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
                                      struct inode *i1, struct inode *i2,
                                      const char *name, size_t namelen,
                                      __u32 mode, enum md_op_code opc,
                                      void *data)
{
        /* ... */
        op_data->op_mode = mode;
        /* ... */
}

I've pushed a simple patch to send the file mode along with the rename RPC, so that the locking decision can easily be made on the MDS. Hopefully this can be included in 2.14 before it is released.
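A minimal sketch of the MDS-side decision described above, assuming the client fills in the file mode. `struct fid`, `fid_eq()` and `rename_needs_bfl()` are hypothetical simplifications, not the actual MDT rename code. A mode of 0, which is what ll_rename() sends in the excerpt above, is not S_ISREG(), so requests from such clients fall back to the serialized path. Separate shards of a striped directory have different FIDs, so renames between them also take the BFL here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical stand-in for struct lu_fid. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

static bool fid_eq(const struct fid *a, const struct fid *b)
{
	return a->f_seq == b->f_seq && a->f_oid == b->f_oid &&
	       a->f_ver == b->f_ver;
}

/* Return true if the rename must take the global rename (BFL) lock.
 * Only a same-parent rename of a regular file skips it; directories,
 * cross-directory renames and mode == 0 stay serialized. */
static bool rename_needs_bfl(const struct fid *src_parent,
			     const struct fid *tgt_parent, mode_t mode)
{
	assert(src_parent != NULL && tgt_parent != NULL);
	if (fid_eq(src_parent, tgt_parent) && S_ISREG(mode))
		return false;	/* parent FID lock is sufficient */
	return true;
}
```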

Comment by Gerrit Updater [ 09/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41186
Subject: LU-12125 mds: allow parallel regular file rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc1e773b4364dde14df17645dd4d27b1143d1ccf

Comment by Gerrit Updater [ 15/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41229
Subject: LU-12125 mds: allow parallel directory rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8ba0a64c75dbc69bf9de774b67a7e49419f3ba7d

Comment by Gerrit Updater [ 15/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41230
Subject: LU-12125 mds: allow parallel directory rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8f50962fb02463d620163f9444a430ceca693368

Comment by Gerrit Updater [ 15/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41231
Subject: LU-12125 tests: allow racer to specify extra tasks
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4a12916a0375d97fa67bdcdcf860fcf9b88f4192

Comment by Gerrit Updater [ 22/Jan/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41184/
Subject: LU-12125 llite: send file mode with rename RPC
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 47576dc68cd90068cd5cbd5c889f04531fe6e682

Comment by Andreas Dilger [ 09/Feb/21 ]

Further improvements in this area beyond the patches already pushed for same-dir rename operations:

  • instead of locking the whole parent directory, lock only the parent FID + source/target name hashes, to allow parallel file (or directory) renames within a single directory
  • allow rename between shards of a striped directory, given that we know the source and target shards are not child/ancestor of each other (though the locks on individual shards would still need a consistent ordering)
  • allow regular file rename between different source/target directories, which will need a fixed locking order to avoid deadlocks. That code already exists, so this may "only" be a matter of dropping the BFL once the source/target are locked. With only regular file renames, there is no strong requirement to know the parent/child relationship of the parent directories, but a consistent locking order is still required to avoid AB/BA deadlocks with other concurrent operations.
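The first item can be illustrated by modeling each lock as a (parent FID, name hash) pair: two renames in the same directory conflict only when they share such a pair. This is a sketch under stated assumptions; all names are hypothetical, and FNV-1a is used purely as a stand-in hash, not necessarily the hash Lustre uses for its name-hash locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fid { uint64_t f_seq; uint32_t f_oid; uint32_t f_ver; };

/* Hypothetical lock resource: parent directory FID + hash of the name. */
struct name_res {
	struct fid parent;
	uint64_t hash;
};

/* 64-bit FNV-1a, used here only as an illustrative name hash. */
static uint64_t name_hash(const char *name)
{
	uint64_t h = 14695981039346656037ULL;

	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		h ^= *p;
		h *= 1099511628211ULL;
	}
	return h;
}

static struct name_res make_res(const struct fid *parent, const char *name)
{
	struct name_res r = { *parent, name_hash(name) };
	return r;
}

static bool res_eq(const struct name_res *a, const struct name_res *b)
{
	return a->parent.f_seq == b->parent.f_seq &&
	       a->parent.f_oid == b->parent.f_oid &&
	       a->parent.f_ver == b->parent.f_ver &&
	       a->hash == b->hash;
}

/* Two same-directory renames (s1 -> t1, s2 -> t2) conflict only if they
 * share a (parent FID, name hash) resource. A hash collision merely
 * causes an unnecessary conflict, never a missed one. */
static bool renames_conflict(const struct fid *dir,
			     const char *s1, const char *t1,
			     const char *s2, const char *t2)
{
	struct name_res a[2], b[2];

	assert(dir != NULL);
	a[0] = make_res(dir, s1);
	a[1] = make_res(dir, t1);
	b[0] = make_res(dir, s2);
	b[1] = make_res(dir, t2);
	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			if (res_eq(&a[i], &b[j]))
				return true;
	return false;
}
```

For example, "mv foo foo.old" and "mv bar bar.old" in the same directory touch four different resources and can proceed in parallel, while "mv foo foo.old" and "mv foo.old baz" share a resource and are serialized.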

I think the benefit of allowing arbitrary parallel directory rename is pretty small, and probably not worthwhile pursuing in the context of this ticket.

Comment by Gerrit Updater [ 06/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41186/
Subject: LU-12125 mds: allow parallel regular file rename
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d76cc65d5d68ed3e04bfbd9b7527f64ab0ee0ca7

Comment by Gerrit Updater [ 21/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41230/
Subject: LU-12125 mds: allow parallel directory rename
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 90979ab390a72a084f6a77cf7fdc29a4329adb41

Comment by Gerrit Updater [ 21/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41231/
Subject: LU-12125 tests: allow racer to specify extra tasks
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fe2663f18e50023ad5cfe5e07b695378dd27a68e

Comment by Peter Jones [ 21/Apr/21 ]

Looks like everything has landed for 2.15

Generated at Sat Feb 10 02:49:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.