[LU-17441] move rename RPC handling to MDS_IO_PORTAL Created: 18/Jan/24  Updated: 23/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Andreas Dilger
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14564 Allow number of threads to grow when ... Open
is related to LU-17426 parallel cross-directory rename of re... Open
is related to LU-17427 reduce hold time for BFL rename lock Open
is related to LU-17434 DNE3: add exclude list for remote sub... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Apache Spark primarily writes output by staging files in a _temporary directory and then renaming them to their final location. Because these files are renamed out of the _temporary directory, every such rename currently needs the BFL lock, and there may be thousands of them running concurrently (at least until LU-17426 and LU-17434 are implemented to allow such renames without the BFL).
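
For readers unfamiliar with the pattern, the following is a minimal sketch of a stage-then-rename commit, not Spark's actual committer code. The function name and the part-0000N file names are illustrative assumptions; only the _temporary directory name comes from the description above.

```python
import os
import tempfile


def commit_staged_files(output_dir, names):
    """Write files into output_dir/_temporary, then rename each one out."""
    staging = os.path.join(output_dir, "_temporary")
    os.makedirs(staging, exist_ok=True)

    # Stage: every task writes its output under the staging directory.
    for name in names:
        with open(os.path.join(staging, name), "w") as f:
            f.write("data\n")

    # Commit: each file is renamed out of _temporary into its parent.
    # Source and target directories differ, so this is a cross-directory
    # rename, the case the ticket says needs the BFL lock on the MDS.
    committed = []
    for name in names:
        final_path = os.path.join(output_dir, name)
        os.rename(os.path.join(staging, name), final_path)
        committed.append(final_path)

    os.rmdir(staging)  # staging directory is empty after the commit
    return committed


# Usage on a scratch directory (hypothetical file names).
with tempfile.TemporaryDirectory() as out:
    committed = commit_staged_files(out, ["part-00000", "part-00001"])
    remaining = sorted(os.listdir(out))
```

With thousands of tasks committing at once, the rename loop is what produces the burst of concurrent rename RPCs described here.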

However, even after those changes are implemented, there may still be many concurrent renames that need the BFL lock. If there are more of them than MDS_REQUEST_PORTAL service threads, they will block those threads until each rename is able to get the rename lock, and prevent other MDS_REINT RPCs from being processed.
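
The effect can be estimated with a toy model. This is a sketch under stated assumptions, not a description of the actual ptlrpc scheduler: the renames arrive first, each one occupies a service thread until it has taken and released the single rename lock, the lock is held for a fixed time by one rename at a time, and queued RPCs are picked up in FIFO order. The function and parameter names are hypothetical.

```python
def wait_for_service_thread(threads, renames, hold_time):
    """Delay before a later, unrelated RPC is picked up by a service thread.

    Toy model: `renames` rename RPCs are queued ahead of the unrelated RPC,
    and the rename lock serializes them, so one rename completes (and frees
    its thread) every `hold_time` seconds.
    """
    if renames < threads:
        return 0.0  # at least one service thread is still idle

    # The first `threads` renames occupy every thread. The unrelated RPC
    # sits behind the (renames - threads) renames still in the queue, so it
    # is picked up once (renames - threads + 1) renames have completed.
    return (renames - threads + 1) * hold_time
```

In this model, 10 renames against 4 threads with a 2 second hold time delay the unrelated RPC by 14 seconds, while 10 renames against 64 threads cause no delay at all.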

Since the MDS_IO_PORTAL is often unused (it is only needed for DoM files, and has existed since 2.11.0), it seems possible to move the rename RPCs to be serviced by the MDS_IO_PORTAL threads to avoid contention on the primary MDS service threads. This would also avoid blocking normal file open, setattr, statfs, and other common operations if the BFL lock is contended. Even with DoM files, reads may be handled by read-on-open, so only DoM writes would be blocked by the uncommon rename.
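
The routing idea can be sketched as follows. This is an illustration only and not the code from the patch: the two portal names are taken from this ticket, while the operation labels and both function names are hypothetical stand-ins.

```python
from collections import Counter

# Portal names as used in this ticket; represented as plain strings here
# rather than the numeric values from the Lustre headers.
MDS_REQUEST_PORTAL = "MDS_REQUEST_PORTAL"
MDS_IO_PORTAL = "MDS_IO_PORTAL"


def choose_portal(op):
    """Send renames to the IO portal, everything else to the request portal."""
    return MDS_IO_PORTAL if op == "rename" else MDS_REQUEST_PORTAL


def queue_depths(ops):
    """Count how many RPCs would be queued on each portal."""
    return Counter(choose_portal(op) for op in ops)


# A burst of renames plus a few ordinary metadata operations.
depths = queue_depths(["rename"] * 1000 + ["open", "setattr", "statfs"])
```

Here the 1000 renames queue behind the MDS_IO_PORTAL threads, and the three ordinary operations are the only requests waiting on the MDS_REQUEST_PORTAL threads.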



 Comments   
Comment by Gerrit Updater [ 18/Jan/24 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53725
Subject: LU-17441 mdc: use MDS_IO_PORTAL for rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3f8165456af0842b31c3cf3d5e9683d3836adf89

Generated at Sat Feb 10 03:35:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.