Details
Type: Improvement
Resolution: Fixed
Priority: Minor
None
None
3
Description
Apache Spark primarily writes output files into a _temporary staging directory and then renames them to their final location. Because these files are renamed out of the _temporary directory into a different parent directory, each rename currently needs the BFL lock, and there may be thousands of such renames running concurrently (at least until LU-17426 and LU-17434 are implemented to allow such renames without the BFL).
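A minimal sketch of the staging pattern, assuming a local POSIX filesystem. The function name `commit_task_output` and the flat `_temporary/<name>` layout are hypothetical simplifications; Spark's real output committers use a deeper directory layout and more steps. What matters here is that the final `os.rename` moves a file between two different parent directories, which is the kind of rename this ticket describes as needing the BFL.

```python
import os
import tempfile


def commit_task_output(output_dir: str, name: str, data: bytes) -> str:
    """Write `data` under output_dir/_temporary, then rename it into place.

    Hypothetical, simplified version of a staging commit; not Spark's code.
    """
    staging = os.path.join(output_dir, "_temporary")
    os.makedirs(staging, exist_ok=True)

    # Step 1: write the file into the staging directory.
    tmp_path = os.path.join(staging, name)
    with open(tmp_path, "wb") as f:
        f.write(data)

    # Step 2: cross-directory rename. The source parent (_temporary) differs
    # from the target parent (output_dir); on Lustre this is the case that
    # currently takes the BFL rename lock.
    final_path = os.path.join(output_dir, name)
    os.rename(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        print(commit_task_output(out, "part-00000", b"hello"))
```

A Spark job runs this step once per output file, often from many tasks at the same time, which is where the large number of concurrent cross-directory renames comes from.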
However, even after those changes are implemented, there may still be many concurrent renames that need the BFL lock. If there are more of them than there are MDS_REQUEST_PORTAL service threads, they will occupy all of those threads while each waits for the rename lock, preventing other MDS_REINT RPCs from being processed.
Since the MDS_IO_PORTAL is often unused (it is only needed for DoM files, and has existed since 2.11.0), it seems possible to have rename RPCs serviced by the MDS_IO_PORTAL threads instead, avoiding contention on the primary MDS service threads. This would also avoid blocking normal file open, setattr, statfs, and other common operations when the BFL lock is contended. Even for DoM files, reads may be handled by read-on-open, so only DoM writes would be blocked by the uncommon rename.
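The thread-exhaustion problem and the proposed move of renames to a separate portal can be illustrated with a toy queueing model. This is a sketch under stated assumptions, not Lustre's ptlrpc service code: each `ThreadPoolExecutor` stands in for one portal's service threads, a single `threading.Lock` stands in for the BFL, and the names `rename_rpc`, `statfs_rpc`, and `statfs_latency`, along with the thread count and lock hold time, are hypothetical. It also ignores dynamic thread growth (see LU-14564).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the single BFL rename lock.
bfl = threading.Lock()


def rename_rpc(hold: float) -> str:
    # A rename occupies a service thread while waiting for, then holding, the BFL.
    with bfl:
        time.sleep(hold)
    return "rename"


def statfs_rpc() -> str:
    # A cheap request that never touches the BFL.
    return "statfs"


def statfs_latency(renames: int, threads: int, separate_rename_pool: bool,
                   hold: float = 0.05) -> float:
    """Seconds until a statfs submitted after `renames` renames completes."""
    request_pool = ThreadPoolExecutor(max_workers=threads)  # models MDS_REQUEST_PORTAL
    io_pool = ThreadPoolExecutor(max_workers=threads)       # models MDS_IO_PORTAL
    # Routing decision: renames go to the IO pool only in the proposed setup.
    rename_pool = io_pool if separate_rename_pool else request_pool
    try:
        for _ in range(renames):
            rename_pool.submit(rename_rpc, hold)
        start = time.monotonic()
        request_pool.submit(statfs_rpc).result()
        return time.monotonic() - start
    finally:
        request_pool.shutdown(wait=True)
        io_pool.shutdown(wait=True)


if __name__ == "__main__":
    print("shared pool:  ", statfs_latency(8, 4, separate_rename_pool=False))
    print("separate pool:", statfs_latency(8, 4, separate_rename_pool=True))
```

With 8 renames and 4 threads in a shared pool, every thread is stuck on the lock and the statfs has to wait until several renames have finished one after another. With renames routed to their own pool, the request pool stays free and the statfs is answered almost immediately, even though the renames are still serialized on the lock.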
Issue Links
- is related to
  - LU-14564 Allow number of threads to grow when all existing threads are stuck (Open)
  - LU-17427 reduce hold time for BFL rename lock (Open)
  - LU-17426 parallel cross-directory rename of regular files on single MDT (Resolved)
  - LU-17434 DNE3: add exclude list for remote subdirectory creation (Resolved)