
"lfs migrate" to use AIO/DIO or io_uring (kernel 5.1+)

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      It would be useful to speed up "lfs migrate", "lfs mirror extend", and "lfs mirror resync" by using asynchronous Direct IO (AIO/DIO via libaio) for the data copying.

      This should use the same mechanism as lustre/tests/aiocp.c: a producer/consumer queue that submits some number of AIO read requests, then submits the corresponding write requests as the reads complete.
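      As a rough illustration, below is a minimal libaio sketch of such a read-then-write copy loop. The aio_copy() name, QDEPTH, and CHUNK values are illustrative choices rather than anything from aiocp.c or the ticket; error handling is abbreviated, and both fds are assumed to be open with O_DIRECT (hence the page-aligned buffers). Compile with -laio.

          #include <libaio.h>
          #include <stdlib.h>

          #define QDEPTH 8           /* AIO requests kept in flight */
          #define CHUNK  (1 << 20)   /* 1 MiB per request */

          /* copy 'size' bytes from src_fd to dst_fd, read-then-write */
          static int aio_copy(int src_fd, int dst_fd, long long size)
          {
              struct iocb iocbs[QDEPTH], *cb;
              struct io_event events[QDEPTH];
              io_context_t ctx = 0;
              long long off = 0;
              int inflight = 0, i, n;

              if (io_setup(QDEPTH, &ctx) < 0)
                  return -1;

              /* producer: prime the queue with read requests */
              for (i = 0; i < QDEPTH && off < size; i++, off += CHUNK) {
                  void *buf;

                  if (posix_memalign(&buf, 4096, CHUNK))
                      return -1;
                  io_prep_pread(&iocbs[i], src_fd, buf, CHUNK, off);
                  cb = &iocbs[i];
                  if (io_submit(ctx, 1, &cb) != 1)
                      return -1;
                  inflight++;
              }

              /* consumer: a finished read is resubmitted in place as a
               * write; a finished write frees its slot for the next read */
              while (inflight > 0) {
                  n = io_getevents(ctx, 1, QDEPTH, events, NULL);
                  for (i = 0; i < n; i++) {
                      cb = events[i].obj;
                      if (cb->aio_lio_opcode == IO_CMD_PREAD) {
                          void *buf = cb->u.c.buf;
                          long long o = (long long)cb->u.c.offset;

                          /* res is the bytes read, short at EOF */
                          io_prep_pwrite(cb, dst_fd, buf, events[i].res, o);
                          io_submit(ctx, 1, &cb);
                      } else if (off < size) {
                          void *buf = cb->u.c.buf;

                          io_prep_pread(cb, src_fd, buf, CHUNK, off);
                          off += CHUNK;
                          io_submit(ctx, 1, &cb);
                      } else {
                          free(cb->u.c.buf);
                          inflight--;
                      }
                  }
              }
              io_destroy(ctx);
              return 0;
          }

      At most QDEPTH requests are ever in flight, and each iocb slot alternates between its read and its write, which is the producer/consumer behavior the description asks for.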


          Activity

            [LU-16789] "lfs migrate" to use AIO/DIO or io_uring (kernel 5.1+)
            gerrit Gerrit Updater added a comment - edited

            Mistake.

            flei Feng Lei added a comment -

            Created LU-18454 to enable 'lfs migrate' to read a file list from stdin or a file.

            Created LU-18455 to make 'lfs migrate' multi-threaded.


            adilger Andreas Dilger added a comment -

            flei, I think "-0" should be an option for "lfs migrate" (and "lfs mirror extend") to read NUL-terminated filenames from stdin, and --files-from=FILELIST should read from a list of files, with FILELIST=- indicating stdin (one file per line). This matches the option names used by tar and rsync, and allows reading from both an existing file and a pipe.
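            A hypothetical "-0" mode is easy to prototype: the sketch below reads NUL-terminated names from stdin with getdelim() and just prints them, with printf() standing in for the real per-file migration.

                #include <stdio.h>
                #include <stdlib.h>

                int main(void)
                {
                    char *path = NULL;
                    size_t buflen = 0;
                    ssize_t n;

                    /* getdelim() with '\0' as the delimiter parses the
                     * NUL-separated stream produced by e.g. "find -print0" */
                    while ((n = getdelim(&path, &buflen, '\0', stdin)) != -1) {
                        if (n > 0 && path[n - 1] == '\0')
                            n--;        /* drop the trailing NUL */
                        printf("would migrate: %.*s\n", (int)n, path);
                    }
                    free(path);
                    return 0;
                }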

            It isn't clear whether there is a need for many threads if there can be multiple AIO requests active at one time. It would be enough to have two or three threads active at once - one to create the next file in advance, and the others to issue AIO read/write requests for their files until they hit the AIO limit. Once enough AIO requests have been submitted to read (and in turn write) the whole file, the remaining AIO requests would go to a new file/thread, while the current thread waits for the pending AIO completions and closes the file before taking the next file from the input to process.

            That way, only as many threads are started as needed to fill the AIO queue. It makes sense to have at least 2 or 3 threads actively reading/writing different files (OSTs) so that they do not get blocked waiting for one OST that is slow or has hit max_rpcs_in_flight, but it doesn't help to have thousands of in-flight IO requests clogging the network and OSTs. The --threads parameter would set the maximum number of threads started, if that is necessary (e.g. for files <= 1MiB in size), but extra threads probably wouldn't help for large files.

            Ideally this can be self-balancing in some way, so that only as many AIO requests are submitted on a single file as needed to keep it busy, and the others are used for other files. Similarly, the number of AIO requests should be balanced to hit peak performance (or the specified bandwidth limit), rather than thousands of requests all running slowly.

            flei Feng Lei added a comment - edited

            "lfs migrate" command can take multiple filenames and migrate them one by one. So just now such a command "lfs find ... -0 | xargs -0 -P 8 -n 32 lfs migrate ..." can work as expected.

            Currently "lfs migrate" will process these 32 files one by one. For each file, multiple aio tasks will be created to copy data then be destroyed.

            It's not so hard to start multiple threads for multiple files, with multiple AIO tasks for each file, but it is not so easy to share all these AIO tasks across files.

            What about such a model:

            • "lfs migrate" read unlimited filenames from stdin one by one (with --stdin or --input-file param)
            • prepare a thread pool to process files with --threads param

            So then we could run a command like: "lfs find ... | lfs migrate --stdin --threads=32 ..."
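            A minimal sketch of that model using pthreads follows; the fixed NTHREADS and the printf() placeholder for the real per-file migration are illustrative only. Workers pull paths from stdin themselves under a single lock, so large and small files balance naturally across threads.

                #include <pthread.h>
                #include <stdio.h>
                #include <stdlib.h>

                #define NTHREADS 32     /* would come from --threads */

                static pthread_mutex_t stdin_lock = PTHREAD_MUTEX_INITIALIZER;

                /* hand out the next newline-separated path from stdin */
                static char *next_path(void)
                {
                    char *line = NULL;
                    size_t buflen = 0;
                    ssize_t n;

                    pthread_mutex_lock(&stdin_lock);
                    n = getline(&line, &buflen, stdin);
                    pthread_mutex_unlock(&stdin_lock);
                    if (n < 0) {
                        free(line);
                        return NULL;
                    }
                    if (n > 0 && line[n - 1] == '\n')
                        line[n - 1] = '\0';
                    return line;
                }

                static void *worker(void *unused)
                {
                    char *path;

                    while ((path = next_path()) != NULL) {
                        printf("migrating %s\n", path);  /* real work here */
                        free(path);
                    }
                    return NULL;
                }

                int main(void)
                {
                    pthread_t tids[NTHREADS];
                    int i;

                    for (i = 0; i < NTHREADS; i++)
                        pthread_create(&tids[i], NULL, worker, NULL);
                    for (i = 0; i < NTHREADS; i++)
                        pthread_join(tids[i], NULL);
                    return 0;
                }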


            adilger Andreas Dilger added a comment -

            flei, I don't know if you've looked into this at all, but how hard would it be for "lfs migrate" to use AIO to migrate multiple files in parallel (possibly in multiple threads with "--threads")? If "lfs migrate" can take a list of files from the command-line or from stdin (e.g. using a "-0" option to read NUL-separated pathnames from "lfs find -0" or a similar tool), would it be possible to keep multiple AIO requests in flight across files to improve small-file migration performance?

            Or is it enough to run something like "lfs find ... -0 | xargs -0 -P 8 -n 32 lfs migrate" to generate parallelism across multiple separate processes (8 tasks with 32 files per task)? The one drawback I see of using "xargs -P" for parallelism is that it may balance badly if one process gets a series of very large files while other processes get small files. That might be OK when migrating many thousands or millions of files, but if migrating only 32 files there would be no parallelism at all. On the flip side, if using "-n 1" then there is potentially a lot of overhead from fork/exec of lfs to migrate only a single file with a few KB of data.


            "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56016
            Subject: LU-16789 utils: improve performance of 'lfs migrate'
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bfe5b988d78c2fb8069dd805bb11fac9e042819e

            gerrit Gerrit Updater added a comment - "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56016 Subject: LU-16789 utils: improve performance of 'lfs migrate' Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: bfe5b988d78c2fb8069dd805bb11fac9e042819e
            flei Feng Lei added a comment -

            This ticket will only replace migrate_copy_data() with an AIO version, migrate_copy_data_aio(). For "lfs_mirror_resync_file", the work will be moved to LU-17143.


            adilger Andreas Dilger added a comment -

            I think it should be possible to have two fd's open on the file, pointing to different mirrors. One requirement of this mirror resync code is that the IO be done with DIO, since the client cannot have cached pages from different mirrors of the same file at the same time.
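            A sketch of that two-fd idea, assuming the LL_IOC_FLR_SET_MIRROR ioctl used by "lfs mirror read"/"lfs mirror write" can pin a mirror id per file descriptor; treat that per-fd behavior as an assumption to verify rather than a confirmed interface.

                #define _GNU_SOURCE          /* for O_DIRECT */
                #include <fcntl.h>
                #include <sys/ioctl.h>
                #include <lustre/lustreapi.h>

                /* Open the mirrored file twice with O_DIRECT and bind each fd
                 * to one mirror, so async reads from the source replica and
                 * writes to the destination replica can overlap without
                 * flipping a single fd's mirror id back and forth. */
                static int open_mirror_pair(const char *path,
                                            unsigned int src_id,
                                            unsigned int dst_id,
                                            int *src_fd, int *dst_fd)
                {
                    *src_fd = open(path, O_RDONLY | O_DIRECT);
                    *dst_fd = open(path, O_WRONLY | O_DIRECT);
                    if (*src_fd < 0 || *dst_fd < 0)
                        return -1;
                    /* assumption: mirror id selection is per-fd state */
                    if (ioctl(*src_fd, LL_IOC_FLR_SET_MIRROR, src_id) < 0 ||
                        ioctl(*dst_fd, LL_IOC_FLR_SET_MIRROR, dst_id) < 0)
                        return -1;
                    return 0;
                }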

            flei Feng Lei added a comment -

            "lfs_mirror_resync_file" function is a little different. It has only one fd. When it needs to read file, it sets the mirror id to the src replica; when it needs to write file, it sets the mirror id to the dst replica.

            So my questions are:

            • Is it possible to open 2 fds, one always set to the src replica and the other always set to the dst replica?
            • If not, how do I optimize this function?
            flei Feng Lei added a comment - "lfs_mirror_resync_file" function is a little different. It has only one fd. When it needs to read file, it sets the mirror id to the src replica; when it needs to write file, it sets the mirror id to the dst replica. So my questions is: Is it possible to open 2 fds, one is always set to the src replica and the other is always set to the dst replia. If not, how do I optimize this function?

            adilger Andreas Dilger added a comment -

            Instead of using AIO, it would also be possible to use io_uring on newer kernels (5.1+). Partly this is a performance improvement, but it also has the benefit of regularly exercising the async DIO path in the kernel, instead of using it only in a few sanity tests. So my strong preference would be to use libaio or io_uring rather than implementing multithreading in userspace.
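            For comparison with the libaio loop in the description, here is a minimal io_uring sketch of the same read-then-write copy, assuming liburing (link with -luring). The readv/writev opcodes are used because they exist since kernel 5.1 (the plain read/write opcodes need 5.6); QDEPTH, CHUNK, and the struct chunk bookkeeping are illustrative, and error handling is abbreviated.

                #include <liburing.h>
                #include <stdlib.h>
                #include <sys/uio.h>

                #define QDEPTH 8           /* requests kept in flight */
                #define CHUNK  (1 << 20)   /* 1 MiB per request */

                struct chunk {
                    struct iovec iov;      /* buffer + current length */
                    long long off;         /* file offset of this chunk */
                    int is_read;           /* read phase or write phase */
                };

                static int uring_copy(int src_fd, int dst_fd, long long size)
                {
                    struct io_uring ring;
                    struct io_uring_sqe *sqe;
                    struct io_uring_cqe *cqe;
                    long long off = 0;
                    int inflight = 0;

                    if (io_uring_queue_init(QDEPTH, &ring, 0) < 0)
                        return -1;

                    while (inflight > 0 || off < size) {
                        /* keep the ring primed with read requests */
                        while (inflight < QDEPTH && off < size) {
                            struct chunk *c = malloc(sizeof(*c));

                            posix_memalign(&c->iov.iov_base, 4096, CHUNK);
                            c->iov.iov_len = CHUNK;
                            c->off = off;
                            c->is_read = 1;
                            sqe = io_uring_get_sqe(&ring);
                            io_uring_prep_readv(sqe, src_fd, &c->iov, 1, off);
                            io_uring_sqe_set_data(sqe, c);
                            off += CHUNK;
                            inflight++;
                        }
                        io_uring_submit(&ring);

                        if (io_uring_wait_cqe(&ring, &cqe) < 0)
                            break;
                        struct chunk *c = io_uring_cqe_get_data(cqe);
                        int res = cqe->res;

                        io_uring_cqe_seen(&ring, cqe);
                        if (c->is_read && res > 0) {
                            /* a completed read becomes a write of res bytes */
                            c->is_read = 0;
                            c->iov.iov_len = res;
                            sqe = io_uring_get_sqe(&ring);
                            io_uring_prep_writev(sqe, dst_fd, &c->iov, 1,
                                                 c->off);
                            io_uring_sqe_set_data(sqe, c);
                            io_uring_submit(&ring);
                        } else {
                            free(c->iov.iov_base);
                            free(c);
                            inflight--;
                        }
                    }
                    io_uring_queue_exit(&ring);
                    return 0;
                }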


            People

              Assignee: flei Feng Lei
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 10
