Details
-
Question/Request
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.14.0
-
client:
toss 3.7-14.1
3.10.0-1160.45.1.1chaos.ch6.x86_64
lustre 2.12.7_2.llnl
server:
toss 4.1-5
4.18.0-240.22.1.1toss.t4.x86_64
zfs 2.0.52_2llnl-1
lustre 2.14.0_5.llnl
Description
lfs-migrate Metadata Performance Testing
While trying to use lfs-migrate for metadata migration, we found that lfs-migrate performance does not scale well with additional processes. Even with many processes and nodes, sustained performance was around 400 items/second, which is too slow to be practical for migrating large numbers of files and directories.
This test plan describes additional tests to determine whether the above results are at, or near, the limit of lfs-migrate's performance.
Overview
The performance to be measured is the rate at which items (files and directories) can be migrated. These items will be in a tree (or trees) and migrated by many processes running lfs-migrate in parallel.
The 3 basic parts of the test are:
- create the trees
- migrate the trees
- analyze the data generated during the migration
Create the Trees
A single tree can be created using mdtest. mdtest has the ability to make trees of files and directories, and can parameterize those trees in most of the ways necessary for this test.
The major shortcoming of mdtest is that it does not set the file striping or directory striping of the trees it creates. This can be overcome by pre-creating directories, setting their striping and directory striping, and then having mdtest create trees within these directories so that each tree inherits these settings from its parent directory.
The commands used to create the trees need to be saved. This includes the mdtest command for each directory as well as the commands that make the directories and set their striping and directory striping. Because mdtest will be run with srun, the whole srun command needs to be saved, since the srun parameters affect the size and shape of the tree.
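The bookkeeping described above could be sketched as follows. This is only an illustration, not the script used for the tests: the function names, the JSON file layout, and the specific flag choices (`lfs mkdir -i`, `lfs setstripe -c 1`, `mdtest -C -n ... -d ... -F`) are assumptions and should be checked against the installed lfs, mdtest, and srun versions. Whether inheritance additionally needs a default directory layout on the parent is left open here.

```python
import json
import shlex


def build_tree_commands(root, mdt_index, nodes, ppn, items_per_proc, files_only=True):
    """Return the setup and mdtest commands for one tree as shell strings.

    All flag choices below are illustrative assumptions, not the plan's
    actual commands.
    """
    # Pre-create the parent directory on a chosen MDT.
    mkdir_cmd = ["lfs", "mkdir", "-i", str(mdt_index), root]
    # Set a default file layout on the parent directory (stripe count 1 here).
    stripe_cmd = ["lfs", "setstripe", "-c", "1", root]
    # mdtest under srun: -C create only, -n items per process,
    # -d target directory, -F files only or -D directories only.
    mdtest_cmd = [
        "srun", "-N", str(nodes), "--ntasks-per-node", str(ppn),
        "mdtest", "-C", "-n", str(items_per_proc), "-d", root,
        "-F" if files_only else "-D",
    ]
    return [shlex.join(c) for c in (mkdir_cmd, stripe_cmd, mdtest_cmd)]


def save_commands(path, commands):
    """Write the exact command strings next to the run data for later analysis."""
    with open(path, "w") as fh:
        json.dump({"tree_commands": commands}, fh, indent=2)
```

Saving the full command strings, rather than only the parameter values, keeps the record unambiguous if the script that generates them changes between runs.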
Migrate the Trees
The migration is done in parallel by many processes, each running lfs-migrate on one of the directories that contains a tree created by mdtest. The many processes are created and spread across multiple client nodes using srun.
Data needs to be collected during the run. Process 0 will record run-wide data such as total items migrated, and each process will write its own performance data. This will generate one file per process, plus one more for the run-wide data. Some of the collected data could be inferred from other data (or from the Slurm database), but recording it simplifies post-processing.
Data to Collect Per Run
- total items migrated
- total data migrated
- the mdtest command and the striping/dirstriping commands
- slurm jobid
- the srun command that does the migration
Data to Collect Per Process
- start time (of lfs-migrate)
- end time (of lfs-migrate)
- source MDTs
- destination MDTs
- the lfs-migrate command
- lfs getdirstripe output for the root of the tree the process will migrate
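A per-process wrapper that captures the fields listed above might look like the sketch below. It assumes each process is launched by srun, so that `SLURM_PROCID` identifies the task. The function name, the JSON record layout, and the `extra` argument are hypothetical. The migration command itself is passed in as an argument list rather than hard-coded.

```python
import json
import os
import subprocess
import time


def run_and_record(command, out_dir, extra=None):
    """Run one migration command and write a per-process JSON record."""
    # srun sets SLURM_PROCID for each task; fall back to 0 outside Slurm.
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    record = {"rank": rank, "command": command}
    # 'extra' carries fields gathered by the caller, e.g. source MDTs,
    # destination MDTs, and the getdirstripe output for the tree root.
    record.update(extra or {})
    record["start_time"] = time.time()
    result = subprocess.run(command, check=False)
    record["end_time"] = time.time()
    record["returncode"] = result.returncode
    # One file per process, named by rank, so writers never collide.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"proc_{rank:05d}.json")
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2)
    return record
```

Recording the return code alongside the timestamps makes it possible to exclude failed migrations from the rate calculation later.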
Potential Parameters to Vary between Runs
- total number of processes, nodes*ppn
  - the number of processes per node (2,8,16)
  - the number of nodes (1,8,32)
- the kind of items that are migrated (files,directories)
- how many items per process are migrated (1K, 8K, 64K configured with mdtest command)
- file size = 0, fixed
- DoM or not DoM
Initial Runs Planned
Note that the above is still probably a larger parameter space than is necessary to find first-order bottlenecks (3*3*3*3*1*2 == 162 tests). To reduce the number of tests and the expected total run time, only the following tests will be run initially. More complete testing of the parameter space will be performed as needed after developers are engaged.
- Find the values of nodes and ppn that maximize the overall lfs-migrate rate for files only, 8K items per process, without DoM (9 tests)
- Using those values for nodes and ppn, test each of the items-per-process values listed above. Record the value of items/process (ipp) that maximizes the overall lfs-migrate rate for files only, without DoM (3 tests)
- Using those values for nodes, ppn, and ipp, test with files with DoM and files without DoM (2 tests)
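The three stages above amount to 9 + 3 + 2 = 14 tests. A sketch of how the test configurations could be generated is shown below. The function names and dictionary keys are hypothetical, and interpreting 1K, 8K, and 64K as multiples of 1024 is an assumption.

```python
from itertools import product

PPN_VALUES = (2, 8, 16)
NODE_VALUES = (1, 8, 32)
# Assumption: 1K, 8K, 64K are read as multiples of 1024.
IPP_VALUES = (1024, 8192, 65536)


def stage1():
    """All nodes x ppn combinations: files only, 8K items per process, no DoM."""
    return [
        {"nodes": n, "ppn": p, "ipp": 8192, "kind": "files", "dom": False}
        for n, p in product(NODE_VALUES, PPN_VALUES)
    ]


def stage2(best_nodes, best_ppn):
    """Vary items per process at the best nodes/ppn found in stage 1."""
    return [
        {"nodes": best_nodes, "ppn": best_ppn, "ipp": ipp, "kind": "files", "dom": False}
        for ipp in IPP_VALUES
    ]


def stage3(best_nodes, best_ppn, best_ipp):
    """Compare files with and without DoM at the best settings so far."""
    return [
        {"nodes": best_nodes, "ppn": best_ppn, "ipp": best_ipp, "kind": "files", "dom": dom}
        for dom in (False, True)
    ]
```

Each later stage takes the best values from the previous one as arguments, so the stages have to be run and analyzed in order.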
Data Analysis
The data recorded for each run will all go into a single directory, along with the tree creation data. A script will read the run-wide metadata and the per-process performance data, and calculate the rate at which items are migrated. The important input parameters and corresponding results for all runs will be output as a CSV file.
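One way to compute the rate is sketched below. It assumes the overall rate is defined as total items divided by the wall-clock span from the earliest process start to the latest process end. That definition, the function names, and the record field names (`start_time`, `end_time`) are assumptions rather than part of the plan.

```python
import csv
import io


def migration_rate(total_items, records):
    """Items per second over the span from first start to last end."""
    start = min(r["start_time"] for r in records)
    end = max(r["end_time"] for r in records)
    elapsed = end - start
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return total_items / elapsed


def runs_to_csv(rows, fieldnames):
    """Render one row per run (input parameters plus results) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, two processes that together migrate 8000 items between t=0 s and t=20 s give a rate of 400 items/second under this definition. Using the whole-run span rather than averaging per-process rates means a single slow process lowers the reported rate.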
Performance Comparison
For comparison, other performance metrics with the same file system and clients will be gathered:
- mdtest will be run with the same node and ppn combinations and enough objects per process to make each mdtest stage (e.g. create, unlink, etc.) take at least 10 minutes.
Issue Links
- is related to: LU-14975 DNE3: directory migration in non-recursive mode (Resolved)