[LU-15211] lfs migrate metadata performance test plan  Created: 11/Nov/21  Updated: 29/Aug/23
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | Gian-Carlo Defazio | Assignee: | Andreas Dilger |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | llnl |
| Environment: | client: server: |
| Issue Links: | |
| Epic/Theme: | Performance |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
| Comments |
| Comment by Olaf Faaland [ 11/Nov/21 ] |
|
Peter & Co., we would like your feedback on this test plan. Once we arrive at a test plan you agree with, Gian will perform the actual tests, compile the rates, and create a bug-type issue to find and fix the bottlenecks. He can help with the investigation and fixes, but he doesn't have the knowledge to be the main person working the issue. |
| Comment by Peter Jones [ 12/Nov/21 ] |
|
Andreas, could you please advise? Thanks, Peter |
| Comment by Andreas Dilger [ 12/Nov/21 ] |
|
Hi Olaf, Gian-Carlo,

Secondly, what is the goal of the MDT migration? Is that for manual MDT space balancing, or is it for replacement of the underlying MDT storage hardware, or some other reason? Definitely, the series of MDT space balancing changes in

That isn't to say we shouldn't be looking at improving the migration performance itself, but understanding what the goals are would help shape where optimizations should be done, and also what parameters should be measured during the testing.

I also have the feeling that a significant part of the performance limitation that you are seeing may relate to ZFS transaction commit performance, because the migrate process is very transaction intensive in order to ensure it is atomic and recoverable in the face of an MDS crash.

Assuming we are discussing "lfs migrate -m" performance here, then it is also important to determine how this is being called. Currently, it is only possible to do recursive (whole-tree) directory migration, and this is handled internally on the MDS, so it may be that trying to migrate a directory tree is inadvertently doing multiple migrations and hurting performance?

Before we go extensively into testing directory migration performance, we should also look at |
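For reference, a minimal sketch of the directory (metadata) migration command being discussed, assuming a DNE filesystem with more than one MDT; the mount point, path, and target MDT index are illustrative placeholders, not values taken from this ticket:

    # Migrate a directory tree (recursively, in current releases) to MDT index 1.
    lfs migrate -m 1 /mnt/lustre/testtree

    # Verify which MDT now holds the directory.
    lfs getdirstripe /mnt/lustre/testtree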
| Comment by Gian-Carlo Defazio [ 12/Nov/21 ] |
|
Hi Andreas,

Yes, this ticket is specific to inode/directory migration and uses the "lfs migrate -m" command. I was referring to "lfs migrate" as "lfs-migrate". The shell script for object/data migration has an underscore ("lfs_migrate"), but I see why that could be confusing.

This issue came up when we were exploring ways to do a full file system migration to new hardware. It was to be part of a process that moves the data from the old to the new hardware with "zfs send/receive", which was chosen because our tests showed that it's very fast. However, once the data is on the new hardware there's more to do, and one of those steps involves moving meta/object data around within the new hardware. We initially considered "lfs migrate" for this, but it seemed slow. The other utility we considered is "dsync", but that had an "xattr" issue, and I see that you've reviewed Olaf's patch for that.

The goal is ultimately for both MDT and OST migrations. The purpose of these migrations is potentially as part of the plan I mentioned above, although I don't think we'll be using "lfs migrate" for the migrations we're doing in the near term, so really it's to see if "lfs migrate" is a viable option in the more distant future. It's also for the hypothetical cases of balancing and evacuating hardware, but I see you've said there are likely better ways to deal with (or prevent) those problems.

As for how this is being called: trees are being made specifically for the test, and we are intentionally migrating the whole tree, not expecting to migrate only the files at depth=1 as proposed in "DNE3: directory migration in non-recursive mode". The individual "lfs migrate" calls are on non-overlapping trees. As for your comment about "inadvertently doing multiple migrations and hurting performance", we are intentionally doing multiple migrations in the hope that it will help performance, so it seems we might be confused about what helps vs. hurts performance.

One of the major questions I have about the whole process is how the data moves. Does it use the client nodes as intermediaries, or is the migration mostly happening just between the MDSs? My attempts to increase parallelism have been to use more clients with more processes per client. |
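A rough sketch of the parallel scheme described above, assuming each client is handed a disjoint set of subtrees; the tree layout, target MDT index, and degree of parallelism are illustrative, not the actual test configuration:

    # Hypothetical per-client driver: migrate several non-overlapping
    # subtrees concurrently, each with its own "lfs migrate -m" call.
    ls -d /mnt/lustre/testtree/client01/sub* | \
        xargs -P 4 -I{} lfs migrate -m 1 {}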
| Comment by Andreas Dilger [ 12/Nov/21 ] |
|
For directory/inode migration, this is mostly done on the MDS, and is only triggered by the client, because the whole operation has to be handled within a filesystem transaction on the MDT, so having the client involved would not improve things. I think that using 1-level migrations may improve parallelism, but the MDS may also throttle the amount of work that is being done to avoid consuming a large number of MDS service threads, since this can take a long time. There are almost certainly improvements to be had in this operation, since it has not been a focus for improvement in the past. |
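If MDS service thread limits or ZFS transaction commit are suspected as the bottleneck, a couple of observation points that might be worth sampling on the MDS during a test run; parameter paths and the pool name vary by Lustre/ZFS version and setup, so treat these as assumptions to verify on the test systems:

    # How many metadata service threads are configured and currently started.
    lctl get_param mds.MDS.mdt.threads_max mds.MDS.mdt.threads_started

    # ZFS backend: transaction group history for the MDT pool
    # ("mdt0pool" is a placeholder pool name).
    cat /proc/spl/kstat/zfs/mdt0pool/txgs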
| Comment by Olaf Faaland [ 11/Jul/22 ] |
|
Improved lfs migrate performance would be very useful for us, but we have worked out migration methods that are performant enough based on dsync(1) from mpifileutils. Removing topllnl. |
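For completeness, a hedged example of the mpifileutils approach mentioned here; the launcher, rank count, and paths are placeholders rather than LLNL's actual procedure:

    # dsync copies a tree in parallel across MPI ranks, preserving
    # attributes where supported.
    mpirun -np 32 dsync /mnt/oldfs/project /mnt/newfs/project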