[LU-14211] DNE3: mechanism to interrupt and resume migration Created: 14/Dec/20  Updated: 04/Jul/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: dne3

Issue Links:
Related
is related to LU-14719 "lfs migrate -m" creates broken agent... Resolved
is related to LU-14975 DNE3: directory migration in non-recu... Resolved
is related to LU-15001 improve recovery of interrupted direc... Open
is related to LU-14212 DNE3: directory migration progress mo... Open
is related to LU-11776 add "lfs find" support for directory ... Resolved
is related to LU-15990 "lfs find" to scan for directory hash... Resolved

 Description   

It should be possible to cleanly interrupt DNE directory migration (e.g. at the end of the current directory) for a long-running recursive directory migration.

It would make sense to restructure recursive directory migration as a series of single directory migrations now that this is possible (LU-14975). This would provide a number of benefits:

  • allow the migration to be interrupted after the current directory has finished
  • allow statistics printing to be done by "lfs migrate -m" in userspace after each directory finishes
  • avoid very long-running processes on the MDS
  • simplify restart of directory tree migration
  • allow better (automated) per-directory tuning (stripe count 1/N for small/large directories, select different target MDT by space) as each directory is migrated, instead of using the same parameters for all directories in the tree
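
The restructuring described above could be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the actual "lfs migrate -m" implementation: `migrate_one` is a hypothetical caller-supplied callable that would wrap one single-directory migration (LU-14975), `StopFlag`, `migrate_tree`, and the `done` set are made-up names, and the stripe-count thresholds in `choose_stripe_count` are placeholder values.

```python
import os
from typing import Callable, Iterator, Set


def iter_dirs(root: str) -> Iterator[str]:
    """Yield root and all of its subdirectories, parents before children."""
    for dirpath, dirnames, _files in os.walk(root):
        dirnames.sort()  # deterministic order keeps restarts predictable
        yield dirpath


def choose_stripe_count(path: str, large_threshold: int = 10000,
                        wide_count: int = 4) -> int:
    """Hypothetical policy: 1 stripe for small directories, N for large ones."""
    with os.scandir(path) as entries:
        num_entries = sum(1 for _ in entries)
    return wide_count if num_entries >= large_threshold else 1


class StopFlag:
    """Set (for example from a signal handler) to request a clean stop."""

    def __init__(self) -> None:
        self.requested = False

    def request(self, *_args) -> None:
        self.requested = True


def migrate_tree(root: str,
                 migrate_one: Callable[[str, int], None],
                 done: Set[str],
                 stop: StopFlag,
                 choose: Callable[[str], int] = choose_stripe_count) -> bool:
    """Migrate one directory at a time; return True if the whole tree finished.

    Directories already in `done` are skipped, so calling this again with the
    same set resumes an interrupted run.
    """
    for path in iter_dirs(root):
        if path in done:
            continue  # migrated in an earlier run
        if stop.requested:
            return False  # only stop between directories, never mid-directory
        migrate_one(path, choose(path))  # per-directory parameters
        done.add(path)
    return True
```

In such a design, the caller might register `stop.request` as a SIGINT handler and persist `done` to a file between runs. Per-directory statistics could be printed right after each `migrate_one` call returns.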


 Comments   
Comment by Andreas Dilger [ 05/Apr/22 ]

The message "migration was interrupted, run 'lfs migrate -m %d -c %d -H %s ...' to finish migration" should also be removed. If the MDS knows the migration options needed to finish the migration of that directory, then it should ignore what the user asked for and finish the migration of that directory as originally started. Then, if the user's parameters are incompatible with the new directory layout, a second migration should be done on the directory.

Running two directory migrations is still faster than having the user try to figure out the right "lfs migrate -m" parameters (if they even look at the MDS console log to figure this out), and then run both migrations manually.
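
The decision proposed in this comment can be written down as a small planning function. This is a sketch of the logic only: the `Layout` type and `plan_migrations` are hypothetical names for illustration, the three fields mirror the `-m`, `-c`, and `-H` values from the console message, and none of this reflects actual MDS code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layout:
    """Target layout of a directory migration (hypothetical type)."""
    mdt_index: int      # value that would be passed as -m
    stripe_count: int   # value that would be passed as -c
    hash_type: str      # value that would be passed as -H


def plan_migrations(in_progress: Optional[Layout],
                    requested: Layout) -> List[Layout]:
    """Return the migrations to run, in order, for one directory.

    in_progress is the target layout of an interrupted migration, or None
    if the directory is not partially migrated.
    """
    if in_progress is None:
        return [requested]          # nothing to finish, migrate as asked
    if in_progress == requested:
        return [in_progress]        # finishing also satisfies the request
    # Finish the interrupted migration as originally started, then run a
    # second migration to reach the layout the user asked for.
    return [in_progress, requested]
```

For example, a directory interrupted while moving to MDT 1 and now requested on MDT 2 would yield two steps: first the original target, then the new one.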

Comment by Andreas Dilger [ 04/Jul/22 ]

In addition to commands to stop and resume individual directory migrations, we need to be able to find partially-migrated directories. I've filed LU-15990 to track enhancements to "lfs find" to allow finding directories with the migrating directory hash flag set. This was implemented via patch https://review.whamcloud.com/39340 "LU-11776 utils: add support lfs find with mdt hash flag".
