DNE3: mechanism to interrupt and resume migration

Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      It should be possible to cleanly interrupt DNE directory migration (e.g. at the end of the current directory) for a long-running recursive directory migration.

      It would make sense to restructure recursive directory migration as a series of single directory migrations now that this is possible (LU-14975). This would provide a number of benefits:

      • allow the migration to be interrupted after the current directory has finished
      • allow statistics printing to be done by "lfs migrate -m" in userspace after each directory finishes
      • avoid very long-running processes on the MDS
      • simplify restart of directory tree migration
      • allow better (automated) per-directory tuning (stripe count 1/N for small/large directories, select different target MDT by space) as each directory is migrated, instead of using the same parameters for all directories in the tree
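
      The restructuring listed above can be sketched as a userspace loop that migrates one directory per step and checks for a stop request only between directories. This is a minimal sketch, not the actual "lfs migrate -m" implementation: migrate_tree, choose_stripe_count, the 50000-entry threshold, and the stripe count of 4 are hypothetical names and values, and migrate_one is a caller-supplied stand-in for a single-directory migration (LU-14975).

```python
import os
import signal

stop_requested = False


def request_stop(signum=None, frame=None):
    """Signal handler: ask the loop to stop after the current directory."""
    global stop_requested
    stop_requested = True


def choose_stripe_count(num_entries, large_threshold=50000, large_count=4):
    """Per-directory tuning: 1 stripe for small directories, N for large ones.

    The threshold and N are illustrative values, not Lustre defaults.
    """
    return large_count if num_entries >= large_threshold else 1


def migrate_tree(root, migrate_one, done=None, should_stop=None):
    """Migrate a tree as a series of single-directory migrations.

    migrate_one(path, stripe_count) is a hypothetical callable that migrates
    exactly one directory. `done` holds directories finished by an earlier
    run, so a restart skips them. Returns (done, finished).
    """
    if should_stop is None:
        should_stop = lambda: stop_requested
    done = set(done or ())
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic order makes restarts predictable
        if dirpath in done:
            continue  # already migrated; still descend into its children
        if should_stop():
            return done, False  # clean interruption between directories
        entries = len(dirnames) + len(filenames)
        stripe_count = choose_stripe_count(entries)
        migrate_one(dirpath, stripe_count)
        done.add(dirpath)
        # Per-directory statistics, printed in userspace after each step.
        print(f"migrated {dirpath}: {entries} entries, stripe count {stripe_count}")
    return done, True


if __name__ == "__main__":
    # Ctrl-C stops the run after the directory currently being migrated.
    signal.signal(signal.SIGINT, request_stop)
```

      Persisting the returned set (for example to a file) would let a later run resume where the previous one stopped. Because each step is a separate short operation, no single long-running request stays open on the MDS.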

          Activity

            [LU-14211] DNE3: mechanism to interrupt and resume migration
            adilger Andreas Dilger added a comment - edited

            In addition to commands to stop and resume individual directory migrations, we need to be able to find partially-migrated directories. I've filed LU-15990 to track enhancements to "lfs find" that allow finding directories with the migrating directory hash flag set. This was implemented via patch https://review.whamcloud.com/39340 "LU-11776 utils: add support lfs find with mdt hash flag".

            adilger Andreas Dilger added a comment -

            The message "migration was interrupted, run 'lfs migrate -m %d -c %d -H %s ...' to finish migration" should also be removed. If the MDS knows the migration options needed to finish the migration of that directory, then it should ignore the options the user supplied and finish the migration of that directory as originally started. Then, if the user's parameters are incompatible with the resulting directory layout, a second migration should be done on the directory.

            Running two directory migrations is still faster than having the user try to figure out the right "lfs migrate -m" parameters (if they even look at the MDS console log to figure this out), and then run both migrations manually.
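
            The "finish first, then migrate again if needed" behaviour can be sketched as a small planning function. This is only an illustration under stated assumptions: Layout and plan_migrations are hypothetical names, the three fields mirror the -m, -c and -H options in the message above, and "incompatible" is simplified to "not equal".

```python
from typing import List, NamedTuple, Optional


class Layout(NamedTuple):
    """Target layout of a directory migration (hypothetical structure)."""
    mdt_index: int     # corresponds to -m
    stripe_count: int  # corresponds to -c
    hash_type: str     # corresponds to -H


def plan_migrations(in_progress: Optional[Layout], requested: Layout) -> List[Layout]:
    """Return the migrations to run, in order, for one directory.

    in_progress is the layout of an interrupted migration, or None if the
    directory is not partially migrated.
    """
    if in_progress is None:
        return [requested]           # nothing to finish, migrate as asked
    if in_progress == requested:
        return [in_progress]         # finishing already satisfies the request
    # Finish as originally started, then apply the user's parameters.
    return [in_progress, requested]


if __name__ == "__main__":
    interrupted = Layout(mdt_index=1, stripe_count=2, hash_type="crush")
    wanted = Layout(mdt_index=3, stripe_count=1, hash_type="crush")
    for step in plan_migrations(interrupted, wanted):
        print(step)
```

            With this shape the user never needs to reconstruct the original parameters from the MDS console log; the worst case is two migrations run back to back.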

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes: 0
              Watchers: 4