DNE3: lfs migrate -m should allow -1 as the target index

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • None

    Description

      When migrating directory trees to new MDTs for space balancing, it would be very convenient to allow specifying "lfs migrate -m -1" to have "lfs" pick the MDT with the most free inodes as the target. This is similar to "lfs setdirstripe -i -1" functionality, but for migration. If multiple directories are specified on the command line, it probably makes sense to refresh the statfs information after each directory tree is migrated, in case there is a new MDT that has more free space. This will greatly simplify directory migration on an existing filesystem without the user having to specify the details.
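
The selection loop described above could be sketched roughly as follows. This is only an illustration in Python, not the lfs implementation: `query_free_inodes` and `migrate_tree` are hypothetical callbacks standing in for an MDT statfs query and for the actual directory migration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def pick_mdt(free_inodes: Dict[int, int]) -> int:
    """Return the index of the MDT with the most free inodes.

    Ties go to the lowest index so the choice is deterministic.
    """
    if not free_inodes:
        raise ValueError("no MDTs available")
    return max(sorted(free_inodes), key=lambda idx: free_inodes[idx])


def migrate_all(
    dirs: Iterable[str],
    query_free_inodes: Callable[[], Dict[int, int]],  # hypothetical statfs query
    migrate_tree: Callable[[str, int], None],         # hypothetical migration call
) -> List[Tuple[str, int]]:
    """Migrate each directory tree to the MDT that is emptiest at that moment."""
    plan = []
    for path in dirs:
        # Refresh the free-inode counts before every tree, since the
        # previous migration may have changed which MDT is emptiest.
        target = pick_mdt(query_free_inodes())
        migrate_tree(path, target)
        plan.append((path, target))
    return plan
```

Refreshing inside the loop is the point of the sketch: a single statfs snapshot taken up front would send every tree to the same MDT.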

      If this is already implemented (it didn't look like parse_targets() would handle -1 as an argument), then the lfs-migrate.1 man page needs to be updated with this information under the description of --mdt-index.

      I suspect this would be relatively easily implemented in Lustre 2.12, because the mechanism to select the best MDT is already present in lfs.

          Activity

            [LU-13076] DNE3: lfs migrate -m should allow -1 as the target index
            pjones Peter Jones added a comment -

            Landed for 2.15

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44886/
            Subject: LU-13076 dne: dir migrate in QOS mode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 378c7567876b430d06031f7d380112b9bdb15166

            gerrit Gerrit Updater added a comment -

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44886
            Subject: LU-13076 dne: dir migrate in QOS mode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f4ee9c0f45d5031942f1238e7af212a4cee9f912

            cwhite_ddn Cliff White (Inactive) added a comment -

            We currently have a customer with 24 MDTs. They used -c 24 for all their directories and now wish to re-stripe with -c 1. Given that the customer lacks experience, I don't want them to have to manually choose a new target with lfs migrate -m. It would be very good to have -m -1; we need hands-free restriping here.

            adilger Andreas Dilger added a comment -

            You are correct that I meant "setdirstripe -c". It is fine if it "checks" whether the directory exists by trying to create it, then if this fails with -EEXIST or -EISDIR it can try to change the stripe count with a second RPC. I don't think this would impact the performance of "setdirstripe -c", or at least not in any critical way because there will likely be many more RPCs to migrate the entries.

            I also agree that a user space policy engine might be able to do a more optimal job than the MDS, but it should at least be possible to do a basic job of directory migration without a policy engine.
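
The create-first approach described in this comment might look roughly like the sketch below. It is an illustration only: a plain `os.mkdir` stands in for striped directory creation, and `restripe` is a hypothetical callback representing the second RPC that changes the stripe count.

```python
import os
from typing import Callable


def create_or_restripe(
    path: str,
    stripe_count: int,
    restripe: Callable[[str, int], None],  # hypothetical "second RPC"
) -> str:
    """Try to create the directory; fall back to restriping if it exists."""
    try:
        # Fast path: creating a new directory costs no extra existence check.
        os.mkdir(path)
        return "created"
    except FileExistsError:
        # Slow path: only an existing directory can be restriped.
        if not os.path.isdir(path):
            raise
        restripe(path, stripe_count)
        return "restriped"
```

Because the existence check is folded into the failed create, the common case of creating a new directory pays nothing extra.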
            laisiyao Lai Siyao added a comment -

            Besides, to achieve the best performance after migration, IMO it should be done in user space by a policy engine, which is more flexible and easier to customize.

            laisiyao Lai Siyao added a comment -

            The problem with using "lfs setstripe -c <dir>" (or should it be "lfs setdirstripe -c <dir>"?) is that this command is currently used to create new striped directories. If it needs to support dir split, it has to verify whether the target directory exists, which will degrade striped directory creation performance. Do you think that's acceptable?

            adilger Andreas Dilger added a comment -

            Ah, I think I understand where my confusion is. Am I correct that you are separating the use of "lfs migrate -m N" to mean "migrate parent directory with all inodes to new MDT 'N'" and "lfs migrate -m N -c -1" to mean the new "split parent directory but leave inodes in place" code that you are working on? I think that "migrate" should generally mean "move inodes to new MDT", while "MDT auto split" should not move existing inodes since that will change inode numbers/locks and could cause problems. I think it is useful to have the meaning of "lfs migrate -m N -c C" for directories be similar to "lfs migrate -i N -c C" for regular files, where "N = -1" means "pick the target index for me" and "C = -1" means "stripe over all targets".

            Being able to specify "split directory now" is important for testing, but maybe a different command like "lfs setstripe -c <dir>" on the existing directory is the right command for splitting the directory to have a different number of stripes? For stripe_count=1 directories it is possible to add any kind of hash function to the existing directory, so "crush" would be preferred, and then setting the number of stripes would just migrate the names without affecting the inodes?

            I also agree that a simple implementation of "lfs migrate -m -1" might not pick the best MDTs, but users (other than you and me) are even less likely to pick the best MDTs. That means that the implementation of "-m -1" needs to be smart enough that it picks good MDTs when possible.

            It should be possible to disable MDTs for new directory creation (similar to "lctl set_param osp.$fsname-OST0000.max_create_count=0" for OSTs) so that the MDS does not put new directories on that MDT. That will allow an MDT to be emptied out with a command like "lfs find -type d -m N | lfs migrate -m -1" without the user having to specify the target for every directory.

            Also, when the user is doing MDT balancing, the use of "lfs migrate -m -1" needs to move some directories and inodes to the empty MDTs instead of just splitting existing directories onto new MDTs and not moving any inodes, otherwise the full MDT will not be any less full. Maybe this can be done by weighting the current MDT higher than other MDTs but not keeping all directories/inodes on the original MDT (e.g. using qos_mdt_prio_free)?
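
One way to read the weighting idea in this comment is sketched below. It is not Lustre's QOS code: the weights are simply free-inode counts, and `current_bonus` is a made-up tunable for illustration that should not be confused with qos_mdt_prio_free.

```python
import random
from typing import Dict, Optional


def mdt_weights(
    free_inodes: Dict[int, int], current: int, current_bonus: float = 1.5
) -> Dict[int, float]:
    """Weight each MDT by free inodes, favouring the current MDT somewhat."""
    weights = {}
    for idx, free in free_inodes.items():
        weight = float(free)
        if idx == current:
            # Hypothetical bonus: staying put avoids moving inodes, but a
            # full MDT (free == 0) still gets weight 0 and is never chosen.
            weight *= current_bonus
        weights[idx] = weight
    return weights


def choose_mdt(
    free_inodes: Dict[int, int],
    current: int,
    rng: Optional[random.Random] = None,
    current_bonus: float = 1.5,
) -> int:
    """Pick a target MDT at random, in proportion to its weight."""
    rng = rng or random.Random()
    weights = mdt_weights(free_inodes, current, current_bonus)
    if sum(weights.values()) <= 0:
        raise ValueError("no MDT has free inodes")
    indices = sorted(weights)
    return rng.choices(indices, weights=[weights[i] for i in indices], k=1)[0]
```

With a proportional choice like this, a nearly full source MDT keeps few directories, while a lightly loaded one keeps most of them.
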
            laisiyao Lai Siyao added a comment -

            "lfs migrate -m N -c -1" doesn't look meaningful to me: migrate the directory to MDT N with stripe count -1? Maybe we can add a new command "lfs restripe -c -H <dir>" for directory split/merge.

            BTW, letting the system choose the target MDTs may not be optimal in some cases:

            • case 1: dir1 is fairly big and is a plain directory on MDT0. The system decides to migrate it to MDT1 and MDT2 because they currently have more free space than MDT0, but they may become fuller after the migration, or may not have enough space to finish it.
            • case 2: dir1 is located on MDT0 and the system decides to migrate it to MDT1 and MDT2, which causes all sub files/dirs to be moved. In contrast, if the user manually migrates it to MDT0 and MDT2, only half of the sub files/dirs need to be moved.
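
Case 2 can be checked with a small counting sketch. The assumptions are loud ones: names are spread over the target stripes with `zlib.crc32`, which is only a stand-in for Lustre's directory hash, and an entry counts as moved whenever its stripe lands on an MDT other than the source.

```python
import zlib
from typing import Sequence


def entries_moved(
    names: Sequence[str], source_mdt: int, target_mdts: Sequence[int]
) -> int:
    """Count entries that leave source_mdt when striping over target_mdts."""
    moved = 0
    for name in names:
        # Stand-in hash: choose the stripe from a CRC of the entry name.
        stripe = zlib.crc32(name.encode()) % len(target_mdts)
        if target_mdts[stripe] != source_mdt:
            moved += 1
    return moved


names = [f"file{i}" for i in range(1000)]
moved_away = entries_moved(names, source_mdt=0, target_mdts=[1, 2])
moved_keep = entries_moved(names, source_mdt=0, target_mdts=[0, 2])
print(moved_away, moved_keep)
```

With MDT0 excluded from the targets every entry moves, while keeping MDT0 as one of two stripes leaves the entries that hash to it in place, which is about half for an even hash.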

            adilger Andreas Dilger added a comment -

            Doesn't it make more sense to have "lfs migrate -m N -c -1" do the directory split, or similar? The "-m" option is used as the DNE stripe index, not the stripe count.
            laisiyao Lai Siyao added a comment -

            'lfs migrate -m -1' is not allowed in 2.12, and in the implementation of LU-11025, 'lfs migrate -m -1' will do directory split/merge instead of migration, though for directory split it will choose the MDTs that have more free space.


            People

              laisiyao Lai Siyao
              adilger Andreas Dilger
              Votes: 0
              Watchers: 6
