Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.6.0
    • Lustre 2.6.0
    • None
    • 3
    • 5755

    Description

      Comments from Andreas:

      This relates to the need to rebalance the inodes between an existing MDT and a newly-added MDT in some manner, preferably without having to move all of the file data just to make namespace changes.
      We discussed something like "lfs mv -i {mdt_idx} {pathname}" (same parameters as "lfs mkdir"/"lfs setdirstripe") to move the name and inode over to the specified MDT, without keeping the same FID for that file. This is similar to how the inode number is not preserved on local filesystems when "mv" moves a file across different filesystems. This would only work for non-directories, since migrating a whole directory is difficult and does not need to be implemented for the first version of this tool.

      Moving a whole directory hierarchy is non-trivial, since we don't want the namespace to be split if the user-space tool is interrupted in some way. One possible way to do this safely without the ability to migrate whole directories is:

      do a breadth-first traversal of the directory tree
      create a duplicate directory hierarchy on the target MDT using llapi_mkdir()
      if the file is small, just copy it to the new directory
      if the file is large, hard link it into the new hierarchy (leave the remote inode in place)
      delete the old hierarchy (to get rid of the hard links/copies)
      if the inode is remote with 1 hard link, call llapi_mv() to move the file name and inode (still resident on the original MDT) to the new MDT
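
      The traversal part of the steps above can be sketched as a planning pass. This is only an illustration on an ordinary local directory tree using the Python standard library, not Lustre code: plan_migration() and the 1 MiB SMALL_FILE_LIMIT threshold are hypothetical names and values (the ticket gives no size cutoff), and the sketch only records which action each entry would get rather than calling llapi_mkdir() or llapi_mv().

```python
import os
from collections import deque

# Assumed threshold (1 MiB); the ticket does not say what counts as "small".
SMALL_FILE_LIMIT = 1 << 20


def plan_migration(src_root, dst_root, small_limit=SMALL_FILE_LIMIT):
    """Walk src_root breadth-first and return a list of (action, src, dst).

    Actions:
      "mkdir" - directory to recreate in the new hierarchy
      "copy"  - small file that would simply be copied
      "link"  - large file that would be hard linked instead of copied
    """
    plan = [("mkdir", src_root, dst_root)]
    queue = deque([(src_root, dst_root)])
    while queue:
        src_dir, dst_dir = queue.popleft()
        for name in sorted(os.listdir(src_dir)):
            src = os.path.join(src_dir, name)
            dst = os.path.join(dst_dir, name)
            if os.path.isdir(src) and not os.path.islink(src):
                # Directories are queued so a whole level is planned
                # before descending further.
                plan.append(("mkdir", src, dst))
                queue.append((src, dst))
            elif os.lstat(src).st_size <= small_limit:
                plan.append(("copy", src, dst))
            else:
                plan.append(("link", src, dst))
    return plan
```

      Keeping the plan separate from its execution means an interrupted run can be inspected or restarted without guessing how far the tool got, which is the concern raised above about not leaving the namespace split.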
      

      This is not mandatory for DNE phase I, but would be nice to have, and it is definitely needed for DNE phase II.

          Activity

            [LU-2430] Migration tool for DNE

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13161/
            Subject: LU-2430 utils: fix "lfs mv" command parsing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5c61e053874caafe75f84c149de933050e9ec660

            gerrit Gerrit Updater added the comment above.

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13161
            Subject: LU-2430 utils: fix "lfs mv" command parsing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 13440292305d6a8cdce949501f8b5f5eb3357ff6

            gerrit Gerrit Updater added the comment above.

            Patch landed to Master. Please reopen ticket if more work is needed.

            jlevi Jodi Levi (Inactive) added the comment above.
            di.wang Di Wang (Inactive) added a comment - http://review.whamcloud.com/#change,6662

            The llapi_mv() functionality could benefit from LU-2016 (MDD layout swap) and LU-2017 (MDC layout swap). Once layout swap is available, llapi_mv() would be able to mknod() a new file on the new MDT, swap the layout from the old inode to the new one, and then delete the old inode.

            One complexity of using layout swap to move the inode instead of swapping the objects is that "llapi_swap_layouts()" will not preserve the open file handles or layout lock on the old resource/FID. While applications doing IO on the objects would still be able to read/write the original objects, their open file handle and layout lock would now point to an open-unlinked inode, which would likely cause the objects to be unlinked when the file descriptor is closed (due to nlink == 0 on the old inode).

            That means lfs_mv() should only migrate inodes that it knows to be unused. How it does this on the client still needs to be figured out, though it would presumably get some kind of error back from the MDS. It isn't possible to just refuse to swap the layout on any file that is open, since that will always be the case for a normal layout swap with a temporary open-unlinked file (see LU-2441). Possibly flags could be passed to the layout swap in the case of inode migration to have the MDS return -EBUSY if the "source" inode is in use.
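
            As a rough illustration of the idea above, the following toy model shows a migration that creates a new inode on the target MDT, swaps the layout, and unlinks the old inode, but refuses with -EBUSY when the source inode is open. It is a sketch only and not Lustre code: the Inode class, migrate_inode(), and the fail_if_busy flag are hypothetical stand-ins for whatever the MDS and the layout swap flags would actually provide.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Toy stand-in for an MDT inode; all fields are illustrative."""
    fid: str
    mdt: int
    layout: list = field(default_factory=list)  # stand-in for the object list
    open_count: int = 0
    nlink: int = 1


def migrate_inode(old, new_fid, target_mdt, fail_if_busy=True):
    """Return (new_inode, 0) on success, or (None, -errno.EBUSY) if refused.

    fail_if_busy models the suggested flag that asks the MDS to reject
    the swap when the source inode is still in use.
    """
    if fail_if_busy and old.open_count > 0:
        return None, -errno.EBUSY
    new = Inode(fid=new_fid, mdt=target_mdt)         # "mknod" on the new MDT
    old.layout, new.layout = new.layout, old.layout  # swap the layouts
    old.nlink = 0                                    # old inode is unlinked
    return new, 0
```

            In this model a normal layout swap would call migrate_inode() with fail_if_busy=False, since its temporary file is always open, while inode migration would leave the flag set so that busy files are skipped rather than left pointing at an open-unlinked inode.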

            adilger Andreas Dilger added the comment above.

            People

              di.wang Di Wang (Inactive)
              di.wang Di Wang (Inactive)
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: