
Preserve inode number after MDT migration

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0

    Description

      During migration, the MDT FID of the migrated file is changed to reflect the new MDT the inode is stored on. However, it would be possible to keep the user-visible inode constant after migration by storing the original FID into the LMA as a new field. If present, this saved FID could be used to generate the inode number for userspace instead of the current FID so that it doesn't affect user tools such as backups.
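A minimal sketch of the idea, in Python rather than Lustre's C: the inode number shown to userspace is derived from the FID saved in the LMA when one is present, and from the current FID otherwise. The `Inode` class, the `saved_fid` field name, and the bit packing in `fid_to_ino()` are assumptions made for illustration; the real client has its own FID-to-inode routine, whose exact layout is not reproduced here. The two FIDs are the ones from the transcript further down this ticket.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Fid = Tuple[int, int, int]  # (sequence, object id, version)


def fid_to_ino(fid: Fid) -> int:
    """Pack a FID into a 64-bit number (illustrative layout, not Lustre's)."""
    seq, oid, _ver = fid
    return ((seq << 24) + oid) & 0xFFFFFFFFFFFFFFFF


@dataclass
class Inode:
    fid: Fid                         # current FID, changes on MDT migration
    saved_fid: Optional[Fid] = None  # hypothetical LMA field: original FID

    def ino(self) -> int:
        # Prefer the saved original FID so the number survives migration.
        source = self.saved_fid if self.saved_fid is not None else self.fid
        return fid_to_ino(source)


def migrate(inode: Inode, new_fid: Fid) -> Inode:
    """Move the inode to a new MDT, remembering the first FID it ever had."""
    original = inode.saved_fid if inode.saved_fid is not None else inode.fid
    return Inode(fid=new_fid, saved_fid=original)


before = Inode(fid=(0x200001B72, 0x10C07, 0x0))
after = migrate(before, (0x240001B70, 0x2743, 0x0))
```

Because `migrate()` carries the first FID forward, a file that is migrated more than once still reports the inode number it had when it was created.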

          Activity

            [LU-7607] Preserve inode number after MDT migration

            Adding links to the existing patches under this ticket; not sure why they weren't created previously.

            adilger Andreas Dilger added a comment

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38285
            Subject: LU-7607 mdd: add fidmap reclaim thread
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 4
            Commit: 3068abb9c2e5f85eb76f9e94c7469a761af5f86a

            adilger Andreas Dilger added a comment

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38233
            Subject: LU-7607 dne: support FID map
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 7
            Commit: 0f7af1befc6de458acd84b817c535938f839c808

            adilger Andreas Dilger added a comment
            adilger Andreas Dilger added a comment - - edited

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38135
            Subject: LU-7607 dne: add FID map interfaces
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 9
            Commit: 2119206d30f6d84ac1974d9c7d3b24bc25eee1e4


            I was looking at whether we could use the FID stored in the LOV EA to preserve the "original" inode number of the file. It seems that the FID is stored in the LOV EA in each component and is also preserved over OST and MDT migration:

            tests# lfs setstripe -E 1M -L mdt -E 1G -c 3 -E eof /mnt/testfs/dir1/tt
            tests# dd if=/dev/zero of=/mnt/testfs/dir1/tt bs=1M count=2
            tests# lfs path2fid /mnt/testfs/dir1/tt
            [0x200001b72:0x10c07:0x0]
            tests# lfs getstripe -v /mnt/testfs/dir1/tt
            components:
              - lcme_id:             1
                lcme_extent.e_start: 0
                lcme_extent.e_end:   1048576
                sub_layout:
                  lmm_seq:           0x200001b72
                  lmm_object_id:     0x10c07
                  lmm_fid:           [0x200001b72:0x10c07:0x0]
                  lmm_stripe_count:  1
              - lcme_id:             2
                lcme_extent.e_start: 1048576
                lcme_extent.e_end:   1073741824
                sub_layout:
                  lmm_seq:           0x200001b72
                  lmm_object_id:     0x10c07
                  lmm_fid:           [0x200001b72:0x10c07:0x0]
              - lcme_id:             3
                lcme_extent.e_start: 1073741824
                lcme_extent.e_end:   EOF
                sub_layout:
                  lmm_seq:           0x200001b72
                  lmm_object_id:     0x10c07
                  lmm_fid:           [0x200001b72:0x10c07:0x0]
            tests# lfs migrate -c 3 /mnt/testfs/dir1/tt
            tests# lfs getstripe -v /mnt/testfs/dir1/tt
            lmm_seq:           0x200001b72
            lmm_object_id:     0x10c07
            lmm_fid:           [0x200001b72:0x10c07:0x0]
            lmm_stripe_count:  3
            tests# lfs migrate -m1 /mnt/testfs/dir1
            tests# lfs path2fid /mnt/testfs/dir1/tt
            [0x240001b70:0x2743:0x0]
            tests# lfs getstripe -v /mnt/testfs/dir1/tt
            lmm_seq:           0x200001b72
            lmm_object_id:     0x10c07
            lmm_fid:           [0x200001b72:0x10c07:0x0]
            lmm_stripe_count:  3
            

            There is a bug (LU-13426) if there is a DOM component in the layout that clobbers the FID, but DOM migration is relatively new and can be fixed to preserve the FID properly.
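The check done by hand in the transcript above can be sketched as a small script: pull every `lmm_fid` out of saved `lfs getstripe -v` output, confirm the components agree, and compare the result with what `lfs path2fid` reports. The helper names are hypothetical, and the script only parses text captured from the commands; it does not call `lfs` itself.

```python
import re
from typing import List

FID_RE = re.compile(
    r"lmm_fid:\s*\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]"
)


def layout_fids(getstripe_output: str) -> List[str]:
    """Return every lmm_fid found in `lfs getstripe -v` text, lower-cased."""
    return ["[{}:{}:{}]".format(*m.groups()).lower()
            for m in FID_RE.finditer(getstripe_output)]


def original_fid(getstripe_output: str) -> str:
    """Return the one FID recorded in the layout; raise if components differ."""
    fids = set(layout_fids(getstripe_output))
    if len(fids) != 1:
        raise ValueError(f"expected one distinct lmm_fid, found {sorted(fids)}")
    return fids.pop()


# Output captured after `lfs migrate -m1`, copied from the transcript.
after_mdt_migration = """\
lmm_seq:           0x200001b72
lmm_object_id:     0x10c07
lmm_fid:           [0x200001b72:0x10c07:0x0]
lmm_stripe_count:  3
"""
current_fid = "[0x240001b70:0x2743:0x0]"  # `lfs path2fid` after migration
```

Here `original_fid(after_mdt_migration)` yields the pre-migration FID while `current_fid` is the post-migration one, which is the property the comment relies on.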

            adilger Andreas Dilger added a comment
            laisiyao Lai Siyao added a comment -

            Andreas, okay.


            Lai, it definitely makes sense to have an option to migrate the parent directory and filenames without migrating the file inodes. In that case there is no need for this feature to preserve the inode numbers, since they won't change.

            This is only needed in the case where the inode is moved to a new MDT, which can be needed in case of removing an MDT, or if an MDT is very full, not in the normal space balancing case.

            I think storing the original FID in the inode is not too hard, and it will always be unique. Adding the old and new FID in the OI table is also useful. We can't return -EREMOTE to NFS clients, but it can be handled by the Lustre client so that it doesn't return -ESTALE to NFS.
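The client-side handling described above can be sketched roughly as follows, in Python rather than Lustre's C. The server is modelled as a plain function that either returns the object, raises a `Redirect` (standing in for a -EREMOTE reply that carries the new FID), or raises ESTALE. All names here, and the hop limit, are assumptions for illustration and not Lustre interfaces.

```python
import errno
from typing import Callable, Dict, Tuple

Fid = Tuple[int, int, int]  # (sequence, object id, version)


class Redirect(Exception):
    """Stand-in for a server reply of -EREMOTE carrying the new FID."""

    def __init__(self, new_fid: Fid) -> None:
        super().__init__(f"object moved to {new_fid}")
        self.new_fid = new_fid


def make_server(objects: Dict[Fid, str],
                moved: Dict[Fid, Fid]) -> Callable[[Fid], str]:
    """Build a toy lookup function over live objects and old->new mappings."""
    def lookup(fid: Fid) -> str:
        if fid in objects:
            return objects[fid]
        if fid in moved:
            raise Redirect(moved[fid])
        raise OSError(errno.ESTALE, "stale file handle")
    return lookup


def client_lookup(server: Callable[[Fid], str], fid: Fid,
                  max_hops: int = 4) -> str:
    """Follow redirects inside the client so the caller never sees them."""
    for _ in range(max_hops):
        try:
            return server(fid)
        except Redirect as redirect:
            fid = redirect.new_fid  # resend with the new FID
    raise OSError(errno.ESTALE, "too many redirects")


OLD = (0x200001B72, 0x10C07, 0x0)
NEW = (0x240001B70, 0x2743, 0x0)
server = make_server(objects={NEW: "dir1/tt"}, moved={OLD: NEW})
```

An NFS client holding a handle built from `OLD` would then get the file back from `client_lookup(server, OLD)`, and only a FID that is neither live nor mapped surfaces as ESTALE.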

            adilger Andreas Dilger added a comment

            Ben, there is already a MIGRT ChangeLog record for inode migration.

            adilger Andreas Dilger added a comment

            Could we emit the changes via the changelog? If someone wants to track FID/inode changes over time, they could set up a listener to catch and archive them.

            The various calls to find a FID, etc. could simply call up to the userspace service to get historical info.
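A rough sketch of such a userspace archive, assuming the listener has already reduced each migration record to an (old FID, new FID) pair. The `FidHistory` class and its methods are hypothetical, and the real changelog record format is not reproduced here.

```python
from typing import Dict, Tuple

Fid = Tuple[int, int, int]  # (sequence, object id, version)


class FidHistory:
    """Archive of FID changes, queryable in both directions."""

    def __init__(self) -> None:
        self._next: Dict[Fid, Fid] = {}  # old FID -> FID it became
        self._prev: Dict[Fid, Fid] = {}  # new FID -> FID it replaced

    def record_migration(self, old: Fid, new: Fid) -> None:
        """Store one migration event delivered by the listener."""
        self._next[old] = new
        self._prev[new] = old

    @staticmethod
    def _walk(links: Dict[Fid, Fid], fid: Fid) -> Fid:
        # Follow the chain to its end, refusing to loop forever.
        seen = {fid}
        while fid in links:
            fid = links[fid]
            if fid in seen:
                raise ValueError("cycle in FID history")
            seen.add(fid)
        return fid

    def current(self, fid: Fid) -> Fid:
        """Return the FID the object has now, given any FID it ever had."""
        return self._walk(self._next, fid)

    def original(self, fid: Fid) -> Fid:
        """Return the first FID the object ever had."""
        return self._walk(self._prev, fid)


history = FidHistory()
history.record_migration((0x200001B72, 0x10C07, 0x0), (0x240001B70, 0x2743, 0x0))
```

A tool that only knows an old FID could ask `history.current()` for the live one, and a backup tool could ask `history.original()` for a stable identity.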

            bevans Ben Evans (Inactive) added a comment
            laisiyao Lai Siyao added a comment -

            IMO the overhead of preserving the inode number is quite high. Rather than saving the original FID, I'd prefer to add an option that leaves file inodes untouched during migration: for existing sub-files, the inode is not migrated and only the namespace is updated.

            If we still prefer migrating the inode, I'd suggest dropping support for preserving the inode number and instead adding a FID mapping to support NFS export. A special OI file would be added that contains mappings from the original FID to the new FID, and lu_object_find() would look up this OI file. If a new FID is found, the server replies to the client with -EREMOTE, and the client resends the request with the new FID to the correct MDT. For some requests, such as rename, the client may need to retry several times if more than one of the files involved has been migrated. A garbage-collection thread on the server would remove aged mappings from this OI file.
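The server-side half of this proposal, a table of old-to-new FID mappings that is looked up on access and periodically reclaimed, can be sketched as below. This is a toy in-memory model in Python, not the OI file format or the actual patches; the `FidMap` class, the age-based policy, and passing the time in explicitly as `now` are all assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

Fid = Tuple[int, int, int]  # (sequence, object id, version)


class FidMap:
    """Toy stand-in for the proposed OI file of old-FID -> new-FID mappings."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age  # seconds a mapping is kept (assumed policy)
        self._entries: Dict[Fid, Tuple[Fid, float]] = {}

    def add(self, old: Fid, new: Fid, now: float) -> None:
        """Record that `old` was migrated to `new` at time `now`."""
        self._entries[old] = (new, now)

    def lookup(self, fid: Fid) -> Optional[Fid]:
        """Return the new FID if `fid` was migrated, else None."""
        entry = self._entries.get(fid)
        return entry[0] if entry else None

    def reclaim(self, now: float) -> int:
        """Drop mappings older than max_age; return how many were removed."""
        expired = [old for old, (_new, added) in self._entries.items()
                   if now - added > self.max_age]
        for old in expired:
            del self._entries[old]
        return len(expired)


OLD = (0x200001B72, 0x10C07, 0x0)
NEW = (0x240001B70, 0x2743, 0x0)

fidmap = FidMap(max_age=3600.0)
fidmap.add(OLD, NEW, now=0.0)
```

A non-None result from `lookup()` corresponds to the point where the server would reply -EREMOTE with the new FID. Once `reclaim()` has dropped a mapping, a client still holding the old FID can no longer be redirected, so the retention period bounds how long stale handles remain usable.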


            People

              laisiyao Lai Siyao
              adilger Andreas Dilger