[LU-7607] Preserve inode number after MDT migration

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0

    Description

      During migration, the FID of the migrated file is changed to reflect the new MDT the inode is stored on. However, it would be possible to keep the user-visible inode number constant after migration by storing the original FID in the LMA as a new field. If present, this saved FID could be used to generate the inode number for userspace instead of the current FID, so that migration doesn't affect user tools such as backups.
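      A minimal sketch of the idea above, not Lustre code: the struct layout, the field names (has_orig_fid, orig_fid), the helpers, and the FID-to-inode packing are all assumptions made for illustration. It only shows that the userspace inode number can be derived from a saved FID when one is present, and from the current FID otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified FID (sequence, object ID, version); hypothetical layout. */
struct fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Hypothetical LMA contents: the current FID plus an optional saved FID. */
struct lma {
    struct fid self_fid;     /* FID on the MDT that currently holds the inode */
    bool       has_orig_fid; /* set only once the inode has been migrated */
    struct fid orig_fid;     /* FID the inode had before its first migration */
};

/* Illustrative packing of a FID into 64 bits; NOT Lustre's real formula. */
static uint64_t fid_to_ino(const struct fid *f)
{
    return (f->seq << 24) + f->oid;
}

/* Inode number shown to userspace: prefer the saved FID if it exists. */
static uint64_t user_visible_ino(const struct lma *lma)
{
    assert(lma != NULL);
    return fid_to_ino(lma->has_orig_fid ? &lma->orig_fid : &lma->self_fid);
}

/* On migration, remember the first FID only once, then switch to the new one. */
static void lma_migrate(struct lma *lma, struct fid new_fid)
{
    assert(lma != NULL);
    if (!lma->has_orig_fid) {
        lma->orig_fid = lma->self_fid;
        lma->has_orig_fid = true;
    }
    lma->self_fid = new_fid;
}
```

      In this sketch a second migration leaves orig_fid untouched, so user_visible_ino() keeps returning the same value while self_fid follows the inode to each new MDT.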

          Activity

            adilger Andreas Dilger added a comment - Ben, there is already a MIGRT ChangeLog record for inode migration.

            bevans Ben Evans (Inactive) added a comment - Could we emit the changes through the changelog? If someone wants to keep track of FID/inode changes over time, they could set up a listener to catch and archive them. The various calls that look up a FID, etc. could simply call up to the userspace service to get historical info.

            laisiyao Lai Siyao added a comment - IMO the overhead of preserving the inode number is quite high, and rather than saving the original FID, I'd prefer to add an option to keep the file inode untouched during migration. That is to say, for existing sub-files the inode won't be migrated; only the namespace is updated.

            If we still prefer migrating the inode, I'd suggest dropping support for preserving the inode number and instead adding a FID mapping to support NFS export. A special OI file would be added that contains mappings from the original FID to the new FID, and lu_object_find() would look up this OI file. If a new FID is found, the server replies to the client with -EREMOTE, and the client resends the request with the new FID to the correct MDT. For some requests, such as rename, the client may need to retry several times if more than one of the involved files has been migrated. A garbage-collection thread on the server would remove aged mappings from this OI file.
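
            The mapping scheme described above could be sketched roughly as follows. This is an in-memory model only, under stated assumptions: the real proposal uses an on-disk OI file and lu_object_find(), while every name here (fid_map, fid_map_lookup, fid_resolve, fid_map_gc) and the fixed-size table are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#ifndef EREMOTE
#define EREMOTE 66 /* Linux value; fallback for platforms that lack it */
#endif

struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

/* One old-FID -> new-FID mapping, timestamped so it can be aged out. */
struct fid_map_entry {
    struct fid old_fid, new_fid;
    time_t created;
    bool in_use;
};

#define FID_MAP_SLOTS 64
struct fid_map { struct fid_map_entry slots[FID_MAP_SLOTS]; };

static bool fid_eq(const struct fid *a, const struct fid *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Record a migration. Returns 0, or -ENOSPC when the table is full. */
static int fid_map_add(struct fid_map *m, struct fid old_fid,
                       struct fid new_fid, time_t now)
{
    assert(m != NULL);
    for (size_t i = 0; i < FID_MAP_SLOTS; i++) {
        if (!m->slots[i].in_use) {
            m->slots[i] = (struct fid_map_entry){ old_fid, new_fid, now, true };
            return 0;
        }
    }
    return -ENOSPC;
}

/* Server side: 0 means "not remapped, serve locally"; -EREMOTE means the
 * object moved and *redirect holds the FID the client should retry with. */
static int fid_map_lookup(const struct fid_map *m, const struct fid *fid,
                          struct fid *redirect)
{
    for (size_t i = 0; i < FID_MAP_SLOTS; i++) {
        if (m->slots[i].in_use && fid_eq(&m->slots[i].old_fid, fid)) {
            *redirect = m->slots[i].new_fid;
            return -EREMOTE;
        }
    }
    return 0;
}

/* Client side: follow redirects, giving up after max_retries resends. */
static int fid_resolve(const struct fid_map *m, struct fid start,
                       int max_retries, struct fid *final)
{
    struct fid cur = start;
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        struct fid next;
        if (fid_map_lookup(m, &cur, &next) == 0) {
            *final = cur;
            return 0;
        }
        cur = next;
    }
    return -EREMOTE;
}

/* Garbage collection: drop mappings older than max_age seconds. */
static size_t fid_map_gc(struct fid_map *m, time_t now, time_t max_age)
{
    size_t removed = 0;
    for (size_t i = 0; i < FID_MAP_SLOTS; i++) {
        if (m->slots[i].in_use && now - m->slots[i].created > max_age) {
            m->slots[i].in_use = false;
            removed++;
        }
    }
    return removed;
}
```

            One consequence visible in the sketch: once the garbage collector has removed a mapping, a lookup of the old FID no longer redirects, so clients holding very old handles would see the object as missing rather than moved.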

            adilger Andreas Dilger added a comment - Lai, I recall not too long ago we discussed the ability to save the old FID after migration. Is there anything that needs to be updated in this ticket to describe your proposal?

            di.wang Di Wang (Inactive) added a comment - I see. Thanks. Hmm, for stat() we only need to fill the original FID into the mdt_body of the getattr request, cache it in ll_inode_info, and fill it into stat->ino. But readdir() reads from the directory entries directly, so if we inject a check of the original ino in the LMA into this process, it might slow down readdir a lot.

            adilger Andreas Dilger added a comment - The point of keeping a consistent inode number is that some tools, such as backups, depend on the inode number remaining the same so they can do incremental backups. Otherwise, they can't tell the difference between migration changing the inode number and the file being deleted and a new file created with the same name. NFS servers in userspace would also use the inode number. NFS file handles generated in the kernel by Lustre are affected in the same way, but since we encode the FID into the Lustre file handle, this wouldn't help there. We'd need to allow the original FID to be looked up on the original MDT, with a redirection to the new FID.
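
            To illustrate why a changed inode number matters to such tools, here is a hedged sketch of the kind of check an incremental backup might make. It is not taken from any particular backup program; the record layout and the names (backup_record, classify, classify_path) are hypothetical, and only the POSIX stat() call and struct stat fields are real.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* What a hypothetical backup tool remembers about a path from its last run. */
struct backup_record {
    ino_t  ino;
    time_t mtime;
    off_t  size;
};

enum entry_state {
    ENTRY_UNCHANGED, /* same inode, same mtime and size: skip */
    ENTRY_MODIFIED,  /* same inode, contents changed: back up again */
    ENTRY_REPLACED   /* different inode: looks like delete + re-create */
};

/* Compare the saved record with the current stat() result for the same path. */
static enum entry_state classify(const struct backup_record *prev,
                                 const struct stat *cur)
{
    assert(prev != NULL && cur != NULL);
    if (cur->st_ino != prev->ino)
        return ENTRY_REPLACED;
    if (cur->st_mtime != prev->mtime || cur->st_size != prev->size)
        return ENTRY_MODIFIED;
    return ENTRY_UNCHANGED;
}

/* Convenience wrapper: stat() the path, then classify. Returns -1 on error. */
static int classify_path(const char *path, const struct backup_record *prev,
                         enum entry_state *out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *out = classify(prev, &st);
    return 0;
}
```

            With a check like this, a file whose data is untouched but whose inode number changed during migration falls into the ENTRY_REPLACED case and would be copied in full again.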

            di.wang Di Wang (Inactive) added a comment - Actually I am a bit confused now. Since migration keeps the namespace consistent, both stat() and readdir() should return the real (correct) ino. I do not see why we should keep the original ino (or FID) after migration; I am probably missing something. Could you please explain the purpose of keeping a consistent ino here? Why do these external tools need a consistent ino? Thanks.

            adilger Andreas Dilger added a comment - For the HSM FID problem I think a different solution is needed. Instead of storing the FID in the archive, it is better to store the archive identifier (UUID or whatever) in the Lustre inode as a part of the composite layout. That allows storing multiple versions of the file in the archive, as well as allowing partial HSM file restore with composite files.

            adilger Andreas Dilger added a comment - I hadn't thought about keeping the whole FID, just the original inode number for exposure to userspace, for tools like tar or other backup tools that depend on the inode number. Since we won't be using this number internally, only for exposure via stat() and readdir(), it doesn't matter if it is different from the real FID. Do you think there is value in keeping the original FID?

            di.wang Di Wang (Inactive) added a comment (edited) - Hmm, as long as we do not use this original FID for internal references, it should be OK. As for backup, are you trying to temporarily resolve LU-6866, i.e. the copy tool will always use this original FID as the identifier to locate the file in the archive? Then we probably need to record this original FID in the changelog, or the copy tool needs to be changed to retrieve the FID from the LMA.

            adilger Andreas Dilger added a comment - Di, what do you think about this idea? It should be possible to add this as a "compat" feature to the LMA only if the inode is migrated. Since the internal references all use the FID, the inode number displayed by "ls" and "stat" doesn't really matter.

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger