[LU-7607] Preserve inode number after MDT migration - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.8.0
Labels:
- LMR
- dne3
- usability

Epic Link:
MDT rebalance v3

Description

During migration, the MDT FID of the migrated file is changed to reflect the new MDT the inode is stored on. However, it would be possible to keep the user-visible inode constant after migration by storing the original FID into the LMA as a new field. If present, this saved FID could be used to generate the inode number for userspace instead of the current FID so that it doesn't affect user tools such as backups.
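The selection rule described above can be sketched as follows. This is a minimal illustration, not Lustre code: the `Lma` class, its `orig_fid` field, and the `fid_to_ino` packing are all assumptions made for the example (the real LMA layout and the real FID-to-inode flattening differ).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fid:
    seq: int      # sequence number
    oid: int      # object id within the sequence
    ver: int = 0


@dataclass
class Lma:
    self_fid: Fid                   # current FID, changes on migration
    orig_fid: Optional[Fid] = None  # hypothetical new field, set only on migration


def fid_to_ino(fid: Fid) -> int:
    """Illustrative 64-bit packing; NOT Lustre's real flattening function."""
    return ((fid.seq << 24) + fid.oid) & 0xFFFFFFFFFFFFFFFF


def user_visible_ino(lma: Lma) -> int:
    """Prefer the saved original FID so the inode number survives migration."""
    fid = lma.orig_fid if lma.orig_fid is not None else lma.self_fid
    return fid_to_ino(fid)


def migrate(lma: Lma, new_fid: Fid) -> Lma:
    """Assign a new FID and keep the first FID across repeated migrations."""
    orig = lma.orig_fid if lma.orig_fid is not None else lma.self_fid
    return Lma(self_fid=new_fid, orig_fid=orig)
```

In this sketch the saved FID is only consulted when reporting the inode number to userspace; all internal references would continue to use `self_fid`.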

Attachments

Issue Links

is related to

LU-7749 DNE3: migrated orphan survive till next reboot

Open

LU-14465 unlink fails when if the metadata migration is running behind

Open

LU-11753 MDS BUG on lfs migrate [osd_it_ea_rec]

Resolved

LU-11306 Moving files from one MDT to another does not free inodes on source MDT

Resolved

LU-11025 DNE3: directory restripe

Resolved

LU-17820 LMR2a: Replicate ROOT/ Directory

Open

LU-16024 Allow permanently removing an MDT from config

Open

is related to

LU-6866 MDT file migration is incompatible with HSM

Resolved

LU-2430 Migration tool for DNE

Resolved

LU-13426 "lfs migrate" on DoM component clobbers LOV EA FID

Resolved

LU-14975 DNE3: directory migration in non-recursive mode

Resolved

Activity

Andreas Dilger added a comment - 05/Dec/19 4:53 PM

Ben, there is already a MIGRT ChangeLog record for inode migration.

Ben Evans (Inactive) added a comment - 05/Dec/19 4:37 PM

Could we emit the changes to the changelog? If someone wants to keep track of FID/inode changes over time, they could have a listener set up to catch/archive them.

The various calls to find a FID, etc. could simply call up to the userspace service to get historical info.
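A listener of the kind suggested here could be sketched as below. The record shape is an assumption for illustration: `MigrateRecord` and its `old_fid`/`new_fid` fields are hypothetical, and the actual layout of a MIGRT changelog record is not described in this ticket.

```python
from typing import Dict, Iterable, NamedTuple


class MigrateRecord(NamedTuple):
    rec_type: str  # changelog record type, e.g. "MIGRT"
    old_fid: str   # hypothetical field: FID before migration
    new_fid: str   # hypothetical field: FID after migration


class FidHistory:
    """Userspace archive of FID changes, fed from already-parsed records."""

    def __init__(self) -> None:
        self._next: Dict[str, str] = {}  # old FID -> new FID
        self._prev: Dict[str, str] = {}  # new FID -> old FID

    def consume(self, records: Iterable[MigrateRecord]) -> None:
        for rec in records:
            if rec.rec_type != "MIGRT":
                continue  # only migration records change the FID
            self._next[rec.old_fid] = rec.new_fid
            self._prev[rec.new_fid] = rec.old_fid

    @staticmethod
    def _walk(fid: str, links: Dict[str, str]) -> str:
        seen = {fid}
        # Follow the chain, stopping on a cycle to avoid looping forever.
        while fid in links and links[fid] not in seen:
            fid = links[fid]
            seen.add(fid)
        return fid

    def current(self, fid: str) -> str:
        """Latest known FID for a file that may have been migrated."""
        return self._walk(fid, self._next)

    def original(self, fid: str) -> str:
        """Earliest known FID for the same file."""
        return self._walk(fid, self._prev)
```

A tool that needs a stable identifier could then query `original()` instead of relying on the inode number reported by the filesystem.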

Lai Siyao added a comment - 05/Dec/19 8:58 AM

IMO the overhead of reserving the inode number is quite high. Rather than saving the original FID, I'd prefer to add an option to leave file inodes untouched during migration; that is to say, for existing sub-files the inode won't be migrated, only the namespace is updated.

If we still prefer migrating the inode, I'd suggest dropping support for preserving the inode number and instead adding a FID mapping to support NFS export. A special OI file would be added that contains mappings from the original FID to the new FID, and lu_object_find() would look up this OI file. If a new FID is found, the server replies to the client with -EREMOTE, and the client resends the request with the new FID to the correct MDT. For some requests, such as rename, the client may need to retry several times if more than one of the involved files has been migrated. A garbage-collection thread on the server would remove aged mappings from this OI file.
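The redirect-and-retry flow in this proposal can be modelled roughly as follows. This is a toy sketch under stated assumptions: `FidMap`, `server_find`, and `client_find` are hypothetical names, a single in-memory table stands in for the per-MDT OI mapping file, and the retry limit is arbitrary.

```python
import errno
import time
from typing import Any, Dict, Optional, Tuple

Fid = Tuple[int, int]  # simplified (seq, oid) pair
EREMOTE = getattr(errno, "EREMOTE", 66)  # 66 is the Linux value


class FidMap:
    """Stand-in for the proposed OI file: original FID -> new FID."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self._map: Dict[Fid, Tuple[Fid, float]] = {}

    def record(self, old: Fid, new: Fid, now: Optional[float] = None) -> None:
        self._map[old] = (new, time.time() if now is None else now)

    def lookup(self, fid: Fid) -> Optional[Fid]:
        entry = self._map.get(fid)
        return entry[0] if entry else None

    def gc(self, now: Optional[float] = None) -> int:
        """Drop mappings older than max_age; return how many were removed."""
        now = time.time() if now is None else now
        stale = [k for k, (_, t) in self._map.items() if now - t > self.max_age]
        for k in stale:
            del self._map[k]
        return len(stale)


def server_find(fmap: FidMap, objects: Dict[Fid, Any], fid: Fid):
    """Return (0, obj), (-EREMOTE, new_fid) if migrated, or (-ENOENT, None)."""
    if fid in objects:
        return 0, objects[fid]
    new = fmap.lookup(fid)
    if new is not None:
        return -EREMOTE, new
    return -errno.ENOENT, None


def client_find(fmap: FidMap, objects: Dict[Fid, Any], fid: Fid, retries: int = 4):
    """Resend with the redirected FID until the object is found or retries run out."""
    for _ in range(retries):
        rc, val = server_find(fmap, objects, fid)
        if rc != -EREMOTE:
            return rc, val
        fid = val  # retry with the new FID
    return -errno.ELOOP, None
```

Once the garbage collector has aged out a mapping, a stale FID (for example one held in an old NFS file handle) no longer resolves, so the mapping lifetime bounds how long old handles stay usable in this model.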

Andreas Dilger added a comment - 04/Dec/19 3:37 PM

Lai, I recall not too long ago we discussed the ability to save the old FID after migration. Is there anything that needs to be updated in this ticket to describe your proposal?

Di Wang (Inactive) added a comment - 05/Jan/16 6:33 PM

I see. Thanks. Hmm, for stat() we only need to fill the original FID into the mdt_body of the getattr request, cache it in ll_inode_info, and fill it into stat->ino. But readdir() reads from the directory entries directly; if we inject a check of the LMA original ino into this process, it might slow down readdir a lot.

Andreas Dilger added a comment - 05/Jan/16 10:56 AM

The point of keeping a consistent inode number is that some tools, such as backups, depend on the inode number to remain the same so they can do incremental backups. Otherwise, they can't tell the difference between migrate changing the inode number, or the file being deleted and a new file created with the same name. NFS servers in userspace would also use the inode number.

NFS file handles generated in the kernel by Lustre are the same, but since we encode the FID into the Lustre file handle this wouldn't help - we'd need to allow the original FID to be looked up on the original MDT with a redirection to the new FID.
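The backup ambiguity described above can be illustrated with a toy comparison of two scans. This is a sketch only: `classify` is a hypothetical helper, and real backup tools also consult timestamps and other attributes.

```python
from typing import Dict


def classify(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, str]:
    """Compare two {path: inode number} scans the way a naive incremental backup might."""
    result: Dict[str, str] = {}
    for path, ino in curr.items():
        if path not in prev:
            result[path] = "new"
        elif prev[path] == ino:
            result[path] = "unchanged"
        else:
            # Same name, different inode: a migrated file is
            # indistinguishable here from delete-and-recreate.
            result[path] = "replaced"
    for path in prev:
        if path not in curr:
            result[path] = "deleted"
    return result
```

With a changing inode number, a migrated file lands in the "replaced" bucket and would be backed up again in full; with a preserved inode number it would be reported as "unchanged".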

Di Wang (Inactive) added a comment - 05/Jan/16 6:34 AM

Actually I am a bit confused now. Since migration keeps the namespace consistent, either stat() or readdir() should return the real (correct) ino. I do not see why we should keep the original ino (or FID) after migration; I am probably missing something. Could you please explain the purpose of keeping a consistent ino here? Why do these external tools need a consistent ino? Thanks.

Andreas Dilger added a comment - 05/Jan/16 3:51 AM

For the HSM FID problem I think a different solution is needed. Instead of storing the FID in the archive, it is better to store the archive identifier (UUID or whatever) in the Lustre inode as a part of the composite layout. That allows storing multiple versions of the file in the archive, as well as allowing partial HSM file restore with composite files.

Andreas Dilger added a comment - 05/Jan/16 3:48 AM

I hadn't thought about keeping the whole FID, just the original inode number for exposure to userspace, for tools like tar or other backup tools that depend on the inode number. Since we won't be using this number internally, only for exposure via stat() and readdir(), it doesn't matter if it is different from the real FID.

Do you think there is value in keeping the original FID?

Di Wang (Inactive) added a comment - 04/Jan/16 10:00 PM - edited

Hmm, as long as we do not use this original FID for internal references, it should be OK. As for backup, are you trying to temporarily resolve LU-6866, i.e. the copy tool will always use this original FID as the identifier to locate the file in the archive? Then we probably need to record this original FID in the changelog, or the copy tool needs to be changed to retrieve the FID from the LMA.

Andreas Dilger added a comment - 04/Jan/16 7:05 PM

Di, what do you think about this idea? It should be possible to add this as a "compat" feature to the LMA only if the inode is migrated. Since the internal references all use the FID, the inode number displayed by "ls" and "stat" doesn't really matter.

People

Assignee: Lai Siyao

Reporter: Andreas Dilger

Votes: 0

Watchers: 13

Dates

Created: 24/Dec/15 10:37 AM

Updated: 29/May/25 5:38 PM