[LU-7607] Preserve inode number after MDT migration Created: 24/Dec/15  Updated: 04/Oct/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Improvement
Priority: Minor
Reporter: Andreas Dilger
Assignee: Lai Siyao
Resolution: Unresolved
Votes: 0
Labels: dne3

Issue Links:
Related
is related to LU-6866 MDT file migration is incompatible wi... Resolved
is related to LU-2430 Migration tool for DNE Resolved
is related to LU-13426 "lfs migrate" on DoM component clobbe... Resolved
is related to LU-14975 DNE3: directory migration in non-recu... Resolved
is related to LU-7749 DNE3: migrated orphan survive till ne... Open
is related to LU-14465 unlink fails when if the metadata mig... Open
is related to LU-11753 MDS BUG on lfs migrate [osd_it_ea_rec] Resolved
is related to LU-11306 Moving files from one MDT to another ... Resolved
is related to LU-11025 DNE3: directory restripe Resolved
Epic Link: MDT rebalance v3

 Description   

During migration, the MDT FID of the migrated file is changed to reflect the new MDT the inode is stored on. However, it would be possible to keep the user-visible inode number constant after migration by storing the original FID in the LMA as a new field. If present, this saved FID could be used to generate the inode number for userspace instead of the current FID, so that migration doesn't affect user tools such as backups.



 Comments   
Comment by Andreas Dilger [ 04/Jan/16 ]

Di, what do you think about this idea? It should be possible to add this as a "compat" feature to the LMA, set only if the inode has been migrated. Since internal references all use the FID, the inode number displayed by "ls" and "stat" doesn't really matter.
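A minimal C sketch of the idea in the description and the comment above, assuming a trimmed-down LMA. The struct names, the lma_orig_fid field and the LMAC_ORIG_FID_SKETCH compat flag are all hypothetical, and the FID-to-inode flattening is only modeled on Lustre's fid_flatten() (IGIF FIDs are ignored). The original FID is recorded on the first migration only, so later migrations keep reporting the same inode number.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified FID: sequence, object id, version (as printed by lfs path2fid). */
struct lu_fid_sketch {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* Hypothetical compat flag, set only on inodes that have been migrated. */
#define LMAC_ORIG_FID_SKETCH 0x1u

/* Hypothetical, trimmed-down LMA; the real structure has other fields. */
struct lma_sketch {
    uint32_t lma_compat;               /* compat feature flags */
    struct lu_fid_sketch lma_self_fid; /* current FID (on the new MDT after migration) */
    struct lu_fid_sketch lma_orig_fid; /* valid only if LMAC_ORIG_FID_SKETCH is set */
};

/* Flatten a FID into a 64-bit inode number, modeled on Lustre's
 * fid_flatten(); IGIF FIDs are not handled in this sketch. */
static uint64_t fid_flatten_sketch(const struct lu_fid_sketch *fid)
{
    uint64_t seq = fid->f_seq;
    uint64_t ino = (seq << 24) + ((seq >> 24) & 0xffffff0000ULL) + fid->f_oid;

    return ino != 0 ? ino : fid->f_oid;
}

/* Inode number reported by stat()/readdir(): prefer the saved original FID. */
static uint64_t lma_user_ino(const struct lma_sketch *lma)
{
    if (lma->lma_compat & LMAC_ORIG_FID_SKETCH)
        return fid_flatten_sketch(&lma->lma_orig_fid);
    return fid_flatten_sketch(&lma->lma_self_fid);
}

/* Record a migration. Only the first FID is saved, so repeated
 * migrations keep reporting the same inode number. */
static void lma_migrate(struct lma_sketch *lma, struct lu_fid_sketch new_fid)
{
    /* A migration to another MDT always allocates a new FID. */
    assert(new_fid.f_seq != lma->lma_self_fid.f_seq ||
           new_fid.f_oid != lma->lma_self_fid.f_oid);

    if (!(lma->lma_compat & LMAC_ORIG_FID_SKETCH)) {
        lma->lma_orig_fid = lma->lma_self_fid;
        lma->lma_compat |= LMAC_ORIG_FID_SKETCH;
    }
    lma->lma_self_fid = new_fid;
}
```

Because the flag is only set on migrated inodes, files that were never migrated keep their existing on-disk LMA and inode number unchanged.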

Comment by Di Wang [ 04/Jan/16 ]

Hmm, as long as we do not use this original FID for internal references, it should be OK. As for backup, are you trying to temporarily resolve LU-6866? That is, would the copy tool always use this original FID as the identifier to locate the file in the archive? Then we probably need to record this original FID in the ChangeLog, or the copy tool needs to be changed to retrieve the FID from the LMA?

Comment by Andreas Dilger [ 05/Jan/16 ]

I hadn't thought about keeping the whole FID, just the original inode number for exposure to userspace, for tools like tar or other backup tools that depend on the inode number. Since we won't be using this number internally, only for exposure via stat() and readdir(), it doesn't matter if it differs from the real FID.

Do you think there is value to keep the original FID?

Comment by Andreas Dilger [ 05/Jan/16 ]

For the HSM FID problem I think a different solution is needed. Instead of storing the FID in the archive, it is better to store the archive identifier (UUID or whatever) in the Lustre inode as a part of the composite layout. That allows storing multiple versions of the file in the archive, as well as allowing partial HSM file restore with composite files.

Comment by Di Wang [ 05/Jan/16 ]

Actually, I am a bit confused now. Since migration keeps the namespace consistent, both stat() and readdir() should return the real (correct) ino. Why should we keep the original ino (or FID) after migration? I'm probably missing something. Could you please explain the purpose of keeping a consistent ino here? Why do these external tools need a consistent ino? Thanks.

Comment by Andreas Dilger [ 05/Jan/16 ]

The point of keeping a consistent inode number is that some tools, such as backups, depend on the inode number remaining the same so they can do incremental backups. Otherwise, they can't tell the difference between migration changing the inode number and the file being deleted and a new file created with the same name. NFS servers in userspace would also use the inode number.
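To make the backup argument concrete, here is a hypothetical change-detection rule of the kind an incremental backup tool might apply per path (the struct and function names are invented for illustration; real tools differ in detail). A file whose data and mtime are untouched but whose inode number changed, as after an MDT migration, is classified exactly like a deleted-and-recreated file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a hypothetical incremental backup remembers about a path from the last run. */
struct backup_entry {
    const char *path;
    uint64_t ino;    /* st_ino at the previous backup */
    int64_t mtime;   /* st_mtime at the previous backup */
};

enum change_kind {
    UNCHANGED,
    MODIFIED,   /* same inode, new mtime: back up the data again */
    REPLACED,   /* new inode under the same name: old file deleted, new one created */
};

/* Compare the previous record with the current stat() values for the same path. */
static enum change_kind classify(const struct backup_entry *prev,
                                 uint64_t ino, int64_t mtime)
{
    assert(prev != NULL);

    if (prev->ino != ino)
        return REPLACED;
    if (prev->mtime != mtime)
        return MODIFIED;
    return UNCHANGED;
}
```

With this rule, the only way for a migrated file to come out as UNCHANGED is for the reported inode number to stay the same.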

Kernel NFS file handles generated by Lustre have the same issue, but since we encode the FID into the Lustre file handle, preserving the inode number wouldn't help there; we'd need to allow the original FID to be looked up on the original MDT, with a redirection to the new FID.

Comment by Di Wang [ 05/Jan/16 ]

I see, thanks. For stat() we only need to fill the original FID into the mdt_body of the getattr request, cache it in ll_inode_info, and use it to fill stat->ino. But readdir() reads the directory entries directly; if we inject a check of the original inode number from the LMA into this path, it might slow readdir() down a lot.
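A rough C sketch of the client-side asymmetry described above. The structs are hypothetical stand-ins for the mdt_body field and the ll_inode_info cache, and the flattening is modeled on Lustre's fid_flatten(). stat() can use a value cached at getattr time, while readdir() only sees the FID stored in the directory entry, so it either reports a different inode number or has to pay one extra metadata fetch per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fid_s { uint64_t seq; uint32_t oid; uint32_t ver; };

/* Modeled on Lustre's fid_flatten(); IGIF FIDs are not handled. */
static uint64_t flatten(const struct fid_s *f)
{
    uint64_t ino = (f->seq << 24) + ((f->seq >> 24) & 0xffffff0000ULL) + f->oid;

    return ino != 0 ? ino : f->oid;
}

/* Hypothetical getattr reply body: current FID plus, for migrated
 * inodes, the original FID read from the LMA on the MDT. */
struct getattr_body_sketch {
    struct fid_s fid;
    bool has_orig_fid;
    struct fid_s orig_fid;
};

/* Hypothetical stand-in for the fields ll_inode_info would cache. */
struct inode_info_sketch {
    struct fid_s fid;
    uint64_t user_ino;   /* what stat() reports as st_ino */
};

static void cache_getattr(struct inode_info_sketch *lli,
                          const struct getattr_body_sketch *body)
{
    lli->fid = body->fid;
    lli->user_ino = flatten(body->has_orig_fid ? &body->orig_fid : &body->fid);
}

static uint64_t stat_ino(const struct inode_info_sketch *lli)
{
    return lli->user_ino;  /* no extra RPC: the value was cached at getattr time */
}

/* readdir() only sees the name and FID stored in the directory entry. */
struct dirent_sketch { const char *name; struct fid_s fid; };

static unsigned int extra_fetches;  /* per-entry LMA fetches paid by readdir() */

/* 'fetched' stands for what an extra per-entry getattr/LMA read would return. */
static uint64_t readdir_ino(const struct dirent_sketch *de,
                            const struct getattr_body_sketch *fetched,
                            bool check_lma)
{
    assert(!check_lma || fetched != NULL);

    if (check_lma) {
        extra_fetches++;
        if (fetched->has_orig_fid)
            return flatten(&fetched->orig_fid);
    }
    return flatten(&de->fid);
}
```

For a directory with N entries, the consistent variant costs N extra fetches per listing, which is the slowdown the comment is worried about.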

Comment by Andreas Dilger [ 04/Dec/19 ]

Lai, I recall not too long ago we discussed the ability to save the old FID after migration. Is there anything that needs to be updated in this ticket to describe your proposal?

Comment by Lai Siyao [ 05/Dec/19 ]

IMO the overhead of preserving the inode number is quite high, and rather than saving the original FID, I'd prefer to add an option to leave file inodes untouched during migration; that is, for existing files in the directory, the inodes won't be migrated, only the namespace is updated.

If we still prefer migrating inodes, I'd suggest dropping support for preserving the inode number and instead adding a FID mapping to support NFS export: a special OI file would be added containing mappings from the original FID to the new FID. lu_object_find() would look up this OI file, and if a new FID is found, reply to the client with -EREMOTE; the client then resends the request with the new FID to the correct MDT. For some requests, such as rename, the client may need to retry several times if more than one of the involved files has been migrated. A garbage-collection thread on the server would remove aged mappings from this OI file.
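A minimal, single-process C sketch of the server side of the proposed FID map, assuming a fixed in-memory array in place of the special OI file. fidmap_insert(), server_find() and fidmap_reclaim() are hypothetical names chosen for illustration, not the interfaces in the LU-7607 patches; server_find() plays the role of the lookup in lu_object_find(), and fidmap_reclaim() the body of the garbage-collection thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#ifndef EREMOTE
#define EREMOTE 66  /* Linux value; not defined by every libc */
#endif

struct fid_s { uint64_t seq; uint32_t oid; uint32_t ver; };

static int fid_eq(const struct fid_s *a, const struct fid_s *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Hypothetical in-memory stand-in for one old->new record in the OI file. */
struct fidmap_entry {
    struct fid_s old_fid;
    struct fid_s new_fid;
    time_t added;   /* when the migration recorded this mapping */
    int in_use;
};

#define FIDMAP_SIZE 64
static struct fidmap_entry fidmap[FIDMAP_SIZE];

/* Called by migration after the inode has been moved to its new FID. */
static int fidmap_insert(struct fid_s old_fid, struct fid_s new_fid, time_t now)
{
    for (size_t i = 0; i < FIDMAP_SIZE; i++) {
        if (!fidmap[i].in_use) {
            fidmap[i] = (struct fidmap_entry){ old_fid, new_fid, now, 1 };
            return 0;
        }
    }
    return -ENOSPC;
}

/* Object lookup on this MDT: 0 if the FID is stored locally, -EREMOTE with
 * *redirect set if it was migrated away, -ENOENT if unknown here. */
static int server_find(const struct fid_s *fid, const struct fid_s *local,
                       size_t nlocal, struct fid_s *redirect)
{
    assert(redirect != NULL);

    for (size_t i = 0; i < nlocal; i++)
        if (fid_eq(fid, &local[i]))
            return 0;
    for (size_t i = 0; i < FIDMAP_SIZE; i++) {
        if (fidmap[i].in_use && fid_eq(fid, &fidmap[i].old_fid)) {
            *redirect = fidmap[i].new_fid;
            return -EREMOTE;
        }
    }
    return -ENOENT;
}

/* One pass of the reclaim thread: drop mappings older than max_age seconds. */
static size_t fidmap_reclaim(time_t now, time_t max_age)
{
    size_t dropped = 0;

    for (size_t i = 0; i < FIDMAP_SIZE; i++) {
        if (fidmap[i].in_use && now - fidmap[i].added > max_age) {
            fidmap[i].in_use = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Once a mapping has been reclaimed, the old FID simply becomes unknown (-ENOENT), which is the point where an old NFS handle would turn stale.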

Comment by Ben Evans (Inactive) [ 05/Dec/19 ]

Could we emit the changes to the ChangeLog? If someone wants to track FID/inode changes over time, they could set up a listener to catch and archive them.

The various calls to find a FID, etc. could simply call up to the userspace service to get historical info.

Comment by Andreas Dilger [ 05/Dec/19 ]

Ben, there is already a MIGRT ChangeLog record for inode migration.

Comment by Andreas Dilger [ 05/Dec/19 ]

Lai, it definitely makes sense to have an option to migrate the parent directory and filenames without migrating the file inodes. In that case there is no need for this feature to preserve the inode numbers, since they won't change.

This is only needed when the inode is moved to a new MDT, which may be required when removing an MDT or when an MDT is very full, but not in the normal space-balancing case.

I think storing the original FID in the inode is not too hard, and it will always be unique. Adding the old and new FID in the OI table is also useful. We can't return -EREMOTE to NFS clients, but it can be handled by the Lustre client so that it doesn't return -ESTALE to NFS.
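A hedged C sketch of how the Lustre client might hide -EREMOTE from the kernel NFS server when decoding a file handle, as suggested above: follow redirects up to a small limit, and report a missing object as -ESTALE. The callback type, MAX_REDIRECTS, fh_to_fid() and the toy migration history (A moved to B, then B to C, with a made-up FID for C) are illustrative assumptions, not Lustre interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EREMOTE
#define EREMOTE 66  /* Linux value; not defined by every libc */
#endif

struct fid_s { uint64_t seq; uint32_t oid; uint32_t ver; };

static int fid_eq(const struct fid_s *a, const struct fid_s *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Hypothetical MDT request: 0 if found, -EREMOTE with *next set if the
 * object was migrated, -ENOENT if unknown. */
typedef int (*mdt_getattr_fn)(const struct fid_s *fid, struct fid_s *next);

#define MAX_REDIRECTS 4

/* Resolve the FID from an NFS file handle. -EREMOTE is consumed here, so
 * the NFS server only ever sees success, -ESTALE, or another real error. */
static int fh_to_fid(mdt_getattr_fn getattr, struct fid_s fid, struct fid_s *out)
{
    assert(out != NULL);

    for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
        struct fid_s next;
        int rc = getattr(&fid, &next);

        if (rc == 0) {
            *out = fid;
            return 0;
        }
        if (rc == -ENOENT)
            return -ESTALE;   /* object deleted or mapping reclaimed */
        if (rc != -EREMOTE)
            return rc;        /* pass other errors through unchanged */
        fid = next;           /* resend to the MDT holding the new FID */
    }
    return -ESTALE;           /* too many redirects */
}

/* Toy two-hop history for illustration: A was migrated to B, then B to C. */
static const struct fid_s fid_a = { 0x200001b72ULL, 0x10c07, 0 };
static const struct fid_s fid_b = { 0x240001b70ULL, 0x2743, 0 };
static const struct fid_s fid_c = { 0x280000401ULL, 0x5, 0 };  /* made up */

static int toy_getattr(const struct fid_s *fid, struct fid_s *next)
{
    if (fid_eq(fid, &fid_a)) { *next = fid_b; return -EREMOTE; }
    if (fid_eq(fid, &fid_b)) { *next = fid_c; return -EREMOTE; }
    if (fid_eq(fid, &fid_c)) return 0;
    return -ENOENT;
}
```

The loop bound keeps a corrupted or cyclic mapping from hanging the caller; the multi-hop case corresponds to a file that was migrated more than once while its old handle was still in use.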

Comment by Lai Siyao [ 06/Dec/19 ]

Andreas, okay.

Comment by Andreas Dilger [ 08/Apr/20 ]

I was looking at whether we could use the FID stored in the LOV EA to preserve the "original" inode number of the file. It seems that the FID is stored in the LOV EA in each component and is also preserved across OST and MDT migration:

tests# lfs setstripe -E 1M -L mdt -E 1G -c 3 -E eof /mnt/testfs/dir1/tt
tests# dd if=/dev/zero of=/mnt/testfs/dir1/tt bs=1M count=2
tests# lfs path2fid /mnt/testfs/dir1/tt
[0x200001b72:0x10c07:0x0]
tests# lfs getstripe -v /mnt/testfs/dir1/tt
components:
  - lcme_id:             1
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
    sub_layout:
      lmm_seq:           0x200001b72
      lmm_object_id:     0x10c07
      lmm_fid:           [0x200001b72:0x10c07:0x0]
      lmm_stripe_count:  1
  - lcme_id:             2
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   1073741824
    sub_layout:
      lmm_seq:           0x200001b72
      lmm_object_id:     0x10c07
      lmm_fid:           [0x200001b72:0x10c07:0x0]
  - lcme_id:             3
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   EOF
    sub_layout:
      lmm_seq:           0x200001b72
      lmm_object_id:     0x10c07
      lmm_fid:           [0x200001b72:0x10c07:0x0]
tests# lfs migrate -c 3 /mnt/testfs/dir1/tt
tests# lfs getstripe -v /mnt/testfs/dir1/tt
lmm_seq:           0x200001b72
lmm_object_id:     0x10c07
lmm_fid:           [0x200001b72:0x10c07:0x0]
lmm_stripe_count:  3
tests# lfs migrate -m1 /mnt/testfs/dir1
tests# lfs path2fid /mnt/testfs/dir1/tt
[0x240001b70:0x2743:0x0]
tests# lfs getstripe -v /mnt/testfs/dir1/tt
lmm_seq:           0x200001b72
lmm_object_id:     0x10c07
lmm_fid:           [0x200001b72:0x10c07:0x0]
lmm_stripe_count:  3

There is a bug (LU-13426) that clobbers the FID if there is a DOM component in the layout, but DOM migration is relatively new and can be fixed to preserve the FID properly.
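Using the FIDs from the transcript above, a C sketch of deriving a stable inode number from the lmm_fid in the LOV EA rather than from the current FID. The simplified layout structs, stable_ino() and layout_fids_consistent() are hypothetical, and the flattening is only modeled on Lustre's fid_flatten(). The consistency check is where a clobbered FID, as in the LU-13426 case, would show up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fid_s { uint64_t seq; uint32_t oid; uint32_t ver; };

static bool fid_eq(const struct fid_s *a, const struct fid_s *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Modeled on Lustre's fid_flatten(); IGIF FIDs are not handled. */
static uint64_t flatten(const struct fid_s *f)
{
    uint64_t ino = (f->seq << 24) + ((f->seq >> 24) & 0xffffff0000ULL) + f->oid;

    return ino != 0 ? ino : f->oid;
}

/* Simplified composite-layout component: only the extent and lmm_fid. */
struct component_sketch {
    uint64_t e_start;
    uint64_t e_end;          /* EOF is represented as UINT64_MAX here */
    struct fid_s lmm_fid;
};

struct layout_sketch {
    size_t count;
    const struct component_sketch *comps;
};

/* All components should carry the same lmm_fid; a clobbered FID would
 * make this return false. */
static bool layout_fids_consistent(const struct layout_sketch *lov)
{
    assert(lov != NULL);

    for (size_t i = 1; i < lov->count; i++)
        if (!fid_eq(&lov->comps[0].lmm_fid, &lov->comps[i].lmm_fid))
            return false;
    return true;
}

/* Hypothetical policy: use the first component's lmm_fid for the user-visible
 * inode number, falling back to the current FID if there is no layout. */
static uint64_t stable_ino(const struct fid_s *cur, const struct layout_sketch *lov)
{
    if (lov == NULL || lov->count == 0)
        return flatten(cur);
    return flatten(&lov->comps[0].lmm_fid);
}
```

In the transcript, path2fid changes from [0x200001b72:0x10c07:0x0] to [0x240001b70:0x2743:0x0] after "lfs migrate -m1", while lmm_fid stays [0x200001b72:0x10c07:0x0], so stable_ino() returns the same value before and after the MDT migration.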

Comment by Andreas Dilger [ 09/Jan/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38135
Subject: LU-7607 dne: add FID map interfaces
Project: fs/lustre-release
Branch: master
Current Patch Set: 9
Commit: 2119206d30f6d84ac1974d9c7d3b24bc25eee1e4

Comment by Andreas Dilger [ 09/Jan/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38233
Subject: LU-7607 dne: support FID map
Project: fs/lustre-release
Branch: master
Current Patch Set: 7
Commit: 0f7af1befc6de458acd84b817c535938f839c808

Comment by Andreas Dilger [ 09/Jan/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch https://review.whamcloud.com/38285
Subject: LU-7607 mdd: add fidmap reclaim thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 4
Commit: 3068abb9c2e5f85eb76f9e94c7469a761af5f86a

Comment by Andreas Dilger [ 09/Jan/22 ]

Adding links to the existing patches under this ticket; not sure why they weren't created previously.

Generated at Sat Feb 10 02:10:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.