[LU-7749] DNE3: migrated orphan survive till next reboot Created: 05/Feb/16  Updated: 07/Apr/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: dne3

Issue Links:
Related
is related to LU-2430 Migration tool for DNE Resolved
is related to LU-11025 DNE3: directory restripe Resolved
is related to LU-7607 Preserve inode number after MDT migra... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During migration we put old objects into PENDING/ and leave them there until the next reboot (PENDING cleanup). This approach has a number of problems:
1) space (inodes and directories) isn't released, so anyone who wants to rebalance or free space on an MDT has to reboot it
2) PENDING/ may become huge, slowing down regular unlinks; this is especially bad because the directory never shrinks



 Comments   
Comment by Andreas Dilger [ 05/Feb/16 ]

Alex, what do you propose for a solution here? There is definitely a need to keep migrated inodes on the current MDT for some time, to avoid access races as well as things like NFS file handles (which contain the FID). The problem with NFS file handles is that they can be long-lived.

I agree that keeping the old inodes for a long time isn't necessarily desirable, so possibly use an LRU and/or aging to remove them? Should the timeout be seconds, minutes, or hours?
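
As a rough illustration of the LRU-plus-aging idea, here is a minimal C sketch, assuming a hypothetical in-memory list of redirector entries on the source MDT. struct redir_entry, struct redir_lru, lru_add(), lru_touch() and lru_shrink() are invented for this example and are not Lustre interfaces. Entries are evicted from the cold end when the list exceeds max_entries or the least recently used entry is older than max_age, so the seconds/minutes/hours question becomes a single tunable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry for one migrated ("redirector") inode kept on the
 * source MDT. Field names are illustrative only. */
struct redir_entry {
	uint64_t ino;              /* local inode number of the old object */
	uint64_t last_used;        /* seconds, from a monotonic clock */
	struct redir_entry *prev;  /* towards more recently used */
	struct redir_entry *next;  /* towards less recently used */
};

struct redir_lru {
	struct redir_entry *head;  /* most recently used */
	struct redir_entry *tail;  /* least recently used */
	size_t count;
	size_t max_entries;        /* LRU bound */
	uint64_t max_age;          /* aging bound, in seconds */
};

static void lru_unlink(struct redir_lru *l, struct redir_entry *e)
{
	if (e->prev) e->prev->next = e->next; else l->head = e->next;
	if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
	e->prev = e->next = NULL;
	l->count--;
}

static void lru_push_head(struct redir_lru *l, struct redir_entry *e)
{
	e->prev = NULL;
	e->next = l->head;
	if (l->head) l->head->prev = e; else l->tail = e;
	l->head = e;
	l->count++;
}

/* Track a newly migrated object. */
void lru_add(struct redir_lru *l, struct redir_entry *e, uint64_t now)
{
	e->last_used = now;
	lru_push_head(l, e);
}

/* A lookup went through this redirector: refresh it and move it to the head. */
void lru_touch(struct redir_lru *l, struct redir_entry *e, uint64_t now)
{
	lru_unlink(l, e);
	e->last_used = now;
	lru_push_head(l, e);
}

/* Evict from the tail while over capacity or while the tail entry is too old.
 * With a monotonic clock the tail is always the oldest entry. release() stands
 * in for whatever destroys the orphan (may be NULL). This sketch does not check
 * for open handles. Returns the number of entries evicted. */
size_t lru_shrink(struct redir_lru *l, uint64_t now,
		  void (*release)(struct redir_entry *))
{
	size_t evicted = 0;

	assert(l->max_entries > 0);
	while (l->tail && (l->count > l->max_entries ||
			   now - l->tail->last_used > l->max_age)) {
		struct redir_entry *victim = l->tail;

		lru_unlink(l, victim);
		if (release)
			release(victim);
		evicted++;
	}
	return evicted;
}
```

A lookup that goes through a redirector would call lru_touch(), and a periodic thread would call lru_shrink(); an LRU bound alone caps memory, while max_age additionally bounds how long an idle redirector survives.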

You had previously also mentioned that the inodes are deleted from PENDING after a reboot, which I guess is true since nothing holds them open after recovery. It would be nice to fix this as well, though I don't think it is a blocker.

In theory, we don't need to actually keep this redirection on the disk. It could be only in memory (at least currently, since PENDING isn't really persistent across recovery), but then it would pin a lot of RAM, which may be even less desirable. It would also be possible to have a redirection index with [key=old_fid,value=new_fid] mappings, which would take less space than keeping a whole inode around, but it wouldn't be any faster than PENDING if it grows large since it is still an index. We could also delete and recreate PENDING after recovery if it is empty but very large, or even if non-empty by moving remaining files to the new PENDING (though this adds complexity).
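
To make the [key=old_fid,value=new_fid] alternative concrete, here is a minimal in-memory sketch in C: a fixed-size open-addressing hash table keyed by the old FID. struct fid only mirrors the field layout of Lustre's struct lu_fid (f_seq, f_oid, f_ver); the table, REDIR_SLOTS and the redir_insert()/redir_lookup() helpers are hypothetical. An on-disk version would be an index object on the MDT, which this does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as Lustre's struct lu_fid, redefined to stay self-contained. */
struct fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

#define REDIR_SLOTS 1024 /* hypothetical size; must be a power of two */

struct redir_slot {
	bool used;
	struct fid old_fid;        /* key */
	struct fid new_fid;        /* value: where the object lives now */
};

struct redir_index {
	struct redir_slot slots[REDIR_SLOTS];
	size_t count;
};

static bool fid_eq(const struct fid *a, const struct fid *b)
{
	return a->f_seq == b->f_seq && a->f_oid == b->f_oid &&
	       a->f_ver == b->f_ver;
}

static size_t fid_hash(const struct fid *f)
{
	/* Simple multiplicative mix; any reasonable hash would do. */
	uint64_t h = f->f_seq * 0x9E3779B97F4A7C15ULL;

	h ^= ((uint64_t)f->f_oid << 32) | f->f_ver;
	h ^= h >> 29;
	return (size_t)(h & (REDIR_SLOTS - 1));
}

/* Insert or update old_fid -> new_fid using linear probing.
 * Returns false only if the table is full. */
bool redir_insert(struct redir_index *idx, const struct fid *old_fid,
		  const struct fid *new_fid)
{
	size_t i = fid_hash(old_fid);

	assert(idx && old_fid && new_fid);
	for (size_t n = 0; n < REDIR_SLOTS; n++, i = (i + 1) & (REDIR_SLOTS - 1)) {
		struct redir_slot *s = &idx->slots[i];

		if (!s->used) {
			s->used = true;
			s->old_fid = *old_fid;
			s->new_fid = *new_fid;
			idx->count++;
			return true;
		}
		if (fid_eq(&s->old_fid, old_fid)) {
			s->new_fid = *new_fid;
			return true;
		}
	}
	return false;
}

/* Fill *new_fid and return true if old_fid has been redirected. */
bool redir_lookup(const struct redir_index *idx, const struct fid *old_fid,
		  struct fid *new_fid)
{
	size_t i = fid_hash(old_fid);

	for (size_t n = 0; n < REDIR_SLOTS; n++, i = (i + 1) & (REDIR_SLOTS - 1)) {
		const struct redir_slot *s = &idx->slots[i];

		if (!s->used)
			return false;      /* no deletions, so an empty slot ends the probe */
		if (fid_eq(&s->old_fid, old_fid)) {
			*new_fid = s->new_fid;
			return true;
		}
	}
	return false;
}
```

Removing expired entries from a linear-probing table would need tombstones or re-insertion of the following cluster; that is omitted here, which is why redir_lookup() can stop at the first empty slot. Each entry holds only two FIDs rather than a whole inode, but, as noted above, a large index still costs lookup time.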

Comment by Alex Zhuravlev [ 11/Feb/16 ]

Andreas, do we already have support for such forwarding on the client side?

A big PENDING/ slows down regular unlink/close, while a separate index does not.

Comment by Andreas Dilger [ 13/Feb/16 ]

I think a few smaller patches could be made to improve things here:

  • periodically scan PENDING (hourly?) to free objects that are not open and are older than an hour (maybe with a /proc tunable to adjust the scan interval), or when free inodes are getting low
  • change recovery to keep redirection objects in PENDING after reboot
  • implementing LU-7607 would also reduce the need to keep redirector inodes around, but would not eliminate it for NFS-exported filesystems unless the Lustre file handle could locate the redirected inode. I don't know if we would be able to detect whether NFS is ever used on an object, since this is local to the client, but migration may happen on another client. I guess it would be possible for the client to send a flag/set_info to the MDS if the export operations are ever used. If not, then the redirector objects don't need to be kept much longer than the end of the migration transaction.
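
The first and third bullets can be combined into one keep/free decision per PENDING entry. A minimal C sketch, assuming hypothetical per-object state (open count, migration time, and an nfs_exported flag that a client would set through the suggested flag/set_info) and hypothetical tunables standing in for the /proc knob; none of these names exist in Lustre.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables (stand-ins for a /proc setting). */
struct pending_tunables {
	uint64_t max_age;          /* seconds; "older than an hour" -> 3600 */
	uint64_t low_free_inodes;  /* below this, reclaim regardless of age */
};

/* Hypothetical per-object state visible to the scanner. */
struct pending_obj {
	uint32_t open_count;       /* open handles; never freed while > 0 */
	uint64_t migrated_at;      /* when it became a redirector (monotonic s) */
	bool nfs_exported;         /* a client reported export operations on it */
};

/* Decide whether one PENDING entry may be destroyed on this scan pass.
 * Assumes the migration transaction that created it has already committed. */
bool pending_can_free(const struct pending_obj *o,
		      const struct pending_tunables *t,
		      uint64_t now, uint64_t free_inodes)
{
	assert(now >= o->migrated_at);

	if (o->open_count > 0)
		return false;                  /* still in use: always keep */
	if (free_inodes < t->low_free_inodes)
		return true;                   /* MDT short on inodes: reclaim early */
	if (!o->nfs_exported)
		return true;                   /* no NFS handle can reference it */
	return now - o->migrated_at > t->max_age;  /* aged out */
}
```

A periodic scan (hourly, per the first bullet) would walk PENDING and destroy entries for which pending_can_free() returns true; open objects are always kept, and objects never seen through NFS export operations are freed on the first pass after migration.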

Comment by Andreas Dilger [ 07/Apr/20 ]

I think that this functionality is being implemented by patch https://review.whamcloud.com/38135 "LU-11025 dne: add FID mapping interfaces"?

Generated at Sat Feb 10 02:11:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.