Details
Type: Improvement
Resolution: Duplicate
Priority: Minor
Description
When a large file is created with a DoM component in a PFL layout, it consumes the maximum amount of MDT space that the layout allows, yet storing a small part of the file on the MDT provides little or no benefit: no RPC savings, no SOM, and possibly even extra RPCs, because the bulk data for the first stripe is transferred separately from the MDT and the OST (if < 1MB in size).
In such cases, it would be desirable to migrate the DoM component's data into the hole at the start of the first OST object to save space on the MDT, especially for large files. The drawback is that DoM migration currently requires a client to copy all of the file data to another file before swapping the layouts. This may mean GB or TB of data movement to remove a 64KB DoM component.
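To put a rough number on that cost, here is a back-of-the-envelope sketch. The 1 TiB file size is a made-up example, the 64 KiB DoM size is taken from the text above, and `bytes_moved` is a hypothetical helper, not a Lustre API. It only counts payload bytes copied and ignores RPC overhead.

```python
KIB = 1024
TIB = 1024 ** 4


def bytes_moved(file_size, dom_size, copy_whole_file):
    """Payload bytes that must be copied to remove the DoM component.

    copy_whole_file=True models the current client-side migration;
    False models copying only the DoM extent into the OST object.
    """
    return file_size if copy_whole_file else min(dom_size, file_size)


file_size = 1 * TIB   # assumed example file size
dom_size = 64 * KIB   # DoM component size from the description

full_copy = bytes_moved(file_size, dom_size, copy_whole_file=True)
dom_only = bytes_moved(file_size, dom_size, copy_whole_file=False)
ratio = full_copy // dom_only
print(f"full copy: {full_copy} bytes, DoM only: {dom_only} bytes, ratio: {ratio}x")
```

Under these assumptions the full copy moves 2^24 times as much data as copying the DoM extent alone.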
It would be more efficient to write only the data from the DoM component directly from the MDS into the start of the first OST object on the OSS. The MDS can safely exclude other writers while holding the MDS_INODELOCK_LAYOUT|MDS_INODELOCK_DOM locks for the inode, and use OUT_WRITE to send the data to the OST object. After the write has committed, the MDS can safely rewrite the layout to remove the DoM component and then drop the DLM locks.
There is no danger if the MDS crashes before the layout is changed, because the "hidden" data on the OST cannot be accessed by the client with the old layout in place.
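The ordering described above (lock, write to the OST object, commit, rewrite the layout, unlock) and the crash argument can be illustrated with a toy in-memory model. This is only a sketch and not Lustre code: `ToyFile`, `make_file`, and `migrate_dom_to_ost` are hypothetical names, the 64 KiB DoM size is assumed, and a slice copy stands in for OUT_WRITE plus commit.

```python
from dataclasses import dataclass

DOM_SIZE = 64 * 1024  # assumed DoM component size (64 KiB)


@dataclass
class ToyFile:
    """In-memory stand-in for a PFL file with a DoM component."""
    mdt_data: bytearray    # bytes [0, DOM_SIZE) stored on the MDT
    ost_data: bytearray    # first OST object; [0, DOM_SIZE) starts as a hole
    has_dom: bool = True   # True: layout is DoM + OST, False: OST only
    locked: bool = False   # models the LAYOUT|DOM locks held by the MDS

    def read(self, offset, length):
        """Client read resolved through the layout currently in place."""
        end = offset + length
        out = b""
        if self.has_dom and offset < DOM_SIZE:
            split = min(end, DOM_SIZE)
            out += bytes(self.mdt_data[offset:split])
            offset = split
        if offset < end:
            out += bytes(self.ost_data[offset:end])
        return out


def make_file(total_size=4 * DOM_SIZE):
    """Build a file whose DoM extent holds 'M' bytes and the rest 'O' bytes."""
    mdt = bytearray(b"M" * DOM_SIZE)
    ost = bytearray(DOM_SIZE) + bytearray(b"O" * (total_size - DOM_SIZE))
    return ToyFile(mdt_data=mdt, ost_data=ost)


def migrate_dom_to_ost(f, crash_before_layout_change=False):
    """Copy the DoM extent into the OST hole, then switch the layout."""
    f.locked = True                                   # exclude other writers
    f.ost_data[0:DOM_SIZE] = f.mdt_data[0:DOM_SIZE]   # stands in for OUT_WRITE + commit
    if crash_before_layout_change:
        f.locked = False                              # locks are lost with the MDS
        return False                                  # old layout is still in place
    f.has_dom = False                                 # layout now maps offset 0 to the OST
    f.mdt_data = bytearray()                          # MDT space can be reclaimed
    f.locked = False
    return True


f = make_file()
before = f.read(0, 4 * DOM_SIZE)

# Crash after the OST write but before the layout change: the copy on the
# OST is hidden behind the old layout, so clients see the same data.
migrate_dom_to_ost(f, crash_before_layout_change=True)
assert f.read(0, 4 * DOM_SIZE) == before

# Retrying completes the migration; contents are unchanged, MDT copy is gone.
migrate_dom_to_ost(f)
assert f.read(0, 4 * DOM_SIZE) == before
assert len(f.mdt_data) == 0
```

In this model the layout switch is the single step that makes the OST copy visible, which is why an interrupted migration can simply be repeated.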
I don't think this is done at all, and the ticket should be reopened as a useful feature for reclaiming DoM or flash space.
LU-11421 simply allows a DoM file to be used in a mirror. It does not do what this ticket asks, which is to copy an earlier component's data into the hole at the beginning of the next component (this should work generally, not just for DoM).
What would need to happen:
So to address #1 it seems we need to create a "fake" mirror using Bob's component description, but adjusting the starting extent to 0. Then we would need to mirror sync the DoM's extent. Then we can delete the original mirror - but making sure not to delete Bob's objects, which are now also part of mirror 2.
Alternatively, we give lfs some special permission to write directly into any extent in any component, and avoid the mirror dance.