Details
-
Improvement
-
Resolution: Duplicate
-
Minor
Description
When a large file is created with a PFL layout that includes a DoM component, the file consumes the maximum amount of MDT space allowed by that layout, yet gains little or no benefit from storing a small part of its data on the MDT: there are no RPC savings and no SOM, and there may even be extra RPCs, because the bulk data for the first stripe (if < 1MB in size) is transferred separately from the MDT and the OST.
In such cases, it would be desirable to migrate the DoM component data into the hole at the start of the first OST object to save space on the MDT, especially for large files. The drawback is that DoM migration currently requires a client to copy all of the file data to another file before swapping the layouts. This may mean GB or TB of data movement to remove a 64KB DoM component.
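To put the cost in perspective, here is a rough calculation for a hypothetical 1 TiB file with a 64 KiB DoM component. The file size and the function name are assumptions chosen only for illustration; the calculation counts the bytes each approach has to move and ignores RPC and metadata overhead.

```python
KIB = 1024
TIB = 1024 ** 4


def bytes_moved(file_size, dom_size, copy_whole_file):
    """Bytes that must be moved to remove the DoM component.

    copy_whole_file=True models the client copying the entire file
    before the layout swap; False models moving only the DoM extent.
    """
    return file_size if copy_whole_file else min(dom_size, file_size)


file_size = 1 * TIB   # hypothetical large file
dom_size = 64 * KIB   # DoM component size from the description

full_copy = bytes_moved(file_size, dom_size, copy_whole_file=True)
dom_only = bytes_moved(file_size, dom_size, copy_whole_file=False)
ratio = full_copy // dom_only  # 2**40 / 2**16 = 2**24

print(f"full copy: {full_copy} bytes, DoM only: {dom_only} bytes, ratio: {ratio}x")
```

For these assumed sizes, the full-copy approach moves about 16.8 million times more data than moving the DoM extent alone.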
It would be more efficient to write only the data from the DoM component directly from the MDS into the start of the first OST object on the OSS. The MDS can safely exclude other writers while it holds the MDS_INODELOCK_LAYOUT|MDS_INODELOCK_DOM locks for the inode, and can use OUT_WRITE to send the data to the OST object. After the write has committed, the MDS can safely rewrite the layout to remove the DoM component and then drop the DLM locks.
There is no danger if the MDS crashes before the layout is changed, because the "hidden" data on the OST cannot be accessed by the client with the old layout in place.
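The ordering described above (lock, write to the OST object, commit, change the layout, unlock) and the crash-safety argument can be illustrated with a small toy model. This is only a sketch of the proposed sequence, not Lustre code: `FileModel`, `migrate_dom_to_ost`, and the `crash_before_layout_change` flag are hypothetical names, the `locked` flag merely stands in for the layout and DoM DLM locks, and a bytearray slice assignment stands in for OUT_WRITE.

```python
from dataclasses import dataclass, field


@dataclass
class FileModel:
    """Toy model of a PFL file whose first component is DoM (hypothetical)."""
    dom_size: int          # extent covered by the DoM component, in bytes
    mdt_data: bytes        # file data stored on the MDT for [0, dom_size)
    ost_object: bytearray  # first OST object; [0, dom_size) starts as a hole
    layout: list = field(default_factory=lambda: ["dom", "ost"])
    locked: bool = False   # stands in for the layout and DoM DLM locks

    def client_read(self, offset, length):
        """Read as a client would: the current layout decides where bytes live."""
        out = bytearray()
        for pos in range(offset, offset + length):
            if "dom" in self.layout and pos < self.dom_size:
                out.append(self.mdt_data[pos])
            else:
                out.append(self.ost_object[pos])
        return bytes(out)


def migrate_dom_to_ost(f, crash_before_layout_change=False):
    """Move only the DoM bytes into the OST hole, then drop the DoM component."""
    f.locked = True  # exclude other writers for the whole operation
    try:
        # Step 1: write the DoM data into the hole (stand-in for OUT_WRITE).
        moved = len(f.mdt_data)
        f.ost_object[0:moved] = f.mdt_data
        # Step 2: the write is assumed to be committed on the OST here.
        if crash_before_layout_change:
            raise RuntimeError("simulated MDS crash before layout change")
        # Step 3: only now rewrite the layout and release the MDT space.
        f.layout = ["ost"]
        f.mdt_data = b""
        return moved
    finally:
        f.locked = False  # locks are dropped (or lost in the simulated crash)


f = FileModel(dom_size=4, mdt_data=b"AAAA",
              ost_object=bytearray(b"\0\0\0\0BBBB"))
before = f.client_read(0, 8)

try:
    migrate_dom_to_ost(f, crash_before_layout_change=True)
except RuntimeError:
    pass
# The old layout is still in place, so the copy on the OST stays hidden.
after_crash = f.client_read(0, 8)

moved = migrate_dom_to_ost(f)  # retry, this time running to completion
after_migration = f.client_read(0, 8)
```

In this model a client sees the same bytes before the migration, after the simulated crash, and after the completed migration, while only `dom_size` bytes are ever moved. The key design point is that the layout change is the last step, so an interrupted migration leaves at worst an unreferenced copy of the data in the OST hole.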