<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:34:31 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17326] Implement FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE</title>
                <link>https://jira.whamcloud.com/browse/LU-17326</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It would be possible to implement the &lt;tt&gt;fallocate(FALLOC_FL_INSERT_RANGE)&lt;/tt&gt; and &lt;tt&gt;FALLOC_FL_COLLAPSE_RANGE&lt;/tt&gt; options for Lustre, with specific restrictions:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;in all cases there must be a DLM write lock held on the object from &lt;tt&gt;offset&lt;/tt&gt; to &lt;tt&gt;OBD_OBJECT_EOF&lt;/tt&gt; to flush any dirty cache from the clients and prevent access while the layout is being modified.  If this were done on the OST it would ensure the client cache is flushed and fetched again with the correct data.&lt;/li&gt;
	&lt;li&gt;as a potentially separate step, if the client sends a lock handle in the RPC for the appropriate range of the object, then the OST could assume that the client is adjusting its page cache appropriately and avoid flushing the entire file from cache.  The &lt;tt&gt;ext4_fallocate()&lt;/tt&gt; call handles flushing the page cache for local operations, so the llite code would need to do the same for the local page cache to keep it consistent (probably &lt;b&gt;after&lt;/b&gt; the OST RPC is successful so that it does not lose local state if the RPC failed for any reason).&lt;/li&gt;
	&lt;li&gt;for 1-stripe plain layout (non-PFL) files it would only require &lt;tt&gt;blocksize&lt;/tt&gt; alignment limitations for &lt;tt&gt;offset&lt;/tt&gt; and &lt;tt&gt;len&lt;/tt&gt;, which appear to be enforced by the backing ldiskfs filesystem code itself.  This could basically be implemented today as a straight pass-through, with little effort beyond checking the flags and layout, while continuing to return &lt;tt&gt;-EOPNOTSUPP&lt;/tt&gt; from both the client and server for these modes for files with more than one stripe.&lt;/li&gt;
	&lt;li&gt;for multi-striped plain layouts the &lt;tt&gt;offset&lt;/tt&gt; must be aligned to an integer multiple of &lt;tt&gt;lmm_stripe_size&lt;/tt&gt;, and &lt;tt&gt;len&lt;/tt&gt; must be an integer multiple of &lt;tt&gt;stride = lmm_stripe_count &amp;#42; lmm_stripe_size&lt;/tt&gt;.  This ensures that whole &quot;&lt;tt&gt;stride&lt;/tt&gt; units&quot; of the file are added/removed at once and the data does not need to be moved between OST stripes of the file when it is shifted.  At this stage, the client would continue to return &lt;tt&gt;-EOPNOTSUPP&lt;/tt&gt; for PFL files.  The &lt;tt&gt;offset&lt;/tt&gt; should be mapped in LOV to the proper starting offset of the OST object, and &lt;tt&gt;len&lt;/tt&gt; should be divided by &lt;tt&gt;lmm_stripe_count&lt;/tt&gt; so that there is an appropriate amount of space added/removed from each object by calling &lt;tt&gt;fallocate()&lt;/tt&gt; on each object individually.&lt;/li&gt;
	&lt;li&gt;for a PFL file, this alignment/size restriction applies to &lt;b&gt;both&lt;/b&gt; the layout of the current component (and any overlapping mirror components at that offset) and any &lt;b&gt;later&lt;/b&gt; components in the file (if allocated), to ensure that any data shifts in the later components can also be handled without data movement since they will also need to have &lt;tt&gt;fallocate()&lt;/tt&gt; called on all allocated objects for the component.  It would also be necessary to shift the &lt;tt&gt;lcme_extent.e_start&lt;/tt&gt; and &lt;tt&gt;.e_end&lt;/tt&gt; for the following component(s) so that the file layout is suitably mapped to the new data offset. It is also necessary for the OST to update &lt;tt&gt;ost_layout.ol_comp_start&lt;/tt&gt; and &lt;tt&gt;ol_comp_end&lt;/tt&gt; in the &lt;tt&gt;filter_fid&lt;/tt&gt; xattr on the OST object as part of the same transaction as &lt;tt&gt;fallocate()&lt;/tt&gt; so that the data stays consistent.&lt;/li&gt;
	&lt;li&gt;in a far distant future where this feature is heavily used and important for some workload, it might be possible to reduce the &lt;tt&gt;stride&lt;/tt&gt; alignment to only &lt;tt&gt;lmm_stripe_size&lt;/tt&gt; (largest in current and later components) by reordering the OST objects in the current and later component layouts.  That still avoids the need to move data between objects on different OSTs, but is of course much more complex to get right, and can be avoided by selecting suitable &lt;tt&gt;stride = stripe_count &amp;#42; stripe_size&lt;/tt&gt; for all components of a file (e.g. &lt;b&gt;smaller&lt;/b&gt; &lt;tt&gt;stripe_size&lt;/tt&gt; for later components to compensate for larger &lt;tt&gt;stripe_count&lt;/tt&gt;).&lt;/li&gt;
&lt;/ul&gt;
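
&lt;p&gt;The alignment rule for multi-striped plain layouts can be illustrated with a small sketch.  This is not Lustre code: &lt;tt&gt;per_object_ranges()&lt;/tt&gt; is a hypothetical helper, and the offset mapping assumes ordinary round-robin (RAID-0 style) striping, where file chunk &lt;tt&gt;j&lt;/tt&gt; lives on object &lt;tt&gt;j % stripe_count&lt;/tt&gt;.  It only shows how an aligned file-level range might translate into one range per OST object.&lt;/p&gt;

```python
def per_object_ranges(stripe_size, stripe_count, offset, length):
    """Map a file-level (offset, length) to one (offset, length) per OST object.

    Returns None when the alignment rules are not met, which is where the
    client would keep returning -EOPNOTSUPP. Hypothetical helper, not Lustre code.
    """
    stride = stripe_count * stripe_size
    if offset % stripe_size != 0 or length % stride != 0:
        return None

    # Index of the first stripe-sized chunk of the file that has to move.
    chunk = offset // stripe_size
    # row: how many complete strides precede that chunk;
    # first: index of the object that holds that chunk.
    row, first = divmod(chunk, stripe_count)
    # Each object gains or loses an equal share of the range.
    object_length = length // stripe_count

    ranges = []
    for index in range(stripe_count):
        # Objects ahead of `first` hold a chunk of this row that stays in
        # place, so their first moving chunk is one row further in.
        object_row = row + 1 if index in range(first) else row
        ranges.append((object_row * stripe_size, object_length))
    return ranges


MIB = 1024 * 1024
print(per_object_ranges(MIB, 4, 6 * MIB, 8 * MIB))
```

&lt;p&gt;With a 1 MiB stripe size and 4 stripes, a request at file offset 6 MiB for 8 MiB maps to a 2 MiB range on every object, starting at 2 MiB on objects 0 and 1 and at 1 MiB on objects 2 and 3.  Because &lt;tt&gt;len&lt;/tt&gt; is a whole number of strides, every chunk stays on the object it started on.&lt;/p&gt;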


&lt;p&gt;There would be some risk when using these operations on a file, since at least &lt;tt&gt;fallocate(FALLOC_FL_COLLAPSE_RANGE)&lt;/tt&gt; cannot be reversed if there is an error when the operation is partially applied to multiple stripes/components of the file.  This is not totally different from &lt;tt&gt;truncate()&lt;/tt&gt; or &lt;tt&gt;fallocate(FALLOC_FL_PUNCH_HOLE)&lt;/tt&gt;, but the added risk is that a partially-applied data shift would leave any incomplete parts of the file with the &lt;b&gt;wrong&lt;/b&gt; data (as opposed to stale but formerly correct data).  It may be necessary to implement recoverability for a partial operation via a logged transaction from the MDS to ensure that it is applied to an OST object after recovery (in an idempotent way, since repeated shifts would also corrupt the data).  For a mirrored file, one option would be to mark a mirror stale if the shift partly fails, and leave it up to a resync agent to copy the data again with the correct offset in that case.&lt;/p&gt;</description>
                <environment></environment>
        <key id="79264">LU-17326</key>
            <summary>Implement FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                            <label>medium</label>
                    </labels>
                <created>Thu, 30 Nov 2023 21:06:43 +0000</created>
                <updated>Thu, 30 Nov 2023 21:58:38 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                <issuelinks>
                            <issuelinktype id="10324">
                    <name>Cloners</name>
                                            <outwardlinks description="Clones">
                                        <issuelink>
            <issuekey id="77617">LU-17055</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="19874">LU-3606</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="61799">LU-14160</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="62599">LUDOC-487</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i043dz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>