<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:32:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10169] Spillover space</title>
                <link>https://jira.whamcloud.com/browse/LU-10169</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I had an alternate but somewhat overlapping thought to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10070&quot; title=&quot;PFL self-extending file layout&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10070&quot;&gt;&lt;del&gt;LU-10070&lt;/del&gt;&lt;/a&gt; - keep the last component going to EOF, but allow for the possibility of bounding it in the future by creating a new FLR replica, and marking the original one as stale.&lt;/p&gt;

&lt;p&gt;Spillover space (MRP-4598):&lt;br/&gt;
I&apos;m wondering if we can combine the new enhanced file layouts feature with OSC space grant info (or some other trigger) to eliminate ENOSPC caused by full OSTs. The idea would be something like:&lt;br/&gt;
1.       Server decreases grants as space approaches 0&lt;br/&gt;
2.       Client notes grant = 0 (or low)&lt;br/&gt;
3.       Client takes layout lock, forcing flush of all dirty extents&lt;br/&gt;
4.       Client adds a new FLR replica which chooses a different (emptier) set of OSTs for all &lt;em&gt;further&lt;/em&gt; file extents of open-for-write files&lt;br/&gt;
5.       Client sets this new layout as primary, and releases layout lock&lt;br/&gt;
6.       Clients reacquire locks and grants from new OSTs&lt;/p&gt;
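&lt;p&gt;With the FLR tooling as it later landed, steps 4 and 5 could be sketched from the command line. This is only a sketch: the path and OST indices are illustrative, and note that &lt;tt&gt;lfs mirror extend&lt;/tt&gt; copies existing data into the new replica, whereas the idea here is to reference the original objects without a copy.&lt;/p&gt;

&lt;pre&gt;
# Step 4 (sketch): add a second replica whose tail component targets emptier OSTs
lfs mirror extend -N -E 10G -o 0-4 -E -1 -o 5-10 /mnt/lustre/bigfile

# Step 5 (sketch): find the new replica&apos;s component IDs, then mark it
# preferred so that further writes are directed to it
lfs getstripe /mnt/lustre/bigfile
lfs setstripe --comp-set -I&amp;lt;comp_id&amp;gt; --comp-flags=prefer /mnt/lustre/bigfile
&lt;/pre&gt;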

&lt;h4&gt;&lt;a name=&quot;Example&quot;&gt;&lt;/a&gt;Example&lt;/h4&gt;
&lt;p&gt;Initially:&lt;br/&gt;
Layout v1: [0-inf) OST0-4&lt;/p&gt;

&lt;p&gt;Runs out of space when file is at 10GB; we add a 2nd replica with a PFL layout&lt;/p&gt;

&lt;p&gt;Layout v2: complex&lt;br/&gt;
   Replica1: [0-inf) OST0-4&lt;br/&gt;
   Replica2: component1 [0-10GB) OST0-4; component2 [10GB-inf) OST5-10&lt;br/&gt;
Set Replica2 as the primary copy, mark Replica1 as stale.&lt;/p&gt;

&lt;h4&gt;&lt;a name=&quot;Benefits&quot;&gt;&lt;/a&gt;Benefits&lt;/h4&gt;
&lt;ul&gt;
	&lt;li&gt;Eliminate ENOSPC for single full OST&lt;/li&gt;
	&lt;li&gt;Allow for &quot;tiering&quot; spillover from flash pool to disk pool&lt;/li&gt;
	&lt;li&gt;Maybe use something like this also for changing layouts for failed OSTs...??&lt;/li&gt;
&lt;/ul&gt;


&lt;h4&gt;&lt;a name=&quot;Questions&quot;&gt;&lt;/a&gt;Questions&lt;/h4&gt;
&lt;ul&gt;
	&lt;li&gt;Can layouts for open-for-write files be changed like this?&lt;/li&gt;
	&lt;li&gt;Can the PFL component layout match object-for-object the original simple RAID0 layout? (I.e. we don&apos;t want to copy the data, but instead just reference the original objects, now with an extent limit.)&lt;/li&gt;
	&lt;li&gt;Is grant the right trigger?&lt;/li&gt;
	&lt;li&gt;Do we need a policy for spillover selection? Pre-set spillover targets? Ask the MDS for new allocation?&lt;/li&gt;
&lt;/ul&gt;
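
&lt;p&gt;For comparison, the self-extending layout direction of LU-10070 (which the resolution of this ticket ultimately points to) can be sketched with later versions of lfs; sizes here are illustrative. The -z extension size lets a component grow in increments only while its OSTs still have space; otherwise writes spill into the next component:&lt;/p&gt;

&lt;pre&gt;
# Sketch only; sizes, stripe counts, and the path are illustrative
# First 1GB on a single stripe, then a self-extending component that grows
# in 256MB steps while space remains and spills over when it does not
lfs setstripe -E 1G -c 1 -E -1 -c 4 -z 256M /mnt/lustre/checkpoint
&lt;/pre&gt;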
</description>
                <environment></environment>
        <key id="49026">LU-10169</key>
            <summary>Spillover space</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="pfarrell">Patrick Farrell</assignee>
                                    <reporter username="nrutman">Nathan Rutman</reporter>
                        <labels>
                    </labels>
                <created>Mon, 30 Oct 2017 20:13:04 +0000</created>
                <updated>Tue, 27 Aug 2019 17:49:30 +0000</updated>
                            <resolved>Tue, 27 Aug 2019 17:49:30 +0000</resolved>
                                    <version>Lustre 2.10.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="212354" author="nrutman" created="Mon, 30 Oct 2017 20:58:36 +0000"  >&lt;p&gt;&amp;gt; mark Replica1 as stale.&lt;br/&gt;
Permanently stale. Or just delete it. Maybe this doesn&apos;t need FLR at all, if we can change Layout v1 into Layout v2 Replica2 directly under the layout lock.&lt;/p&gt;</comment>
                            <comment id="212363" author="adilger" created="Mon, 30 Oct 2017 22:40:17 +0000"  >&lt;p&gt;I&apos;ve thought about this issue and possible solutions in the past as well, and can share my ideas here (they may also be somewhere else). I agree that handling the single OST full issue is desirable, but my hope is that PFL will avoid this to a large extent, as would better OST space balancing as described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9&quot; title=&quot;Optimize weighted QOS Round-Robin allocator&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9&quot;&gt;LU-9&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9809&quot; title=&quot;RTDS(Real-Time Dynamic Striping): A policy based striping framework&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9809&quot;&gt;LU-9809&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;While it IS possible to modify a layout while it is actively being written by a client, there are some caveats. There is not currently any way for a client to modify an existing component directly. This is done to prevent clients from introducing corruption into the layout (e.g. referencing objects owned by another file/user, or objects that do not exist). Also, until FLR is landed the components must be strictly non-overlapping. &lt;/p&gt;

&lt;p&gt;Currently the methods to update a composite layout are:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;add a new component with a specified layout (instantiated or template)&lt;/li&gt;
	&lt;li&gt;remove an existing component (by component number)&lt;/li&gt;
	&lt;li&gt;swap the layout from one file with the layout from another file&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I&apos;m not against fixing this issue more directly, but at a minimum we would need a new layout operation to truncate the end of an existing component (@10GB in your example) before adding a new component to cover the rest of the file. That wouldn&apos;t be too hard, and would preserve the semantics that clients cannot manipulate layouts directly. &lt;/p&gt;

&lt;p&gt;The next problem is where to truncate the original layout. There is no guarantee that the object on the full OST will have a nice size like 1GB, and currently there is a requirement that layout extents must be a multiple of 64KB. That implies we need to truncate the full object at the nearest multiple of 64KB, since we can&apos;t write more data to that OST, and write the remainder to the new component. Not a huge deal for &amp;lt; 64KB of data, but the one full stripe is not the largest issue. &lt;/p&gt;

&lt;p&gt;The final issue is that the &lt;em&gt;other&lt;/em&gt; OSTs the remaining stripes are on are presumably not full, so they may have continued being written before the client noticed one OST is full, and the file could be written from many other clients. That means multiple GB of potentially sparse data may need to be copied over to the new component atomically before the original layout is truncated and a new component is added. &lt;/p&gt;

&lt;p&gt;Taking this to the extreme, even if we had a layout that had a &quot;ragged&quot; starting offset to handle the different-sized objects, there would still be the issue of holes in the original component that could not be filled, if the file was not being written linearly from start to end. While linear write is the most common case, there would definitely be times where that wasn&apos;t true, so even very complex solutions (which I would be against) wouldn&apos;t solve all cases.&lt;/p&gt;

&lt;p&gt;That said, if this could be fixed for the common single client linear writing case (i.e. truncate existing layout, add new component, copy a small amount of data that was truncated off), it would not be worse than what we have today.  This could be simplified further if the layout was changed &lt;em&gt;before&lt;/em&gt; an OST was totally full, which would essentially become a form of self-extending PFL layout in the end. &lt;/p&gt;

&lt;p&gt;An ounce of prevention in the form of PFL and not filling OSTs to 100% in the first place is worth a pound of cure. &lt;/p&gt;</comment>
                            <comment id="212372" author="nrutman" created="Tue, 31 Oct 2017 00:22:17 +0000"  >&lt;blockquote&gt;
&lt;p&gt;This could be simplified further if the layout was changed before an OST was totally full, which would essentially become a form of self-extending PFL layout in the end.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;yes - this is why I was suggesting something like watching grant to trigger the layout change &lt;em&gt;before&lt;/em&gt; an actual ENOSPC. If all writers can&apos;t flush at this point, it&apos;s ENOSPC and give up. But we can avoid those cases just by more aggressively changing layout at say 95% full or something. We would truncate the layout at the furthest written extent, rounded up to something nice (say a full stripe size), again assuming we left ourselves plenty of spare room on each OST that hosts a stripe. That way we don&apos;t have to re-write or copy anything. Holes are perfectly fine &#8211; this will become one component of a PFL, and subsequent writes can fill in those holes if they want (since we left ourselves extra space). (Sure, you could come up with a sparse file scenario where this breaks down, but in those cases we just return ENOSPC as today.)&lt;br/&gt;
 Of the two significant cases that this addresses&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Eliminate ENOSPC for single full OST&lt;/li&gt;
	&lt;li&gt;Allow for &quot;tiering&quot; spillover from flash pool to disk pool&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;neither can be addressed with a static layout determined at file create time. E.g. someone creates a tiered PFL on flash/disk OSTs with plenty of room, then someone else fills all the flash drives with a checkpoint. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10070&quot; title=&quot;PFL self-extending file layout&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10070&quot;&gt;&lt;del&gt;LU-10070&lt;/del&gt;&lt;/a&gt; moves kind of halfway toward a dynamic component, but the alternative expressed&#160;in this ticket seems (to me) to provide broader advantages.&lt;/p&gt;</comment>
                            <comment id="219379" author="nrutman" created="Mon, 29 Jan 2018 19:45:13 +0000"  >&lt;p&gt;Not handled as well by 10070:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Files must have a PFL layout already, primed for ENOSPC handling.&#160;10169 could handle &lt;em&gt;any&lt;/em&gt; layout type.&lt;/li&gt;
	&lt;li&gt;10070 grants a fixed amount of space to the next component. Too small, you will end up with too many components; too large, you may still run out of space as other files fill the OST. You&apos;re counting on bulk behavior, but one bad file could still cause ENOSPC for everyone else.&lt;/li&gt;
	&lt;li&gt;Layouts continue to grow with 10070, even though the system may have plenty of space, requiring more MDT space and network traffic for updating client layouts.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Not handled as well by 10169:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Sparse writer case. Some writer way off at the end of the file may have an empty OST, and someone at the beginning hits a full OST. We can&apos;t shorten the extent of the existing layout without copying the data for a potentially large component. (If we can detect this, we could just give up and ENOSPC this one.)&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="253703" author="pfarrell" created="Tue, 27 Aug 2019 17:49:30 +0000"  >&lt;p&gt;This was implemented in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10070&quot; title=&quot;PFL self-extending file layout&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10070&quot;&gt;&lt;del&gt;LU-10070&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="48585">LU-10070</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzms7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>