<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:55:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12738] PFL: append of PFL file should not instantiate full layout</title>
                <link>https://jira.whamcloud.com/browse/LU-12738</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This is a continuation of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9341&quot; title=&quot;PFL: append should not instantiate full layout&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9341&quot;&gt;&lt;del&gt;LU-9341&lt;/del&gt;&lt;/a&gt;, created to track related work which has not been completed yet.&lt;br/&gt;
-------&lt;/p&gt;

&lt;p&gt;Appending to a PFL file will cause all layout components to be instantiated because it isn&apos;t possible to know what the ending offset is at the time the write is started.&lt;/p&gt;

&lt;p&gt;It would be better to avoid this, potentially by locking/instantiating a larger, but not gigantic, range beyond the current EOF and, if that fails, retrying the layout intent.&#160; The client must currently be in charge of locking the file during append, so it should know at write time how much of the file to instantiate, and it could retry.&lt;/p&gt;

&lt;p&gt;-------&lt;/p&gt;

&lt;p&gt;A detailed breakdown of the remaining issue and a possible approach to fixing it (originally from &lt;a href=&quot;#comment-252605&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;comment-252605&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;There are two basic problems/constraints we are trying to meet.&lt;/p&gt;

&lt;p&gt;Append writes must be atomic, meaning two things:&lt;br/&gt;
1. No &quot;write tearing&quot;.&#160; The bytes of a particular &lt;tt&gt;O_APPEND&lt;/tt&gt; write must be contiguous, with no gaps.&#160; (This is true of any write, but it is more relevant for &lt;tt&gt;O_APPEND&lt;/tt&gt; writes, which always start at EOF, so if we are not careful, a restarted append write will restart at a new EOF.)&lt;/p&gt;

&lt;p&gt;2. They must not be mixed with other &lt;tt&gt;O_APPEND&lt;/tt&gt; writes.&#160; If two &lt;tt&gt;O_APPEND&lt;/tt&gt; writes, A and B, are racing, either AB or BA is valid, but it is&#160;&lt;b&gt;not&lt;/b&gt;&#160;valid for any part of A or B to overwrite the other one.&#160; This is of course&#160;&lt;b&gt;not&lt;/b&gt;&#160;true of regular writes, which are started at a specific offset.&#160; This means that if a regular write is racing with an &lt;tt&gt;O_APPEND&lt;/tt&gt; write, it can overwrite part of the &lt;tt&gt;O_APPEND&lt;/tt&gt; write.&#160; This is acceptable.&lt;/p&gt;

&lt;p&gt;The first problem (write tearing) is solved by getting the file size once, before starting the write, and using that same size throughout the &lt;tt&gt;O_APPEND&lt;/tt&gt; write.&#160; (The current code checks the file size repeatedly, I believe for every iteration of &lt;tt&gt;cl_io_loop()&lt;/tt&gt;.)&#160; In other words, if another write races with our &lt;tt&gt;O_APPEND&lt;/tt&gt; write, we must&#160;&lt;b&gt;not&lt;/b&gt;&#160;&apos;tear&apos; the &lt;tt&gt;O_APPEND&lt;/tt&gt; write by moving to a new EOF in the middle of the write.&#160; Note that we must also retain the size across I/O restarts, for the case where we have to update the layout or resend RPCs in the middle of an &lt;tt&gt;O_APPEND&lt;/tt&gt; write.&lt;/p&gt;
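
&lt;p&gt;A minimal toy sketch of the size-snapshot idea described above, assuming an in-memory buffer in place of the file and chunked writes in place of the I/O loop.&#160; The names &lt;tt&gt;write_at&lt;/tt&gt;, &lt;tt&gt;append_chunked&lt;/tt&gt; and &lt;tt&gt;racing_write&lt;/tt&gt; are hypothetical and are not Lustre functions; the sketch only contrasts re-reading EOF per chunk with sampling it once.&lt;/p&gt;

```python
# Toy model only: a bytearray stands in for the file and each chunk stands in
# for one pass of the I/O loop. None of these names are Lustre APIs.

def write_at(data, offset, payload):
    """Write payload at offset, zero-filling any gap beyond the current end."""
    end = offset + len(payload)
    data.extend(bytes(max(0, end - len(data))))
    data[offset:end] = payload


def append_chunked(data, payload, chunk, snapshot_eof, racer=None):
    """Append payload in chunk-sized pieces and return the start offset.

    snapshot_eof=True:  EOF is sampled once and every chunk offset derives from it.
    snapshot_eof=False: EOF is re-read before every chunk (the tearing hazard).
    racer, if given, runs once after the first chunk to simulate a racing write.
    """
    start = len(data)  # EOF sampled once, before the write begins
    written = 0
    raced = False
    while written != len(payload):
        piece = payload[written:written + chunk]
        offset = start + written if snapshot_eof else len(data)
        write_at(data, offset, piece)
        written += len(piece)
        if racer is not None and not raced:
            racer(data)
            raced = True
    return start


def racing_write(data):
    # A racing write that lands at whatever EOF is at that moment.
    write_at(data, len(data), b"XX")


torn = bytearray(b"base")
append_chunked(torn, b"AAAABBBB", 4, snapshot_eof=False, racer=racing_write)

intact = bytearray(b"base")
append_chunked(intact, b"AAAABBBB", 4, snapshot_eof=True, racer=racing_write)
```

&lt;p&gt;In this toy run the re-read variant leaves the racing bytes between the two halves of the append, while the snapshot variant keeps the append contiguous and overwrites the racing bytes, which is the kind of overlap with a non-append write that the description treats as acceptable.&lt;/p&gt;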

&lt;p&gt;The second problem requires that we allow only one &lt;tt&gt;O_APPEND&lt;/tt&gt; write at a time.&#160; There are probably a few ways to solve this, but I think the correct way (it is definitely the simplest way) is to add another bit to the MDT IBITS lock, an &lt;tt&gt;MDS_INODELOCK_APPEND&lt;/tt&gt; bit.&#160; All &lt;tt&gt;O_APPEND&lt;/tt&gt; writes must ask for this bit in PW (protected write) mode before starting to write.&#160; (We cannot use the &lt;tt&gt;MDS_INODELOCK_LAYOUT&lt;/tt&gt; bit for this exclusion because the server revokes our &lt;tt&gt;LAYOUT&lt;/tt&gt; lock bit when we have to update the file layout.)&#160; The &lt;tt&gt;APPEND&lt;/tt&gt; lock must be held across I/O restarts, so it should be taken before the &lt;tt&gt;LAYOUT&lt;/tt&gt; lock.&#160; (Or it could possibly be taken&#160;&lt;b&gt;with&lt;/b&gt;&#160;the &lt;tt&gt;LAYOUT&lt;/tt&gt; lock bit?&#160; We have to be careful about ordering/pairing issues with the &lt;tt&gt;LAYOUT&lt;/tt&gt; and &lt;tt&gt;APPEND&lt;/tt&gt; bits; I have not thought this through carefully yet.)&lt;/p&gt;

&lt;p&gt;Note that excluding concurrent &lt;tt&gt;O_APPEND&lt;/tt&gt; writes also requires excluding multiple &lt;tt&gt;O_APPEND&lt;/tt&gt; writes on the same node.&#160; This can be done using the local &lt;tt&gt;tree_lock&lt;/tt&gt; in the write path, locking it from 0 to EOF.&lt;/p&gt;
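
&lt;p&gt;The exclusion requirement can be illustrated with a small sketch, assuming a single in-process lock as a stand-in for both the proposed &lt;tt&gt;APPEND&lt;/tt&gt; bit and the local range lock.&#160; &lt;tt&gt;ToyFile&lt;/tt&gt; and &lt;tt&gt;race&lt;/tt&gt; are hypothetical names; the point is only that the lock is held for the whole append rather than per chunk, so two racing appends can only end up as AB or BA.&lt;/p&gt;

```python
import threading


class ToyFile:
    """In-memory stand-in for a file; append_lock plays the role of the exclusion."""

    def __init__(self):
        self.data = bytearray()
        self.append_lock = threading.Lock()

    def append(self, payload, chunk=2):
        # Hold the lock for the entire append (every chunk), not per chunk.
        # In the real design it would also be held across any I/O restart.
        with self.append_lock:
            start = len(self.data)  # EOF sampled once, under the lock
            for i in range(0, len(payload), chunk):
                piece = payload[i:i + chunk]
                end = start + i + len(piece)
                # Grow the buffer if needed, then place the chunk at its fixed offset.
                self.data.extend(bytes(max(0, end - len(self.data))))
                self.data[start + i:end] = piece
            return start


def race(a, b):
    """Run two appends concurrently and return the resulting file contents."""
    f = ToyFile()
    threads = [threading.Thread(target=f.append, args=(p,)) for p in (a, b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(f.data)


result = race(b"AAAAAA", b"BBBBBB")
```

&lt;p&gt;Whichever thread takes the lock first, the result is one payload followed entirely by the other; which order occurs is not determined.&lt;/p&gt;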

&lt;p&gt;This combination of things should make it possible to avoid instantiating the full layout and locking every object.&#160; It&apos;s a fair bit of work.&lt;/p&gt;

&lt;p&gt;Note of course that this is a split client/server solution, so it will need a compatibility flag so the client knows it can use the &lt;tt&gt;O_APPEND&lt;/tt&gt; flag.&#160; The good news is that this should interoperate safely with older clients: they simply instantiate and lock&#160;&lt;b&gt;everything&lt;/b&gt;&#160;for &lt;tt&gt;O_APPEND&lt;/tt&gt;, which will give the correct exclusion against newer clients.&lt;/p&gt;</description>
                <environment></environment>
        <key id="56863">LU-12738</key>
            <summary>PFL: append of PFL file should not instantiate full layout</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="pfarrell">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 9 Sep 2019 19:19:36 +0000</created>
                <updated>Mon, 8 Jan 2024 20:13:13 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="337746" author="peggy" created="Tue, 14 Jun 2022 17:52:58 +0000"  >&lt;p&gt;Any activity or update on this issue?&lt;br/&gt;
I&apos;ve seen recent cases of small files showing the full PFL layout, where 100+ empty OST objects are allocated.&#160; I&apos;m not sure yet whether this append issue is the cause, but it seems more serious than minor to have hundreds of OST inodes consumed for no apparent reason.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="45980">LU-9479</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="45499">LU-9341</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="79878">LU-17403</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00mgv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>