<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:26:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9435] DNE2 - object placement QoS policy</title>
                <link>https://jira.whamcloud.com/browse/LU-9435</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In the current implementation, when a file is created, its inode must be on the same MDT where its name entry is located. It would be more desirable to allocate MDT objects using QoS policies, as we already do for OST objects.&lt;/p&gt;</description>
                <environment></environment>
        <key id="45823">LU-9435</key>
            <summary>DNE2 - object placement QoS policy</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="jay">Jinshan Xiong</reporter>
                        <labels>
                            <label>dne3</label>
                    </labels>
                <created>Tue, 2 May 2017 21:21:00 +0000</created>
                <updated>Wed, 6 Mar 2019 13:49:35 +0000</updated>
                            <resolved>Sat, 4 Aug 2018 19:07:43 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="194215" author="adilger" created="Tue, 2 May 2017 23:10:29 +0000"  >&lt;p&gt;I don&apos;t think it is as straightforward as always creating the name on one MDT and allocating the inode on another arbitrary MDT. One of the major problems that would arise is that having remote directory entries for every file would hurt file creation performance, as well as every lookup or unlink of that file in the future. With a remote entry, the client first has to do name-&amp;gt;FID lookup in the parent directory, and then separately do FID-&amp;gt;MDT lookup in the FID Location Database (FLDB, typically very fast since it is compact and cached on the client), and then fetch attributes/layout/xattrs for the FID from the second MDT. This would double the number of RPCs needed to access the majority of files. Keeping the directory entries and inodes on the same MDT is far more efficient for creation, lookup, and deletion.&lt;/p&gt;
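
&lt;p&gt;The RPC arithmetic above can be captured in a toy model. This is a minimal sketch, not Lustre code: &lt;tt&gt;Entry&lt;/tt&gt; and &lt;tt&gt;lookup_rpcs&lt;/tt&gt; are hypothetical names, a lookup of a local entry is assumed to return the attributes in the same reply, and an FLDB cache miss on the client is assumed to cost one extra RPC.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass
class Entry:
    name_mdt: int   # MDT holding the directory entry (the name)
    inode_mdt: int  # MDT holding the inode (attributes, layout, xattrs)


def lookup_rpcs(entry: Entry, fldb_cached: bool = True) -> int:
    """Count RPCs needed to resolve a name to its attributes (simplified model)."""
    # name->FID lookup in the parent directory; for a local entry the
    # attributes are assumed to come back in the same reply.
    rpcs = 1
    if entry.inode_mdt != entry.name_mdt:
        if not fldb_cached:
            rpcs += 1  # assumed FID->MDT query to the FLDB on a client cache miss
        rpcs += 1      # fetch attributes/layout/xattrs from the second MDT
    return rpcs
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With a warm FLDB cache, &lt;tt&gt;lookup_rpcs(Entry(0, 0))&lt;/tt&gt; returns 1 and &lt;tt&gt;lookup_rpcs(Entry(0, 1))&lt;/tt&gt; returns 2, i.e. the doubling described above.&lt;/p&gt;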

&lt;p&gt;Instead, there are several mechanisms that can be used to distribute metadata loads/allocations across MDTs while keeping names/inodes mostly local to a single MDT.&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;b&gt;automatic MDT selection for striped directories&lt;/b&gt;: if the shards of a DNE2 directory are load balanced across MDTs then the names created in those shards will also be balanced. Currently (AFAIK) the shards are allocated sequentially from the master MDT index unless otherwise specified, which is not ideal:
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x200000400:0x2:0x0]
     1           [0x240000401:0x2:0x0]
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x200000400:0x4:0x0]
     1           [0x240000401:0x4:0x0]
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x200000400:0x6:0x0]       
     1           [0x240000401:0x6:0x0]

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This should be relatively easy to implement when striped directories are explicitly created, since all of this is decided on the MDS, and it can do &lt;tt&gt;MDS_STATFS&lt;/tt&gt; RPCs to the other MDTs (as we already do with OSTs) to select MDTs based on free space, if the number of stripes is less than the number of MDTs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
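
&lt;p&gt;A sketch of free-space-based shard placement for an explicitly created striped directory, compared with the sequential placement shown in the output above. It assumes that shard 0 stays on the master MDT, that MDT indices are contiguous from 0, and that free space and free inodes have already been gathered from each MDT (e.g. via &lt;tt&gt;MDS_STATFS&lt;/tt&gt;). &lt;tt&gt;MdtStat&lt;/tt&gt;, &lt;tt&gt;sequential_stripes&lt;/tt&gt; and &lt;tt&gt;qos_stripes&lt;/tt&gt; are hypothetical names, and the greedy choice is a simplification rather than Lustre&apos;s actual allocator.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass
from typing import List


@dataclass
class MdtStat:
    index: int        # MDT index (assumed contiguous from 0)
    kbytes_free: int  # free space, e.g. from a statfs reply
    files_free: int   # free inodes


def sequential_stripes(master: int, stripe_count: int, mdt_count: int) -> List[int]:
    # Placement seen in the output above: consecutive MDTs from the master.
    return [(master + i) % mdt_count for i in range(stripe_count)]


def qos_stripes(stats: List[MdtStat], master: int, stripe_count: int) -> List[int]:
    mdt_count = len(stats)
    # If every MDT gets a shard there is nothing to choose.
    if stripe_count >= mdt_count:
        return sequential_stripes(master, mdt_count, mdt_count)
    # Shard 0 stays on the master MDT; rank the others by free space, then
    # free inodes (both descending), breaking ties by the lower MDT index.
    candidates = sorted(
        (s for s in stats if s.index != master),
        key=lambda s: (-s.kbytes_free, -s.files_free, s.index),
    )
    return [master] + [s.index for s in candidates[: stripe_count - 1]]
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, with four MDTs reporting 100, 50, 900 and 400 units free and the master on MDT0, &lt;tt&gt;qos_stripes(stats, 0, 2)&lt;/tt&gt; returns &lt;tt&gt;[0, 2]&lt;/tt&gt; where sequential placement gives &lt;tt&gt;[0, 1]&lt;/tt&gt;. A production policy would probably also randomize among similarly full MDTs, so that concurrent creates do not all land on the emptiest one.&lt;/p&gt;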


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;b&gt;automatic MDT selection for remote directories&lt;/b&gt;: this is a bit trickier, since the client specifies the FID for the remote directory, but one possibility is to have &quot;&lt;tt&gt;lfs mkdir&lt;/tt&gt;&quot; get the MDT space usage on the client to decide which MDT to use, if it is not specified by the user. Another alternative would be for the MDS to just ignore the FID supplied by the client, and allocate its own remote directory and return the new FID to the client (this is already handled by clients, in case the file/directory already exists).&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;automatic remote MDT selection for new directories&lt;/b&gt;: once the above MDT selection mechanism exists, it would be possible to automatically create some subset of new directories on remote MDTs in order to balance the load across MDS nodes.&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;automatic restriping of large directories&lt;/b&gt;: this is related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4684&quot; title=&quot;DNE3: allow migrating DNE striped directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4684&quot;&gt;&lt;del&gt;LU-4684&lt;/del&gt;&lt;/a&gt; &quot;allow migrating DNE striped directory&quot;. Basically, when a directory grows too large (e.g. over 5000 entries), its LMV layout is changed to a striped directory so that it is automatically load balanced across MDS nodes. The split could be done by a PFL-like layout that keeps existing entries in the &quot;master&quot; directory and inserts new entries into the shards (lower overhead at split time, higher overhead during later lookups), by migrating existing entries to the new shards (higher overhead at split time, lower overhead during later lookups), or by a combination of both (delayed migration from the master to the shards some arbitrary time after the split). The benefit of automatically sharding large directories is that any subdirectories will also be distributed, and space used by Data-on-MDT objects will also be balanced naturally.&lt;/li&gt;
&lt;/ul&gt;
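
&lt;p&gt;A toy model of the &quot;PFL-like&quot; split and the later migration described above. It assumes names are placed in shards by a 64-bit FNV-1a hash of the name modulo the stripe count; the output above reports &lt;tt&gt;lmv_hash_type: fnv_1a_64&lt;/tt&gt;, but the exact LMV placement rule is an assumption here. &lt;tt&gt;SplitDirectory&lt;/tt&gt;, its &lt;tt&gt;split_threshold&lt;/tt&gt; parameter and &lt;tt&gt;migrate()&lt;/tt&gt; are hypothetical, and the probe count is only a rough stand-in for lookup overhead.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from typing import Dict, List, Optional, Tuple

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv_1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a hash."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) % 2**64  # keep the value within 64 bits
    return h


class SplitDirectory:
    """Hypothetical directory that is split into shards once it holds
    split_threshold entries. Pre-split entries stay in the master until
    migrate() is called; duplicate-name checks are omitted."""

    def __init__(self, split_threshold: int, stripe_count: int):
        self.split_threshold = split_threshold
        self.stripe_count = stripe_count
        self.master: Dict[str, str] = {}  # name -> FID (placeholder strings)
        self.shards: Optional[List[Dict[str, str]]] = None  # None until split

    def shard_of(self, name: str) -> int:
        # Assumed placement rule: hash of the name modulo the stripe count.
        return fnv_1a_64(name.encode()) % self.stripe_count

    def create(self, name: str, fid: str) -> None:
        # Split when the unsplit directory has reached the threshold.
        if self.shards is None and len(self.master) >= self.split_threshold:
            self.shards = [{} for _ in range(self.stripe_count)]
        if self.shards is None:
            self.master[name] = fid
        else:
            self.shards[self.shard_of(name)][name] = fid

    def lookup(self, name: str) -> Tuple[Optional[str], int]:
        """Return (fid, probes), where probes counts the places searched."""
        if self.shards is None:
            return self.master.get(name), 1
        fid = self.shards[self.shard_of(name)].get(name)
        if fid is not None:
            return fid, 1
        # Not in its shard: it may be a pre-split entry still in the master.
        return self.master.get(name), 2

    def migrate(self) -> None:
        """Move pre-split entries into their shards (the migration variant)."""
        if self.shards is None:
            return
        for name, fid in self.master.items():
            self.shards[self.shard_of(name)][name] = fid
        self.master.clear()
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With &lt;tt&gt;split_threshold=3&lt;/tt&gt;, the fourth create triggers the split; a lookup of any of the first three names then takes two probes until &lt;tt&gt;migrate()&lt;/tt&gt; moves it into its shard, which mirrors the split-time versus lookup-time trade-off described above.&lt;/p&gt;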


&lt;p&gt;All of these options allow the majority of entries to remain local to the MDT where the inode is created, while distributing load across MDTs more evenly without user interaction.&lt;/p&gt;</comment>
                            <comment id="194223" author="di.wang" created="Wed, 3 May 2017 00:19:54 +0000"  >&lt;blockquote&gt;
&lt;p&gt;One of the major problems that would arise is that having remote directory entries for every file would hurt file creation performance, as well as every lookup or unlink of that file in the future. With a remote entry, the client first has to do name-&amp;gt;FID lookup in the parent directory, and then separately do FID-&amp;gt;MDT lookup in the FID Location Database (FLDB, typically very fast since it is compact and cached on the client), and then fetch attributes/layout/xattrs for the FID from the second MDT. This would double the number of RPCs needed to access the majority of files.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Indeed, so we would only place the name entry and the object on different MDTs for directories (i.e. remote directories), which probably means we only need QoS for directory creation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This should be relatively easy to implement when striped directories are explicitly created, since all of this is decided on the MDS, and it can do MDS_STATFS RPCs to the other MDTs (as we already do with OSTs) to select MDTs based on free space, if the number of stripes is less than the number of MDTs.&lt;/p&gt;

&lt;p&gt;Another alternative would be for the MDS to just ignore the FID supplied by the client, and allocate its own remote directory and return the new FID to the client (this is already handled by clients, in case the file/directory already exists).&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This really makes sense to me, and it probably also means we only need to put MD QoS into LOD (not LMV), which will allow us to easily share the MDT/OST QoS code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;automatic restriping of large directories: this is related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4684&quot; title=&quot;DNE3: allow migrating DNE striped directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4684&quot;&gt;&lt;del&gt;LU-4684&lt;/del&gt;&lt;/a&gt; &quot;allow migrating DNE striped directory&quot;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Even the current migrate tool (which rebalances objects across MDTs) will suit many QoS needs, though it is not automatic. BTW, we also need a ticket for migrating Data-on-MDT objects.&lt;/p&gt;
</comment>
                            <comment id="231452" author="adilger" created="Sat, 4 Aug 2018 19:03:50 +0000"  >&lt;p&gt;This is probably best handled at the directory level, rather than making every file remote by default. &lt;/p&gt;</comment>
                            <comment id="231454" author="adilger" created="Sat, 4 Aug 2018 19:07:43 +0000"  >&lt;p&gt;This will be handled via other tickets and per-directory balancing. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="49435">LU-10277</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="52903">LU-11213</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="35073">LU-7827</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="51154">LU-10784</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzbpj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>