<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:51:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12359] Remote shared burst buffer PCC on a shared backend fs</title>
                <link>https://jira.whamcloud.com/browse/LU-12359</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;HPC burst buffers are a fast storage layer positioned between the compute engines and the backend storage systems.&lt;/p&gt;

&lt;p&gt;There are two representative burst buffer architectures: remote shared burst buffers and node-local burst buffers. DataWarp and Infinite Memory Engine belong to the former. In the case of remote shared burst buffers, the SSD storage resides in I/O nodes positioned between the compute nodes and the backend storage, so data movement between compute nodes and the burst buffer needs to go through a network. Placing burst buffers in I/O nodes facilitates their independent development, deployment, and maintenance. The&#160;aggregate bandwidth of node-local burst buffers, by contrast, grows linearly with the number of compute nodes, but node-local burst buffers also require scalable metadata management to maintain a global namespace across all nodes.&lt;/p&gt;

&lt;p&gt;RW-PCC provides an elegant way to couple node-local burst buffers with Lustre. The metadata is managed by Lustre and stored on MDTs; thus, it becomes part of the global Lustre namespace. Moreover, the file data can be migrated from the LPCC cache to the Lustre OSTs via file restores, transparently to the application. Furthermore, we can customize various cache strategies and provide cache isolation according to files&#8217; attributes.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Although node-local PCC consumes almost no network resources when performing data I/O, its capacity is limited by the storage media on that client.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;A novel remote shared PCC for the Lustre filesystem is proposed, which can be used as a remote shared burst buffer on a shared PCC backend fs. This shared PCC backend fs could be a high-speed networked filesystem (e.g. another Lustre storage system) using high-speed NVMe or SSD devices, while the current Lustre filesystem mainly uses slower HDDs.&#160;&lt;/p&gt;

&lt;p&gt;In this way, all Lustre clients can use the shared PCC backend fs with its larger capacity, and we get four storage tiers for a single Lustre filesystem:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;OST storage tier&lt;/li&gt;
	&lt;li&gt;Original node-local PCC&lt;/li&gt;
	&lt;li&gt;Remote shared PCC on a shared backend fs&lt;/li&gt;
	&lt;li&gt;Traditional Lustre HSM solution&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The implementation of remote shared PCC can reuse the foundation and framework of the current node-local PCC.&lt;/p&gt;
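
&lt;p&gt;As a rough sketch of how that would look with the existing PCC tools (the command names come from the current lfs/lctl PCC interface; the mount points, archive ID and parameters below are illustrative examples, not a tested configuration), each client would register the shared backend fs and then attach files just as with node-local PCC:&lt;/p&gt;

&lt;pre&gt;
# mount the shared PCC backend fs (e.g. a flash-based Lustre fs) on every client
mount -t lustre flash-mgs@o2ib:/flash /mnt/pcc

# register it as a PCC backend for the primary Lustre fs
lctl pcc add /mnt/lustre /mnt/pcc --param &quot;projid={500} rwid=2&quot;

# attach a file into the cache, check its state, and detach (restore) it
lfs pcc attach -i 2 /mnt/lustre/file
lfs pcc state /mnt/lustre/file
lfs pcc detach /mnt/lustre/file
&lt;/pre&gt;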

&lt;p&gt;Moreover, under remote shared RO-PCC, once a file is attached into the shared PCC backend fs, it can be read from PCC by all clients concurrently.&lt;/p&gt;

&lt;p&gt;Remote shared RW-PCC works as before: a file can only be read and written by a single client.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55787">LU-12359</key>
            <summary>Remote shared burst buffer PCC on a shared backend fs</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="qian_wc">Qian Yingjin</assignee>
                                    <reporter username="qian_wc">Qian Yingjin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 29 May 2019 20:54:26 +0000</created>
                <updated>Tue, 25 Aug 2020 22:05:51 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="248005" author="pfarrell" created="Wed, 29 May 2019 21:00:55 +0000"  >&lt;p&gt;&quot;A novity remote shread PCC for&quot; what is &quot;novity&quot; supposed to be?&#160; (shread is obviously shared &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&#160;)&lt;/p&gt;</comment>
                            <comment id="248006" author="pfarrell" created="Wed, 29 May 2019 21:01:48 +0000"  >&lt;p&gt;I think I don&apos;t understand the point here - Why is this better than (for example) using Lustre directly to access the back end fs?&lt;/p&gt;</comment>
                            <comment id="248007" author="qian_wc" created="Wed, 29 May 2019 21:09:44 +0000"  >&lt;p&gt;The reason is the limited capacity of the current node-local PCC. And I think the remote shared PCC compares favourably with other remote shared burst buffers such as IME and&#160;DataWarp, as it has a unified global namespace, can transparently access data, etc.&lt;/p&gt;

&lt;p&gt;&quot;&lt;/p&gt;

&lt;p&gt;Thus, it becomes part of the global Lustre namespace. Moreover, the file data can be migrated from the LPCC cache to the Lustre OSTs via file restores, and it is transparent to the application. Furthermore,we can customize various cache strategies and provide cache isolation according to files&#8217; attributes.&lt;/p&gt;

&lt;p&gt;&#160;&quot;&lt;/p&gt;</comment>
                            <comment id="248008" author="pfarrell" created="Wed, 29 May 2019 21:13:15 +0000"  >&lt;p&gt;So one of your examples was using a second Lustre file system, right?&#160; So a shared, global file system - Why not access that file system directly?&lt;/p&gt;

&lt;p&gt;Is the main idea that you could use this to access files stored on that second file system from the namespace of the first Lustre file system?&#160; Hmmmmm!&lt;/p&gt;</comment>
                            <comment id="248009" author="adilger" created="Wed, 29 May 2019 21:32:45 +0000"  >&lt;p&gt;Before any development effort is spent on this, there are several other things that are more useful to work on, such as productizing the WBC feature, CCI, HSM integration into composite layouts, etc.  &lt;/p&gt;</comment>
                            <comment id="248011" author="sihara" created="Wed, 29 May 2019 21:36:54 +0000"  >&lt;p&gt;I have the same question Patrick asked. The &quot;remote PCC&quot; would be Lustre on flash devices, right? That means a client is able to mount it directly, and that should be faster than another PCC layer? Also, if there are mixed SSD and HDD OSTs in the same Lustre namespace, isn&apos;t FLR a better way: write data into the SSD layer first, then migrate it to HDD OSTs with an FLR mirror?&lt;/p&gt;</comment>
                            <comment id="248012" author="qian_wc" created="Wed, 29 May 2019 21:37:00 +0000"  >&lt;p&gt;Compared with other remote shared burst buffer (IME and DataWarp, see &lt;a href=&quot;https://www.nersc.gov/users/computational-systems/cori/burst-buffer/burst-buffer/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.nersc.gov/users/computational-systems/cori/burst-buffer/burst-buffer/&lt;/a&gt;), I think our remote shared PCC has its advantages:&lt;/p&gt;

&lt;p&gt;1) Unified global namespace&lt;/p&gt;

&lt;p&gt;2) Any high-speed networked filesystem can be used as PCC backend&lt;/p&gt;

&lt;p&gt;3)&#160;The file data can be migrated from the PCC cache to the Lustre OSTs via file restores, transparently to the application, while a traditional remote shared burst buffer needs to stage data into and out of the burst buffer cache.&lt;/p&gt;

&lt;p&gt;4)&#160;We can customize various cache strategies and provide cache isolation according to files&#8217; attributes.&lt;/p&gt;</comment>
                            <comment id="248013" author="qian_wc" created="Wed, 29 May 2019 21:41:37 +0000"  >&lt;p&gt;From my understanding, this does not need any development effort (at least it works for RW-PCC without the open-attach feature, and for RO-PCC with or without open-attach enabled); the current PCC can already be used as a remote shared burst buffer.&#160;&lt;/p&gt;</comment>
                            <comment id="248014" author="adilger" created="Wed, 29 May 2019 21:50:10 +0000"  >&lt;p&gt;My preference for long-term development in this area is to have mirrors/copies/archives of data integrated with composite layouts.  The way that DAOS is using a foreign layout, and/or moving the HSM xattr into a component to link to copies outside of Lustre seems like the right approach.  This allows multiple archive copies per file, and would unify the tools needed to manage PCC, HSM, and FLR.&lt;/p&gt;

&lt;p&gt;That would also allow, for example, two Lustre filesystems to link to each other&apos;s files (the foreign layout xattr would contain the FID of the remote filesystem copy).  In the normal case, each filesystem would have an &quot;archive&quot; copy in the remote filesystem, and if it wants to make a change locally it would mark the local archive copy as being dirty, set the remote primary copy as stale, modify the file locally (similar to PCC), then &quot;archive&quot; the file back to the remote filesystem.  This would be symmetrical between both filesystems and allow e.g. remote replication where either copy could be updated (though not both copies at the same time).&lt;/p&gt;</comment>
                            <comment id="248017" author="qian_wc" created="Wed, 29 May 2019 22:35:16 +0000"  >&lt;p&gt;&amp;gt;if there are mixed SSD and HDD OSTs in same Lustre namespace, FLR is more better way of writing data into SSD layer first, then migrate data to HDD OSTs with FLR mirror, no?&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Compared with FLR, PCC&#160;can:&lt;/p&gt;

&lt;p&gt;1) Transparently restore data from PCC into the Lustre OSTs when it hits an -ENOSPC&#160;or -EDQUOT error; FLR on SSD cannot tolerate this kind of failure, I think.&lt;/p&gt;

&lt;p&gt;2) Customize various cache strategies and provide cache isolation according to files&#8217; attributes.&lt;/p&gt;

&lt;p&gt;For example, PCC can provide cache isolation mechanisms for administrators to manage how much PCC storage capacity each user/group/project can use.&lt;/p&gt;

&lt;p&gt;FLR cannot control how much SSD space each user/group/project can use.&lt;/p&gt;

&lt;p&gt;Pool-based quota might, but not in the user/group/project dimension, I think.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Moreover, PCC can implement a job-based quota via project quota on the PCC backend fs, I think.&lt;/p&gt;

&lt;p&gt;We just need to add a mapping between the job&#160;identifier (i.e. job name) and a dedicated project ID (e.g. 100) on the PCC backend fs:&lt;/p&gt;

&lt;p&gt;1) Before the job starts, set up this mapping and the project quota enforcement on the PCC backend fs;&lt;/p&gt;

&lt;p&gt;2) When the job runs, at the time of attaching a file into PCC, set the project ID (100) on the PCC copy to achieve the job-based quota;&lt;/p&gt;

&lt;p&gt;3) When the job finishes, unmap the relation between the job identifier and the project ID (e.g. 100) and remove the project quota enforcement associated with this project ID on the PCC backend fs.&lt;/p&gt;
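
&lt;p&gt;As a sketch of those three steps with the existing quota tools (the project ID 100, the limit, and the paths are illustrative only; the job-name-to-projid mapping itself would be new glue logic outside these commands):&lt;/p&gt;

&lt;pre&gt;
# 1) before the job: map job &quot;job_A&quot; -&amp;gt; projid 100, then enforce a block limit on the backend fs
lfs setquota -p 100 -B 10T /mnt/pcc_backend

# 2) while the job runs: tag each PCC copy with the project ID as it is attached
lfs project -p 100 /mnt/pcc_backend/cached_file

# 3) after the job: drop the mapping and clear the enforcement (limit 0 removes it)
lfs setquota -p 100 -B 0 /mnt/pcc_backend
&lt;/pre&gt;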

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="248021" author="adilger" created="Wed, 29 May 2019 23:56:10 +0000"  >&lt;p&gt;The very significant problem with PCC today is that users will &lt;b&gt;LOSE THEIR DATA&lt;/b&gt; if the client is offline.  Using FLR to keep an &lt;b&gt;extra&lt;/b&gt; copy of the data in the PCC cache is much more usable.  I don&apos;t think FLR and PCC are &lt;em&gt;incompatible&lt;/em&gt; with each other, if we move the PCC/HSM xattr into an FLR component.  The main difference is that FLR can mark one mirror STALE, but keep the data in the OSTs, while HSM has to release the data permanently.  Also, HSM today can only have one archive copy of the data, but if the HSM xattr is moved into a component we could have many copies.&lt;/p&gt;

&lt;p&gt;I don&apos;t think that any of the PCC quota options are incompatible with Lustre providing quotas itself.  Having to manage quota on each PCC node separately/locally would also be complex for the administrator, and not what they want for a distributed filesystem.&lt;/p&gt;

&lt;p&gt;I think the first and most important thing to do in this area is to consolidate the HSM xattr with PFL/FLR/composite layouts to give us the flexibility to combine these features in interesting ways.  Secondly, PCC is still limited to creating one file on the MDS for each local file, so productizing the WBC feature would allow a client to create files at a high speed locally, without an MDS RPC for each file, which is useful for many things.&lt;/p&gt;

&lt;p&gt;Before we add many more complex features to PCC, we also need to get some feedback from users on how it is being used, fix bugs, etc. to know that time spent there is worthwhile compared to other features.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="50588">LU-10606</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00h73:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>