<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5703] Quiesce client mountpoints from the server</title>
                <link>https://jira.whamcloud.com/browse/LU-5703</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In order to minimize user disruption, NASA performs some system maintenance &quot;live&quot;. Typical maintenance includes activities such as adding new compute nodes or reconfiguring the IB fabric. During such times, user jobs are suspended via PBS. Although we are able to suspend user jobs, which does minimize usage of Lustre, it does not stop all Lustre client/server activity. Therefore NASA requires:&lt;br/&gt;
    1. A mechanism to halt and block all Lustre client IO.&lt;br/&gt;
    2. Halting the client/server keep-alive ping and all other network traffic.&lt;br/&gt;
    3. Clients must be able to recover after the quiesce without eviction.&lt;/p&gt;</description>
                <environment></environment>
        <key id="26823">LU-5703</key>
            <summary>Quiesce client mountpoints from the server</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                    </labels>
                <created>Thu, 2 Oct 2014 17:13:49 +0000</created>
                <updated>Thu, 18 Nov 2021 22:33:30 +0000</updated>
                                            <version>Lustre 2.4.3</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="95567" author="cliffw" created="Thu, 2 Oct 2014 18:46:42 +0000"  >&lt;p&gt;The simplest way to do this would be to umount the clients, then remount after your maintenance. That leaves the clients in a good state, and you should be able to restart without issue.&lt;br/&gt;
If you are adding new compute nodes (not servers), that should be completely transparent to all other clients and should never require any quiescing of Lustre. Clients are very independent of one another.&lt;/p&gt;

&lt;p&gt;If you are changing IB values, it would be best to umount all Lustre, unload the LNET modules, and restart. That way you are certain your IB changes will propagate.&lt;/p&gt;

&lt;p&gt;In most cases of &apos;live&apos; Lustre maintenance, any live but idle Lustre machines should cause you no issues, and there is no need for a &apos;quiesce&apos;. If you need to completely eliminate all Lustre traffic from your network, the quickest and safest way to do so is simply to stop Lustre on the affected nodes.&lt;/p&gt;</comment>
                            <comment id="95575" author="adilger" created="Thu, 2 Oct 2014 20:19:07 +0000"  >&lt;p&gt;Cliff, I expect there may be problems relating to open files on the mounted filesystem that cannot be closed without killing the running application.&lt;/p&gt;

&lt;p&gt;This request sounds a lot like a requirement we had for a feature called &quot;Simplified Interoperability&quot;, which would flush all uncommitted RPCs from the clients and quiesce them in advance of a server shutdown for a Lustre software upgrade, so that we didn&apos;t have to manage recovery/replay of Lustre RPCs across different versions.  This requires work on both the clients (to be able to &quot;detach&quot; themselves from their open files and &quot;reattach&quot; them once the server is restarted), and on the servers to notify the clients of the impending shutdown and to allow the clients to reconnect without evicting them.&lt;/p&gt;</comment>
                            <comment id="98864" author="jeremyenos" created="Tue, 11 Nov 2014 07:58:00 +0000"  >&lt;p&gt;At NCSA there is a similar need, although possibly lighter weight than the Simplified Interoperability feature described; possibly just items 1 &amp;amp; 3 described by Mahmoud.&lt;br/&gt;
The specific application I have in mind at the moment is confirmation benchmarking after an online configuration tuning. Ideally, in this case, all clients would remain actively mounted (and pinging) with existing open files, but would suspend operations beyond that. A /proc control on the client would tell it whether or not to &quot;suspend&quot;, which leaves the capability to have some clients active (presumably used to execute the regression test).&lt;br/&gt;
A search for this capability landed on this ticket; perhaps it&apos;s different enough that I should open a separate RFE?&lt;/p&gt;</comment>
                            <comment id="226844" author="spitzcor" created="Thu, 26 Apr 2018 23:17:37 +0000"  >&lt;p&gt;Is this request for maintenance capabilities about server changes only? &#160;That is, the request isn&apos;t about keeping files open while the Lustre client is changed, correct?&lt;/p&gt;</comment>
                            <comment id="226850" author="jeremyenos" created="Fri, 27 Apr 2018 02:37:38 +0000"  >&lt;p&gt;I think it&apos;s about client and configuration changes first... if server changes were possible too, that&apos;d be outstanding. It is &lt;b&gt;not&lt;/b&gt; about changing config or versions on clients in use by jobs while holding files open.&#160; The idea is to idle the clients in use by jobs so that &lt;em&gt;others&lt;/em&gt; in a test pool could run regression tests after a config or client change, or regression tests on a periodic basis.&lt;/p&gt;

&lt;p&gt;Getting consistent, and therefore meaningful, regression tests w/o a dedicated system is impossible otherwise.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="57637">LU-13078</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="57460">LU-13010</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10115">LU-18</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32398">LU-7236</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="18739">LU-3290</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="59063">LU-13521</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="67241">LU-15250</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="34026" name="SC09-Simplified-Interop.pdf" size="375262" author="adilger" created="Fri, 13 Dec 2019 23:37:06 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 26 Feb 2015 17:13:49 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwxnj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15964</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 2 Oct 2014 17:13:49 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>