<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:37:49 UTC 2024

-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3891] leases for HSM - some questions</title>
                <link>https://jira.whamcloud.com/browse/LU-3891</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;AFAICS, some lease code landed for HSM needs.&lt;br/&gt;
Unfortunately, leases have the same eviction-related problems that SOM had in the past.&lt;/p&gt;

&lt;p&gt;On eviction, locks are cancelled on both the MDS and the client. A new lease may conflict with open files; however, after a client is evicted and later reconnects, it does not re-open its files, even though they are still open on the client and it can proceed with its IO.&lt;/p&gt;

&lt;p&gt;However, HSM has a layout lock as well, which is supposed to block such new IO.&lt;br/&gt;
Do I understand correctly that a lease is &lt;em&gt;always&lt;/em&gt; taken together with an exclusive layout lock, so that all other clients, even those evicted in the past, would have their new IO blocked on the layout lock?&lt;/p&gt;

&lt;p&gt;If not, the lease lock gives no guarantee with respect to recently evicted clients.&lt;/p&gt;

&lt;p&gt;The second problem is that the evicted state propagates from the MDS to the client with some latency: the client may not know it has connection problems while it is already evicted. This delay can be up to obd_timeout, which can be quite long.&lt;/p&gt;

&lt;p&gt;The layout lock will not help here. The solution could be the same as with SOM: deny all HSM releases for a period of X*obd_timeout after the last eviction, to be sure clients are aware of their evictions and have cancelled their layout locks.&lt;/p&gt;


&lt;p&gt;Are these lease lock issues known, and have they been resolved?&lt;/p&gt;</description>
                <environment></environment>
        <key id="20800">LU-3891</key>
            <summary>leases for HSM - some questions</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="vitaly_fertman">Vitaly Fertman</reporter>
                        <labels>
                    </labels>
                <created>Thu, 5 Sep 2013 23:42:31 +0000</created>
                <updated>Fri, 2 Dec 2016 23:42:08 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="65909" author="jay" created="Fri, 6 Sep 2013 05:29:37 +0000"  >&lt;p&gt;I thought about this problem, heh. The lease implementation is okay for HSM. If you check the code of mdt_hsm_release(), you will find that lease_broken is actually checked on the MDT side. So the HSM release operation is performed as follows:&lt;/p&gt;

&lt;p&gt;1. open + lease the file&lt;br/&gt;
2. do some operations on the client&lt;br/&gt;
3. close + release the file; the MDT checks whether the lease lock is still there, otherwise the release won&apos;t happen and the operation becomes a plain close.&lt;/p&gt;</comment>
                            <comment id="66174" author="vitaly_fertman" created="Tue, 10 Sep 2013 12:23:26 +0000"  >&lt;p&gt;This does not answer the original question, because the lease lock &lt;b&gt;will be there&lt;/b&gt; but will still guarantee nothing.&lt;/p&gt;

&lt;p&gt;I looked at mdt_hsm_release() and see that it takes an exclusive layout lock, which is good, as it covers the first issue.&lt;br/&gt;
The second one is still open.&lt;/p&gt;</comment>
                            <comment id="67318" author="jay" created="Tue, 24 Sep 2013 05:37:35 +0000"  >&lt;p&gt;Indeed. The problem is that when the MDT grants an open lease to the release client, an evicted client may still keep writing to the file, so that the file may lose some status after release. But this problem should be minor, because IO on the file still open on the evicted client will eventually fail with EIO after the release, since the OST objects will already have disappeared.&lt;/p&gt;</comment>
                            <comment id="67446" author="vitaly_fertman" created="Tue, 24 Sep 2013 19:58:50 +0000"  >&lt;p&gt;AFAICS, the following is possible:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;client is evicted&lt;/li&gt;
	&lt;li&gt;copy is made.&lt;/li&gt;
	&lt;li&gt;check for a copy succeeds, under a granted lease&lt;/li&gt;
	&lt;li&gt;IO from evicted client happens&lt;/li&gt;
	&lt;li&gt;release happens, lease is cancelled, no more IO errors&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Although the window between the check and the release is relatively small, this still seems possible, and it would lead to data loss.&lt;/p&gt;</comment>
                            <comment id="176305" author="vitaly_fertman" created="Fri, 2 Dec 2016 23:42:08 +0000"  >&lt;p&gt;To summarise the current state of HSM locking: the main question is whether the copy is valid after the release. The whole logic can be followed starting from ll_hsm_release:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;take a lease, blocks new opens&lt;/li&gt;
	&lt;li&gt;get data version from OST&lt;br/&gt;
&#8212;&#8212; flush all the cached data from clients&lt;/li&gt;
	&lt;li&gt;mdt_hsm_release is initiated for this version&lt;br/&gt;
&#8212;&#8212; compare this version &amp;amp; the archive version&lt;br/&gt;
&#8212;&#8212; check the lease exists - skip release if this client was evicted&lt;br/&gt;
&#8212;&#8212; MDS_INODELOCK_LAYOUT EX - just a protection for the layout change, nothing about IO here (a client may not yet be informed about its eviction and may still operate under its previous layout lock);&lt;/li&gt;
	&lt;li&gt;cancel lease&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Also, the release happens about 2 weeks or more after the last access.&lt;/p&gt;

&lt;p&gt;therefore, everything is protected if:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;no eviction happens;&lt;/li&gt;
	&lt;li&gt;an open happened after an eviction;&lt;/li&gt;
	&lt;li&gt;IO happened before the version check or after the file release;&lt;/li&gt;
	&lt;li&gt;IO tried to happen after the release had taken the layout lock and the client was informed about its eviction;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;However, it is not protected, even with the 2-week delay, if a new IO and an eviction have happened just before the release and:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;lockless IO / punch / enqueue + IO happens between the data version check and the release;&lt;/li&gt;
	&lt;li&gt;the same happens even during the release itself, as the client may not yet be informed about its eviction and thus may still operate under its previous layout lock;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Possible improvements could be:&lt;br/&gt;
1. move the data version check under the layout lock;&lt;br/&gt;
2. never release within at_max after a client eviction or MDS failover completion, so that the client is informed about its eviction and does not initiate new IO without a new layout lock (this has no effect without (1), as IO may happen just between the version check and the release).&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw0fb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10148</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>