<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:02:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6657] Eviction Notifier</title>
                <link>https://jira.whamcloud.com/browse/LU-6657</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In a suppress-ping environment, an evicted client cannot recover from the evicted state until its first access to the server that evicted it. That access gets -EIO and returns immediately, which may cause the user&apos;s job to terminate with an error.&lt;/p&gt;

&lt;p&gt;We can avoid this situation by running &quot;lfs df&quot; before every single operation, but that is cumbersome and not practical.&lt;/p&gt;

&lt;p&gt;The eviction notifier that this patch provides is one solution to the problem. It works as follows:&lt;br/&gt;
First, the target (MDT or OST) that evicted a client notifies the MGS of the eviction event.&lt;br/&gt;
Then the MGS sends a request to the evicted client.&lt;br/&gt;
Finally, on receiving the request, the client sends a ping to the target server and discovers that it has been evicted.&lt;/p&gt;</description>
                <environment></environment>
        <key id="30408">LU-6657</key>
            <summary>Eviction Notifier</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="nozaki">Hiroya Nozaki</assignee>
                                    <reporter username="nozaki">Hiroya Nozaki</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Thu, 28 May 2015 06:20:23 +0000</created>
                <updated>Thu, 5 Apr 2018 13:44:56 +0000</updated>
                                            <version>Lustre 2.7.0</version>
                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="116638" author="gerrit" created="Thu, 28 May 2015 06:36:44 +0000"  >&lt;p&gt;Hiroya Nozaki (nozaki.hiroya@jp.fujitsu.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14987&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14987&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6657&quot; title=&quot;Eviction Notifier&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6657&quot;&gt;LU-6657&lt;/a&gt; mgs: eviction notifier&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f16ff2e5cab2179692fdfd380a3b520df3bb71c6&lt;/p&gt;</comment>
                            <comment id="116910" author="morrone" created="Fri, 29 May 2015 20:21:43 +0000"  >&lt;p&gt;This sounds like a pretty major protocol change to Lustre, and I think we need to have more discussion about whether this is a reasonable approach.&lt;/p&gt;

&lt;p&gt;I believe that the current high level design dictates that when a client performs an operation and discovers that it has been evicted, it should reconnect and resend the operation.  So in places where the client does not currently do that, can we not simply fix the client?&lt;/p&gt;</comment>
                            <comment id="117373" author="nozaki" created="Thu, 4 Jun 2015 00:54:47 +0000"  >&lt;p&gt;A lot of client-side code depends on the eviction mechanism, such as if-statements checking exp_failed. So I think it would take a great deal of time to verify that such a replacement is complete. That is why I thought I shouldn&apos;t touch this now and instead added new logic on top of the eviction mechanism.&lt;/p&gt;

&lt;p&gt;This feature works independently of the other features, though the code depends on fsdb, and it can be disabled if desired. So I think adopting this feature is more reasonable than devising and verifying non-eviction logic on the client side.&lt;/p&gt;</comment>
                            <comment id="117374" author="morrone" created="Thu, 4 Jun 2015 01:09:14 +0000"  >&lt;p&gt;Yes, but we avoided having eviction notifications for good reason: evictions normally occur because we are unable to talk to the client.  Adding new communication for a client with which we are unable to communicate seems like a less than desirable design decision.  While I can imagine situations where that would work, I can also imagine situations where the added communication causes more harm than good.&lt;/p&gt;

&lt;p&gt;Also, if I understand your solution, you have not really solved the underlying problem; you have merely shrunk the window in which the problem can occur.  Since the notification goes sideways through the MGS, there is still a window, while that is happening, in which the client can reconnect to the server and still get an error.&lt;/p&gt;

&lt;p&gt;But fixing the client bugs would fix the problem completely, would it not?&lt;/p&gt;</comment>
                            <comment id="117395" author="nozaki" created="Thu, 4 Jun 2015 05:30:31 +0000"  >&lt;p&gt;I have to answer &quot;yes&quot; to that question. We should work toward a fundamental solution, though I think this feature is a reasonably cheap workaround, at least for now, when using suppress ping.&lt;/p&gt;

&lt;p&gt;Considering bulk I/O, can we simply resend a request, or should we carefully examine which operations can be resent in each situation?&lt;/p&gt;</comment>
                            <comment id="117475" author="morrone" created="Thu, 4 Jun 2015 18:00:09 +0000"  >&lt;p&gt;Generally speaking, no, we would not retry bulk IO after an eviction.  If a client already has an open file handle when the eviction occurs, then any currently under way or future operations on that file should receive an error.  If a client is able to reconnect after eviction without a reboot of the node, even though the client might seem like the same client to us, the client is a completely new client instance from the servers&apos; perspective.  All previous state was lost.&lt;/p&gt;

&lt;p&gt;The eviction notifier approach does not help with that particular issue.&lt;/p&gt;

&lt;p&gt;I think the best path forward is for you to open tickets on exactly what operations were failing for you so we work on fixing them.&lt;/p&gt;</comment>
                            <comment id="117527" author="nozaki" created="Fri, 5 Jun 2015 01:03:30 +0000"  >&lt;p&gt;Hmm ... in a suppress-ping environment, clients are left in the evicted state after an eviction event, and we don&apos;t want to leave clients evicted until their first access after the event.  That&apos;s why there is no particular target case. Fujitsu is expected to reduce the number of errors that end users get from evictions.&lt;/p&gt;</comment>
                            <comment id="117611" author="morrone" created="Fri, 5 Jun 2015 18:53:49 +0000"  >&lt;p&gt;There are a finite number of operations that a client will do after being evicted and idle for some time.  Just walk through them and figure out which work and which do not.&lt;/p&gt;

&lt;p&gt;For instance, I would hope that an open() call would work after eviction.  Hopefully when the open() fails, the client reconnects and retries the open(), and the application is none the wiser that this occurred.  Is that the case?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxee7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>