<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:35:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3655] Reoccurrence of permanent eviction scenario</title>
                <link>https://jira.whamcloud.com/browse/LU-3655</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I am afraid we are suffering again from the issue described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2683&quot; title=&quot;Client deadlock in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2683&quot;&gt;&lt;del&gt;LU-2683&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1690&quot; title=&quot;Permanent eviction scenario starting with Lustre 2.1.1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1690&quot;&gt;&lt;del&gt;LU-1690&lt;/del&gt;&lt;/a&gt;. But this time we are running Lustre 2.1.5, which includes the 4 patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-874&quot; title=&quot;Client eviction on lock callback timeout &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-874&quot;&gt;&lt;del&gt;LU-874&lt;/del&gt;&lt;/a&gt;. We also backported patch &lt;a href=&quot;http://review.whamcloud.com/5208&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5208&lt;/a&gt; from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2683&quot; title=&quot;Client deadlock in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2683&quot;&gt;&lt;del&gt;LU-2683&lt;/del&gt;&lt;/a&gt; into our sources.&lt;/p&gt;

&lt;p&gt;So those 5 patches might not be enough to fix this problem.&lt;/p&gt;

&lt;p&gt;Here is the information collected from the crash:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt;dmesg
...
LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 65257:0:(cl_io.c:967:cl_io_cancel()) Canceling ongoing page trasmission
...

crash&amp;gt; ps | grep 65257
  65257 2 5 ffff880fe2ac27d0 IN 0.0 0 0 [ldlm_bl_62]
crash&amp;gt; bt 65257
PID: 65257 TASK: ffff880fe2ac27d0 CPU: 5 COMMAND: &quot;ldlm_bl_62&quot;
 #0 [ffff880fe32a7ae0] schedule at ffffffff81484c15
 #1 [ffff880fe32a7ba8] cfs_waitq_wait at ffffffffa055a6de [libcfs]
 #2 [ffff880fe32a7bb8] cl_sync_io_wait at ffffffffa067f3cb [obdclass]
 #3 [ffff880fe32a7c58] cl_io_submit_sync at ffffffffa067f643 [obdclass]
 #4 [ffff880fe32a7cb8] cl_lock_page_out at ffffffffa0676997 [obdclass]
 #5 [ffff880fe32a7d28] osc_lock_flush at ffffffffa0a6abaf [osc]
 #6 [ffff880fe32a7d78] osc_lock_cancel at ffffffffa0a6acbf [osc]
 #7 [ffff880fe32a7dc8] cl_lock_cancel0 at ffffffffa0675575 [obdclass]
 #8 [ffff880fe32a7df8] cl_lock_cancel at ffffffffa067639b [obdclass]
 #9 [ffff880fe32a7e18] osc_ldlm_blocking_ast at ffffffffa0a6bd9a [osc]
#10 [ffff880fe32a7e88] ldlm_handle_bl_callback at ffffffffa07a0293 [ptlrpc]
#11 [ffff880fe32a7eb8] ldlm_bl_thread_main at ffffffffa07a06d1 [ptlrpc]
#12 [ffff880fe32a7f48] kernel_thread at ffffffff8100412a


crash&amp;gt; dmesg | grep &apos;SYNC IO&apos;
LustreError: 3140:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 63611:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 65316:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 65235:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 65277:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
LustreError: 63605:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</description>
                <environment></environment>
        <key id="20041">LU-3655</key>
            <summary>Reoccurrence of permanent eviction scenario</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="sebastien.buisson">Sebastien Buisson</reporter>
                        <labels>
                    </labels>
                <created>Mon, 29 Jul 2013 09:30:27 +0000</created>
                <updated>Tue, 18 Jul 2017 13:05:47 +0000</updated>
                            <resolved>Tue, 18 Jul 2017 13:05:47 +0000</resolved>
                                    <version>Lustre 2.1.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="63131" author="pjones" created="Mon, 29 Jul 2013 12:48:48 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please comment on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="63208" author="niu" created="Tue, 30 Jul 2013 03:12:55 +0000"  >&lt;p&gt;Hi, Sebastien&lt;/p&gt;

&lt;p&gt;Is there any log from the OST? Are there any other abnormal messages in the client log besides the &quot;SYNC IO failed with error: -110 ...&quot; lines? Thanks.&lt;/p&gt;</comment>
                            <comment id="63868" author="dmoreno" created="Thu, 8 Aug 2013 12:25:16 +0000"  >&lt;p&gt;Two files are attached:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;oss.log = log from all OSSes at the same time (17h2*)&lt;/li&gt;
	&lt;li&gt;sync_io.log = log from the client at the same time (17h2*)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;There is no more information in the logs, and nothing on the MDS.&lt;/p&gt;</comment>
                            <comment id="63928" author="niu" created="Fri, 9 Aug 2013 02:42:31 +0000"  >&lt;p&gt;There are only a few lines of messages in the attached logs. It looks like the client has been evicted by the OST, so the sync write to the OST failed, but I can&apos;t see from the log why the client was evicted. Maybe there was a network problem between the client and the OST?&lt;/p&gt;</comment>
                            <comment id="64958" author="louveta" created="Fri, 23 Aug 2013 15:47:56 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;Unfortunately, we don&apos;t have more info in the server log. The OSS is quiet for hours until we get the lock callback timeout. There is nothing in the OSS syslog before the callback message, and the physical network (InfiniBand) doesn&apos;t show errors.&lt;/p&gt;

&lt;p&gt;Note that we are running Lustre 2.1.5 plus some patches in the networking area:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;ORNL-22 general ptlrpcd threads pool support&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1144&quot; title=&quot;implement a NUMA aware ptlrpcd binding policy&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1144&quot;&gt;&lt;del&gt;LU-1144&lt;/del&gt;&lt;/a&gt; implement a NUMA aware ptlrpcd binding policy&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1110&quot; title=&quot;MDS Oops in osd_xattr_get() during file open by FID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1110&quot;&gt;&lt;del&gt;LU-1110&lt;/del&gt;&lt;/a&gt; MDS Oops in osd_xattr_get() during file open by FID&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2613&quot; title=&quot;opening and closing file can generate &amp;#39;unreclaimable slab&amp;#39; space&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2613&quot;&gt;&lt;del&gt;LU-2613&lt;/del&gt;&lt;/a&gt; too much unreclaimable slab space&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2624&quot; title=&quot;Stop of ptlrpcd threads is long&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2624&quot;&gt;&lt;del&gt;LU-2624&lt;/del&gt;&lt;/a&gt; ptlrpc fix thread stop&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2683&quot; title=&quot;Client deadlock in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2683&quot;&gt;&lt;del&gt;LU-2683&lt;/del&gt;&lt;/a&gt; client deadlock in cl_lock_mutex_get&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Alex.&lt;/p&gt;</comment>
                            <comment id="81941" author="louveta" created="Fri, 18 Apr 2014 13:53:22 +0000"  >&lt;p&gt;Back on this issue: we are seeing it more and more often. Any idea of what can be collected?&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;</comment>
                            <comment id="82034" author="niu" created="Mon, 21 Apr 2014 01:51:15 +0000"  >&lt;p&gt;I think you should collect logs from both the client and the OSS, and get a full stack trace on the client.&lt;/p&gt;</comment>
                            <comment id="202448" author="niu" created="Tue, 18 Jul 2017 13:05:47 +0000"  >&lt;p&gt;Close old 2.1 issue.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13336" name="oss.log" size="1008" author="dmoreno" created="Thu, 8 Aug 2013 12:25:16 +0000"/>
                            <attachment id="13335" name="sync_io.log" size="6415" author="dmoreno" created="Thu, 8 Aug 2013 12:25:16 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 21 Apr 2014 09:30:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvwe7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9404</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 29 Jul 2013 09:30:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>