<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:08:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7392] client evicted: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1)</title>
                <link>https://jira.whamcloud.com/browse/LU-7392</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of build &apos;20151104.1&apos; on cluster lola (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1&lt;/a&gt;). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are configured in an HA failover configuration.&lt;/p&gt;

&lt;p&gt;Sequence of events:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;2015-11-04 18:47:30  &amp;#8211;  mds_restart lola-9 completed&lt;/li&gt;
	&lt;li&gt;2015-11-04 18:50:30  &amp;#8211;  OSS (&lt;tt&gt;lola-5&lt;/tt&gt;) evicts client
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-5.log:Nov  4 18:50:30 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.131@o2ib100  ns: filter-soaked-OST0007_UUID lock: ffff880313f841c0/0x15cebc1506e2a9b5 lrc: 3/0,0 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 4 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x60000080010020 nid: 192.168.1.131@o2ib100 remote: 0x175f680569401922 expref: 5 pid: 10807 timeout: 4306845817 lvb_type: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Similar messages exist on all OSS nodes.&lt;/p&gt;&lt;/li&gt;
	&lt;li&gt;2015-11-04 18:55:45  &amp;#8211;  client (&lt;tt&gt;lola-31&lt;/tt&gt;) evicted from ost7
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Nov  4 18:50:45 lola-31 kernel: LustreError: 167-0: soaked-OST0007-osc-ffff881071e62400: This client was evicted by soaked-OST0007; in progress operations using this service will fail.
Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-OST0007-osc-ffff881071e62400: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount = 2
Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-OST0007-osc-ffff881071e62400 lock: ffff880850f12a80/0x175f680569401922 lrc: 3/0,1 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x526480000000 nid: local remote: 0x15cebc1506e2a9b5 expref: -99 pid: 15402 timeout: 0 lvb_type: 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/li&gt;
	&lt;li&gt;until 2015-11-04 18:59:05  &amp;#8211;  the client is also evicted from ost{8, a, b, c}, with the same error messages on the client and the OSSes&lt;/li&gt;
	&lt;li&gt;2015-11-05  &amp;#8211;  the OSCs on the client (&lt;tt&gt;lola-31&lt;/tt&gt;) stay in state DISCONN, EVICTED for the affected OSTs (see file &apos;evicted-client.txt.bz2&apos;)&lt;/li&gt;
	&lt;li&gt;the client node is unusable and all jobs have crashed&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The problem might be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2067&quot; title=&quot;ldlm_resource_complain()) Namespace MGC resource refcount nonzero after lock cleanup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2067&quot;&gt;&lt;del&gt;LU-2067&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Attached files:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;OSSes (&lt;tt&gt;lola-&amp;#91;2-5&amp;#93;&lt;/tt&gt;): messages, console log files&lt;/li&gt;
	&lt;li&gt;client &lt;tt&gt;lola-31&lt;/tt&gt;: messages, console log files, &apos;lctl *.*.state*&apos; output&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment>lola&lt;br/&gt;
build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild &amp;#39;20151104.1&amp;#39;</environment>
        <key id="33013">LU-7392</key>
            <summary>client evicted: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Thu, 5 Nov 2015 14:51:28 +0000</created>
                <updated>Thu, 8 Feb 2018 18:33:59 +0000</updated>
                            <resolved>Thu, 8 Feb 2018 18:33:59 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="132770" author="jgmitter" created="Thu, 5 Nov 2015 18:35:18 +0000"  >&lt;p&gt;Hi Jinshan,&lt;br/&gt;
Can you please have a look at this issue?&lt;br/&gt;
Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="132777" author="adilger" created="Thu, 5 Nov 2015 18:51:01 +0000"  >&lt;p&gt;Peter thinks this may be related to the patch &lt;a href=&quot;http://review.whamcloud.com/15127&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15127&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4865&quot; title=&quot;osd-zfs: increase object block size dynamically as object grows&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4865&quot;&gt;&lt;del&gt;LU-4865&lt;/del&gt;&lt;/a&gt; zfs: grow block size by write pattern&quot;, which was shown to cause problems in 2.7.1 testing and was reverted.  I&apos;ve submitted a reversion patch against master for testing to see if the problem goes away:&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/17053&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/17053&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="134728" author="heckes" created="Mon, 30 Nov 2015 16:28:00 +0000"  >&lt;p&gt;Sorry for the delay. I&apos;ll make sure the patch is included in the next soak build.&lt;/p&gt;</comment>
                            <comment id="138341" author="heckes" created="Fri, 8 Jan 2016 17:18:31 +0000"  >&lt;p&gt;This error occurred again for build &apos;20160106&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160106&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160106&lt;/a&gt;):&lt;br/&gt;
&lt;b&gt;client&lt;/b&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 167-0: soaked-OST000f-osc-ffff8808301ba000: This client was evicted by soaked-OST000f; in progress operations using this service will fail.
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-OST000f-osc-ffff8808301ba000: namespace resource [0x7c0000401:0x18fbb85:0x0].0x0 (ffff88006f2206c0) refcount nonzero (2) after lock cleanup; forcing cleanup.
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x7c0000401:0x18fbb85:0x0].0x0 (ffff88006f2206c0) refcount = 3
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-OST000f-osc-ffff8808301ba000 lock: ffff880366080940/0xedab12f62583edad lrc: 3/0,1 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x526400000000 nid: local remote: 0x6dd9a1f2125effbd expref: -99 pid: 129953 timeout: 0 lvb_type: 1
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1523:ldlm_resource_dump()) Waiting locks:
lola-26.log:Jan  7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1525:ldlm_resource_dump()) ### ### ns: soaked-OST000f-osc-ffff8808301ba000 lock: ffff880a0b5668c0/0xedab12f62583edbb lrc: 4/0,1 mode: --/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x106400020000 nid: local remote: 0x6dd9a1f2125effc4 expref: -99 pid: 129954 timeout: 0 lvb_type: 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;b&gt;OSS&lt;/b&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lola-5.log:Jan  7 19:51:46 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.126@o2ib100  ns: filter-soaked-OST000f_UUID lock: ffff880341ac0300/0x6dd9a1f2125effbd lrc: 3/0,0 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x60000000000020 nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edad expref: 6 pid: 18388 timeout: 4385304734 lvb_type: 0
lola-5.log:Jan  7 19:51:46 lola-5 kernel: LustreError: 17976:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8802b92190c0 x1522673517776812/t0(0) o105-&amp;gt;soaked-OST000f@192.168.1.126@o2ib100:15/16 lens 360/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
lola-5.log:Jan  7 19:51:46 lola-5 kernel: LustreError: 17976:0:(ldlm_lockd.c:689:ldlm_handle_ast_error()) ### client (nid 192.168.1.126@o2ib100) failed to reply to completion AST (req status 0 rc -5), evict it ns: filter-soaked-OST000f_UUID lock: ffff8803e925c3c0/0x6dd9a1f2125effc4 lrc: 3/0,0 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x40000000020000 nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edbb expref: 4 pid: 18388 timeout: 0 lvb_type: 0
lola-5.log:Jan  7 19:51:47 lola-5 kernel: LustreError: 17976:0:(ldlm_lockd.c:689:ldlm_handle_ast_error()) Skipped 4 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="173281" author="ake_s" created="Fri, 11 Nov 2016 12:13:43 +0000"  >&lt;p&gt;Hi!&lt;/p&gt;

&lt;p&gt;Is there any progress on this problem?&lt;/p&gt;

&lt;p&gt;We are being hit by a problem that, judging from the error messages, is a good match, except that there is no zfs involved here.&lt;/p&gt;

&lt;p&gt;We&apos;re running 2.8.56 + the fixes for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6808&quot; title=&quot;Interop 2.5.3&amp;lt;-&amp;gt;master sanity test_224c: Bulk IO write error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6808&quot;&gt;&lt;del&gt;LU-6808&lt;/del&gt;&lt;/a&gt; on the client side and 2.5.41-DDN on the servers.&lt;/p&gt;

&lt;p&gt;(The clients also have the tentative fix for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6854&quot; title=&quot;Setting page_writeback on a non-dirty page&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6854&quot;&gt;&lt;del&gt;LU-6854&lt;/del&gt;&lt;/a&gt; installed at the moment, since we seemed to suffer from that specific problem too, but the eviction problem above had been showing up long before that patch was applied.)&lt;/p&gt;</comment>
                            <comment id="220476" author="jay" created="Thu, 8 Feb 2018 18:33:59 +0000"  >&lt;p&gt;close old tickets&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19532" name="console-lola-31.log.bz2" size="43761" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19533" name="evicted-client.txt.bz2" size="869" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19534" name="lola-2.log.bz2" size="66473" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19535" name="lola-3.log.bz2" size="77099" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19536" name="lola-4.log.bz2" size="65620" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19537" name="lola-5.log.bz2" size="67227" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19538" name="messages-lola-2.log.bz2" size="467172" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19539" name="messages-lola-3.log.bz2" size="420598" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19542" name="messages-lola-31.log.bz2" size="222192" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19540" name="messages-lola-4.log.bz2" size="231467" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                            <attachment id="19541" name="messages-lola-5.log.bz2" size="464002" author="heckes" created="Thu, 5 Nov 2015 15:51:22 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxsbr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>