<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:00:02 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6416] Client evicted on lock cancel</title>
                <link>https://jira.whamcloud.com/browse/LU-6416</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Running iorssf (job 33751) on 3 client nodes, one of the clients was evicted. The IB fabric appears to be sane (no down links, symbol errors, ...) for the ports in question. No MDS restart or failover operation was ongoing at the time the error occurred.&lt;br/&gt;
Client error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 22 20:12:35 lola-24 kernel: LustreError: 11-0: soaked-OST000f-osc-ffff880819252c00: Communicating with 192.168.1.105@o2ib10, operation obd_ping failed with -107.
Jan 22 20:12:35 lola-24 kernel: Lustre: soaked-OST000f-osc-ffff880819252c00: Connection to soaked-OST000f (at 192.168.1.105@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
Jan 22 20:12:35 lola-24 kernel: Lustre: Skipped 1 previous similar message
Jan 22 20:12:35 lola-24 kernel: LustreError: 167-0: soaked-OST000f-osc-ffff880819252c00: This client was evicted by soaked-OST000f; in progress operations using this service will fail.
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(osc_lock.c:832:osc_ldlm_completion_ast()) lock@ffff880650352228[3 3 0 1 1 00000000] W(2):[8702328, 18446744073709551615]@[0x1000f0000:0x1514518:0x0] {
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(osc_lock.c:832:osc_ldlm_completion_ast())     lovsub@ffff8804b7635460: [0 ffff88067225d470 R(1):[8702328, 18446744073709551615]@[0x60002c323:0x96d3:0x0]] 
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(osc_lock.c:832:osc_ldlm_completion_ast())     osc@ffff8804fdc1dce8: ffff8803bf23e680 0x20040000000 0x5ec022e3dfcd440e 3 ffff88073a4476f8 size: 53278838784 mtime: 1421986119 atime: 1421985489 ctime: 1421986114 blocks: 102256017
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(osc_lock.c:832:osc_ldlm_completion_ast()) } lock@ffff880650352228
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(osc_lock.c:832:osc_ldlm_completion_ast()) dlmlock returned -5
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(ldlm_resource.c:809:ldlm_resource_complain()) soaked-OST000f-osc-ffff880819252c00: namespace resource [0x1514518:0x0:0x0].0 (ffff8802cef43d00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x1514518:0x0:0x0].0 (ffff8802cef43d00) refcount = 2
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(ldlm_resource.c:1451:ldlm_resource_dump()) Granted locks (in reverse order):
Jan 22 20:12:35 lola-24 kernel: LustreError: 163903:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: soaked-OST000f-osc-ffff880819252c00 lock: ffff8803bf23e680/0x5ec022e3dfcd440e lrc: 3/0,0 mode: PW/PW res: [0x1514518:0x0:0x0].0 rrc: 2 type: EXT [35644735488-&amp;gt;18446744073709551615] (req 35644735488-&amp;gt;18446744073709551615) flags: 0x126400000000 nid: local remote: 0xe42bc2b426f6efd7 expref: -99 pid: 163262 timeout: 0 lvb_type: 1
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1422:cl_unuse_try()) result = -5, this is unlikely!
Jan 22 20:12:35 lola-24 kernel: Lustre: soaked-OST000f-osc-ffff880819252c00: Connection restored to soaked-OST000f (at 192.168.1.105@o2ib10)
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked()) lock@ffff880650352660[2 0 0 1 0 00000000] R(1):[8702328, 18446744073709551615]@[0x60002c323:0x96d3:0x0] {
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked())     vvp@ffff8804e2282c40: 
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked())     lov@ffff88067225d470: 1
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked())     0 0: ---
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked()) 
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked()) } lock@ffff880650352660
Jan 22 20:12:35 lola-24 kernel: LustreError: 163262:0:(cl_lock.c:1437:cl_unuse_locked()) unuse return -5
Jan 22 20:12:35 lola-24 slurmd[lola-24][163257]: error: *** STEP 33751.0 KILLED AT 2015-01-22T20:12:35 WITH SIGNAL 9 ***
Jan 22 20:12:35 lola-24 slurmd[lola-24][163257]: error: *** STEP 33751.0 KILLED AT 2015-01-22T20:12:35 WITH SIGNAL 9 ***
Jan 22 20:12:35 lola-24 slurmd[lola-24][163257]: done with job
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;OSS error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 22 20:12:31 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.124@o2ib100  ns: filter-soaked-OST000f_UUID lock: ffff88073a776480/0xe42bc2b426f6ee2c lrc: 3/0,0 mode: PR/PR res: [0x1514518:0x0:0x0].0 rrc: 7 type: EXT [35519225856-&amp;gt;35644735487] (req 35519225856-&amp;gt;35550605311) flags: 0x60000000000020 nid: 192.168.1.124@o2ib100 remote: 0x5ec022e3dfcd4407 expref: 32 pid: 103043 timeout: 5797977767 lvb_type: 0
Jan 22 20:14:10 lola-5 kernel: LustreError: 93964:0:(ost_handler.c:1775:ost_blocking_ast()) Error -2 syncing data on lock cancel
Jan 22 20:14:12 lola-5 kernel: LNet: Service thread pid 102689 completed after 319.51s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jan 22 20:14:12 lola-5 kernel: LNet: Skipped 3 previous similar messages
Jan 22 20:14:55 lola-5 kernel: LNet: Service thread pid 102685 completed after 362.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jan 22 20:14:55 lola-5 kernel: LNet: Skipped 1 previous similar message
Jan 22 20:14:59 lola-5 kernel: LustreError: 93958:0:(ldlm_lib.c:2679:target_bulk_io()) @@@ bulk PUT failed: rc -107  req@ffff8807306bc800 x1490477310356324/t0(0) o3-&amp;gt;76d86919-2338-4877-542c-9e7f34e71591@192.168.1.124@o2ib100:0/0 lens 488/432 e 8 to 0 dl 1421986372 ref 1 fl Interpret:/0/0 rc 0/0
Jan 22 20:14:59 lola-5 kernel: Lustre: soaked-OST000f: Bulk IO read error with 76d86919-2338-4877-542c-9e7f34e71591 (at 192.168.1.124@o2ib100), client will retry: rc -107
Jan 22 20:14:59 lola-5 kernel: Lustre: Skipped 1 previous similar message
Jan 22 20:14:59 lola-5 kernel: Lustre: 102688:0:(service.c:2045:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (243:127s); client may timeout.  req@ffff880271b8ac00 x1490477310356328/t0(0) o3-&amp;gt;76d86919-2338-4877-542c-9e7f34e71591@192.168.1.124@o2ib100:0/0 lens 488/432 e 8 to 0 dl 1421986372 ref 1 fl Complete:/0/ffffffff rc 0/-1
Jan 22 20:14:59 lola-5 kernel: Lustre: 102688:0:(service.c:2045:ptlrpc_server_handle_request()) Skipped 1 previous similar message
Jan 22 20:14:59 lola-5 kernel: LustreError: 93958:0:(ldlm_lib.c:2679:target_bulk_io()) Skipped 2 previous similar messages
Jan 22 20:18:22 lola-5 kernel: LustreError: 94003:0:(ost_handler.c:1775:ost_blocking_ast()) Error -2 syncing data on lock cancel
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="29336">LU-6416</key>
            <summary>Client evicted on lock cancel</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="liang">Liang Zhen</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                            <label>bgti</label>
                    </labels>
                <created>Tue, 31 Mar 2015 15:36:59 +0000</created>
                <updated>Thu, 5 May 2016 19:55:27 +0000</updated>
                            <resolved>Tue, 31 Mar 2015 20:29:34 +0000</resolved>
                                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="111110" author="green" created="Tue, 31 Mar 2015 15:37:39 +0000"  >&lt;p&gt;Liang&apos;s analysis:&lt;br/&gt;
I suspect it is a race like this:&lt;br/&gt;
If a lock is not granted straight away on the server, but is later granted with LDLM_FL_AST_SENT set before ldlm_handle_enqueue0 sends out the reply, the client will know it needs to cancel this lock.&lt;/p&gt;

&lt;p&gt;Meanwhile, this lock can be added to a long granted list by another server thread.&lt;/p&gt;

&lt;p&gt;The lock cancel request then arrives at the server, and the server calls into&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;      ldlm_lock_cancel()-&amp;gt;
          ldlm_cancel_callback()-&amp;gt;
              tgt_blocking_ast(...LDLM_CB_CANCELING)-&amp;gt;
                  tgt_sync()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The other server thread eventually gets a chance to send a completion AST for this lock with LDLM_FL_AST_SENT set, and adds the lock to the waiting list again.&lt;/p&gt;

&lt;p&gt;However, tgt_sync may take an arbitrarily long time that is unrelated to the adaptive timeout (AT) for the lock revoke on the client, so the server can evict the client simply because its own I/O is slow.&lt;/p&gt;
</comment>
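<!--
A minimal Python sketch of the race described in the comment above, under heavily simplified assumptions: the class, the flag-as-string, and the sleep-based timings are illustrative stand-ins, not Lustre code. It shows how a lock re-added to the server's waiting list by a completion AST can trigger client eviction while the cancel path is still blocked in a slow sync.

```python
import threading
import time

class Lock:
    """Toy stand-in for a server-side LDLM lock (illustrative only)."""
    def __init__(self):
        self.flags = set()
        self.on_waiting_list = False
        self.cancelled = False

def grant_with_ast_sent(lock):
    # The enqueue reply carries LDLM_FL_AST_SENT, so the client knows
    # it must cancel the lock.
    lock.flags.add("LDLM_FL_AST_SENT")

def completion_ast(lock):
    # Another server thread re-adds the lock to the waiting list,
    # which restarts the lock-callback timer.
    lock.on_waiting_list = True

def cancel(lock, sync_seconds):
    # The client's cancel arrives; the cancel path blocks in a sync
    # (tgt_sync) whose duration depends only on server I/O speed.
    lock.cancelled = True
    time.sleep(sync_seconds)
    lock.on_waiting_list = False  # removed only after the sync finishes

def timer_expired_eviction(lock, timeout):
    # waiting_locks_callback: evict the client if the lock is still
    # on the waiting list when the timer fires.
    time.sleep(timeout)
    return lock.on_waiting_list

lock = Lock()
grant_with_ast_sent(lock)
completion_ast(lock)  # races with the client's cancel
canceller = threading.Thread(target=cancel, args=(lock, 0.2))
canceller.start()
evicted = timer_expired_eviction(lock, 0.05)  # timer fires before the sync ends
canceller.join()
print(evicted)  # True: client evicted even though it cancelled promptly
```

In this toy model, the fix landed for this ticket ("no canceled lock on waiting list") corresponds to keeping a cancelled lock off the waiting list, so the timer would find on_waiting_list already false.
-->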
                            <comment id="111114" author="pjones" created="Tue, 31 Mar 2015 15:52:29 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/14085/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/14085/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="111159" author="gerrit" created="Tue, 31 Mar 2015 19:41:40 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14085/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14085/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6416&quot; title=&quot;Client evicted on lock cancel&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6416&quot;&gt;&lt;del&gt;LU-6416&lt;/del&gt;&lt;/a&gt; ldlm: no canceled lock on waiting list&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 657bbc4969be581aca66549ca0427ccec89ea5a2&lt;/p&gt;</comment>
                            <comment id="111166" author="pjones" created="Tue, 31 Mar 2015 20:29:34 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="35236">LU-7860</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx9tr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>