<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:15:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1271] client evicted by ost during simul truncate test</title>
                <link>https://jira.whamcloud.com/browse/LU-1271</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During simul test #36 (truncate, individual mode), I removed the test directory.&lt;br/&gt;
Simul failed as expected, but a few clients were evicted by the OST.&lt;/p&gt;

&lt;p&gt;Mar 29 14:41:14 ehyperion-dit34 kernel: LustreError: 0:0:(ldlm_lockd.c:357:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.114.15@o2ib1  ns: filter-lustre-OST002f_UUID lock: ffff880637e41240/0x841523d0d78a63ce lrc: 3/0,0 mode: PW/PW res: 1408564/0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x10020 remote: 0xcfdc2859f55f67fa expref: 4 pid: 20965 timeout 4307447235&lt;br/&gt;
Mar 29 14:41:14 ehyperion-dit34 kernel: LustreError: 0:0:(ldlm_lockd.c:357:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.114.22@o2ib1  ns: filter-lustre-OST001f_UUID lock: ffff8805eb75bb40/0x841523d0d78a63f1 lrc: 3/0,0 mode: PW/PW res: 1406133/0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x10020 remote: 0x5d5b157fb75724f0 expref: 4 pid: 20950 timeout 4307447236&lt;br/&gt;
Mar 29 14:41:14 ehyperion-dit30 kernel: LustreError: 0:0:(ldlm_lockd.c:357:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.115.128@o2ib1  ns: filter-lustre-OST0032_UUID lock: ffff88060c42db40/0x545dd15c52687038 lrc: 3/0,0 mode: PW/PW res: 1443808/0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x10020 remote: 0x96951ee279363733 expref: 4 pid: 20778 timeout 4307432200&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.127.65@o2ib1. The obd_ping operation failed with -107&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: LustreError: Skipped 35 previous similar messages&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: Lustre: lustre-OST001f-osc-ffff880228f4ac00: Connection to service lustre-OST001f via nid 192.168.127.65@o2ib1 was lost; in progress operations using this service will wait for recovery to complete.&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: Lustre: Skipped 9 previous similar messages&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: LustreError: 167-0: This client was evicted by lustre-OST001f; in progress operations using this service will fail.&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: Lustre: lustre-OST001f-osc-ffff880228f4ac00: Connection restored to service lustre-OST001f using nid 192.168.127.65@o2ib1.&lt;br/&gt;
Mar 29 14:41:16 ehyperion305 kernel: Lustre: Skipped 61 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion-dit30 kernel: LustreError: 20822:0:(ldlm_lib.c:2239:target_send_reply_msg()) @@@ processing error (-107)  req@ffff8805f00be400 x1397535667303049/t0(0) o400-&amp;gt;&amp;lt;?&amp;gt;@&amp;lt;?&amp;gt;:0/0 lens 192/0 e 0 to 0 dl 1333057350 ref 1 fl Interpret:H/0/ffffffff rc -107/-1&lt;br/&gt;
Mar 29 14:41:29 ehyperion-dit30 kernel: LustreError: 20822:0:(ldlm_lib.c:2239:target_send_reply_msg()) Skipped 10764 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.127.61@o2ib1. The obd_ping operation failed with -107&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: LustreError: Skipped 16 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: Lustre: lustre-OST0032-osc-ffff880210e9a400: Connection to service lustre-OST0032 via nid 192.168.127.61@o2ib1 was lost; in progress operations using this service will wait for recovery to complete.&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: Lustre: Skipped 14 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: LustreError: 167-0: This client was evicted by lustre-OST0032; in progress operations using this service will fail.&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: Lustre: lustre-OST0032-osc-ffff880210e9a400: Connection restored to service lustre-OST0032 using nid 192.168.127.61@o2ib1.&lt;br/&gt;
Mar 29 14:41:29 ehyperion557 kernel: Lustre: Skipped 61 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.127.65@o2ib1. The obd_ping operation failed with -107&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: LustreError: Skipped 15 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: Lustre: lustre-OST002f-osc-ffff880226002400: Connection to service lustre-OST002f via nid 192.168.127.65@o2ib1 was lost; in progress operations using this service will wait for recovery to complete.&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: Lustre: Skipped 53 previous similar messages&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: LustreError: 167-0: This client was evicted by lustre-OST002f; in progress operations using this service will fail.&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: Lustre: lustre-OST002f-osc-ffff880226002400: Connection restored to service lustre-OST002f using nid 192.168.127.65@o2ib1.&lt;br/&gt;
Mar 29 14:41:29 ehyperion298 kernel: Lustre: Skipped 61 previous similar messages&lt;/p&gt;</description>
                <environment>Servers: 2.2.0 RC2&lt;br/&gt;
Clients: 2.2.0 RC2 rhel5 and rhel6&lt;br/&gt;
105 rhel6 clients, 130 rhel5 clients</environment>
        <key id="13796">LU-1271</key>
            <summary>client evicted by ost during simul truncate test</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="mdiep">Minh Diep</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Mar 2012 00:53:56 +0000</created>
                <updated>Mon, 29 May 2017 04:12:16 +0000</updated>
                            <resolved>Mon, 29 May 2017 04:12:16 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="33431" author="green" created="Wed, 4 Apr 2012 01:34:09 +0000"  >&lt;p&gt;From the logs we can see that some sort of signal is delivered to the writing thread while it has a lock enqueued during truncate.&lt;/p&gt;

&lt;p&gt;As such the syscall is aborted, and the not yet granted lock is placed onto the LRU.&lt;br/&gt;
Meanwhile, the lock is granted with CBPENDING already set. But since CBPENDING is only checked when a lock is placed on the LRU, and this lock is already on the LRU, the flag is never noticed and the client is eventually evicted for not releasing the lock.&lt;/p&gt;

&lt;p&gt;The straightforward fix is to check if the lock is in the LRU already when we get the completion AST with CBPENDING set and if so, release it right away.&lt;/p&gt;</comment>
                            <comment id="197387" author="adilger" created="Mon, 29 May 2017 04:12:16 +0000"  >&lt;p&gt;Close old ticket.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw25j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10430</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>