<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:47:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4963] client eviction during IOR test - lock callback timer expired</title>
                <link>https://jira.whamcloud.com/browse/LU-4963</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During performance testing on 64-core AMD nodes, we observed client evictions in IOR.&lt;/p&gt;

&lt;p&gt;The problem occurs in the read phase of the test for 2.4.x and 2.5.x clients.&lt;/p&gt;

&lt;p&gt;Message on the server side:&lt;br/&gt;
LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 203s: evicting client at 172.16.204.67@o2ib  ns: filter-scratch2-OST000f_UUID lock: ffff880427dd4bc0/0xa7267be7d79bca20 lrc: 3/0,0 mode: PW/PW res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x2997:0x0:0x0&amp;#93;&lt;/span&gt;.0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;1048575) flags: 0x60000000010020 nid: 172.16.204.67@o2ib remote: 0x2c2795a988d99fa0 expref: 152 pid: 20319 timeout: 6755680954 lvb_type: 0&lt;/p&gt;

&lt;p&gt;Example IOR run to reproduce the problem:&lt;br/&gt;
ior -b 4g -e -C -F -i 10 -k -M 4g -N 24 -o /mnt/lustre/scratch2/test -O lustreStripeCount=1 -t 1m -w -r&lt;/p&gt;

&lt;p&gt;The problem does not appear with thread counts lower than 24 per node.&lt;br/&gt;
From 24 threads and up, it always occurs in the read phase:&lt;/p&gt;

&lt;p&gt;IOR-3.0.1: MPI Coordinated Test of Parallel I/O&lt;/p&gt;

&lt;p&gt;Began: Sat Apr 26 01:18:33 2014&lt;br/&gt;
Command line used: /people/x/jor/IOR/ior-3.0.1/src/ior -b 4g -e -C -F -i 10 -k -M 4g -N 24 -o /mnt/lustre/scratch2/people/x/ior-test/test -O lustreStripeCount=1 -t 1m -w -r&lt;br/&gt;
Machine: Linux n1085-amd.zeus&lt;/p&gt;

&lt;p&gt;Test 0 started: Sat Apr 26 01:18:33 2014&lt;br/&gt;
Summary:&lt;br/&gt;
        api                = POSIX&lt;br/&gt;
        test filename      = /mnt/lustre/scratch2/people/x/ior-test/test&lt;br/&gt;
        access             = file-per-process&lt;br/&gt;
        ordering in a file = sequential offsets&lt;br/&gt;
        ordering inter file= constant task offsets = 1&lt;br/&gt;
        clients            = 24 (24 per node)&lt;br/&gt;
        memoryPerNode      = 10.09 GiB&lt;br/&gt;
        repetitions        = 10&lt;br/&gt;
        xfersize           = 1 MiB&lt;br/&gt;
        blocksize          = 4 GiB&lt;br/&gt;
        aggregate filesize = 96 GiB&lt;br/&gt;
        Lustre stripe size = Use default&lt;br/&gt;
              stripe count = 1&lt;/p&gt;

&lt;p&gt;access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter&lt;br/&gt;
------    ---------  ---------- ---------  --------   --------   --------   --------   ----&lt;br/&gt;
write     2242.27    4194304    1024.00    0.037550   43.84      2.69       43.84      0   &lt;br/&gt;
read      3710       4194304    1024.00    0.005054   26.50      21.84      26.50      0   &lt;br/&gt;
write     2234.48    4194304    1024.00    0.022456   43.99      1.78       43.99      1   &lt;br/&gt;
read      8040       4194304    1024.00    0.019482   12.23      5.21       12.23      1   &lt;br/&gt;
WARNING: Task 19, partial write(), 4096 of 1048576 bytes at offset 3892314112&lt;br/&gt;
ior ERROR: write() failed, errno 5, Input/output error (aiori-POSIX.c:236)&lt;br/&gt;
...&lt;/p&gt;


</description>
                <environment>servers: 2.5.1&lt;br/&gt;
clients: 2.5.1, 2.4.x</environment>
        <key id="24399">LU-4963</key>
            <summary>client eviction during IOR test - lock callback timer expired</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="lflis">Lukasz Flis</reporter>
                        <labels>
                    </labels>
                <created>Fri, 25 Apr 2014 23:49:57 +0000</created>
                <updated>Mon, 28 Apr 2014 21:47:24 +0000</updated>
                            <resolved>Mon, 28 Apr 2014 17:04:27 +0000</resolved>
                                    <version>Lustre 2.5.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="82563" author="lflis" created="Sat, 26 Apr 2014 00:39:00 +0000"  >&lt;p&gt;My mistake. The problem occurs in the write phase, not the read phase.&lt;/p&gt;</comment>
                            <comment id="82624" author="simmonsja" created="Mon, 28 Apr 2014 16:05:24 +0000"  >&lt;p&gt;I have been having the same problem. This issue is being tracked under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4584&quot; title=&quot;Lock revocation process fails consistently&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4584&quot;&gt;&lt;del&gt;LU-4584&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="82692" author="lflis" created="Mon, 28 Apr 2014 21:47:24 +0000"  >
&lt;p&gt;Debug log with dlmtrace&lt;/p&gt;

&lt;p&gt;ftp.whamclud.com/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4963&quot; title=&quot;client eviction during IOR test - lock callback timer expired&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4963&quot;&gt;&lt;del&gt;LU-4963&lt;/del&gt;&lt;/a&gt;/lustre-client-2.4.1-6.cyfronet.log.gz&lt;/p&gt;
</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="23009">LU-4584</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwl1z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13726</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>