<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2499] Help debug waiting_locks_callback causing client eviction</title>
                <link>https://jira.whamcloud.com/browse/LU-2499</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are seeing the following error.&lt;/p&gt;

&lt;p&gt;Dec 13 08:35:39 nbp2-oss1 kernel: LustreError: 0:0:(ldlm_lockd.c:358:waiting_locks_callback()) ### lock callback timer expired after 351s: evicting client at 10.151.34.219@o2ib  ns: filter-nbp2-OST0018_UUID lock: ffff8804c55d8480/0x1ca7e7e6c780ff4d lrc: 3/0,0 mode: PW/PW res: 182889173/0 rrc: 5 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x20 remote: 0xd281632991b12020 expref: 6 pid: 19246 timeout 7391670727&lt;/p&gt;


&lt;p&gt;With the client evicted we get dirty_page_discards like this.&lt;/p&gt;

&lt;p&gt;Dec 13 08:35:40 r305i3n1 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;1164772.491928&amp;#93;&lt;/span&gt; Lustre: 7178:0:(llite_lib.c:2283:ll_dirty_page_discard_warn()) nbp2: dirty page discard: 10.151.26.5@o2ib:/nbp2/fid: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x5677ca33040:0x2d5:0x0&amp;#93;&lt;/span&gt;//mlellis/RunStilt/runs/20120523-Cherskii-d01-WRF-TEST-20121213.15.46.32.UTC/run_d01/Exe/Copy8/cdump may get corrupted (rc -4)&lt;/p&gt;

&lt;p&gt;We have seen this happen at the beginning of a job. Now we are running lflush before the start of every job. Could lflush cause this?&lt;/p&gt;

&lt;p&gt;We are still trying to reproduce it and gather additional logs.&lt;/p&gt;</description>
                <environment></environment>
        <key id="16939">LU-2499</key>
            <summary>Help debug waiting_locks_callback causing client eviction</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                            <label>ptr</label>
                    </labels>
                <created>Fri, 14 Dec 2012 18:53:04 +0000</created>
                <updated>Tue, 29 Oct 2013 18:40:39 +0000</updated>
                            <resolved>Tue, 29 Oct 2013 18:40:39 +0000</resolved>
                                    <version>Lustre 2.1.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="49264" author="pjones" created="Fri, 14 Dec 2012 19:46:02 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;lflush is a tool produced by LLNL. You may find some information on it by Googling. Could you please look into what conditions would trigger this error and the possible reasons for it?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="49265" author="pjones" created="Fri, 14 Dec 2012 19:47:59 +0000"  >&lt;p&gt;Chris&lt;/p&gt;

&lt;p&gt;I think that you were involved in the creation of lflush. Is LLNL still using this tool on your 2.1.x production system? If so, have you ever seen any errors of this nature as a result?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="49323" author="morrone" created="Mon, 17 Dec 2012 13:44:11 +0000"  >&lt;p&gt;See the scripts directory of this project:&lt;/p&gt;

&lt;p&gt;  &lt;a href=&quot;https://github.com/chaos/lustre-tools-llnl&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre-tools-llnl&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is fairly simple.  These days it could be made even shorter by just using &quot;lctl set_param&quot;.&lt;/p&gt;

&lt;p&gt;We do still use it in the slurm epilog script at the end of every job.  We&apos;re not seeing that problem.  At least not specifically associated with lflush, to the best of my knowledge.&lt;/p&gt;

&lt;p&gt;But &quot;lock callback timer expired&quot; is a very, very common error that we have seen, for many different reasons.  Many nodes dropping their locks at the same time could certainly provide the load that uncovers a bug, network problem, or something else.  Full logs will be needed to figure out what happened in this case.&lt;/p&gt;</comment>
                            <comment id="54624" author="bobijam" created="Thu, 21 Mar 2013 23:50:38 +0000"  >&lt;p&gt;Do you have a detailed log around the time when this issue happens?&lt;/p&gt;</comment>
                            <comment id="70152" author="mhanafi" created="Tue, 29 Oct 2013 18:16:51 +0000"  >&lt;p&gt;this can be closed&lt;/p&gt;</comment>
                            <comment id="70165" author="pjones" created="Tue, 29 Oct 2013 18:40:39 +0000"  >&lt;p&gt;Thanks Mahmoud&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdvz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5857</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>