<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:39:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10892] hang at &apos;echo clear &gt; /proc/fs/lustre/ldlm/namespaces/.../lru_size&apos; </title>
                <link>https://jira.whamcloud.com/browse/LU-10892</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are encountering frequent hangs when we execute:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;echo clear &amp;gt; $server/lru_size
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;where &lt;tt&gt;$server&lt;/tt&gt; is a path like &lt;tt&gt;/proc/fs/lustre/ldlm/namespaces/ls6-OST000a-osc-&amp;lt;UUID&amp;gt;/&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;In the cases we&apos;ve documented, the target is an OST. That OST shows as active in &lt;tt&gt;lfs check servers&lt;/tt&gt;, and we see no indication of problems on the OST (nothing in the console logs, no flapping connections, etc.).&lt;/p&gt;

&lt;p&gt;The stack trace of the hung process looks like this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;__ldlm_bl_to_thread+0x144
ldlm_bl_to_thread+0x473
ldlm_bl_to_thread_list+0x19
ldlm_cancel_lru+0x70
lprocfs_lru_size_seq_write+0x10c
proc_reg_write+0x7e
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The client version is lustre-2.5.5-11chaos. The server version is Lustre 2.8.2.&lt;/p&gt;

&lt;p&gt;Code where the stuck thread is blocking:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;(gdb) l *(__ldlm_bl_to_thread+0x144)
0x28874 is in __ldlm_bl_to_thread (/usr/src/debug/lustre-2.5.5/lustre/ldlm/ldlm_lockd.c:1997).
1992 wake_up(&amp;amp;blp-&amp;gt;blp_waitq);
1993
1994 /* can not check blwi-&amp;gt;blwi_flags as blwi could be already freed in
1995 LCF_ASYNC mode */
1996 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!(cancel_flags &amp;amp; LCF_ASYNC))
1997         wait_for_completion(&amp;amp;blwi-&amp;gt;blwi_comp);
1998
1999 RETURN(0);
2000 }
2001
(gdb) quit
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We find that we can kill the stuck user-space process without any obvious ill effects; the process exits when killed.&lt;/p&gt;
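
&lt;p&gt;As a workaround, each clear can be wrapped in a time limit so that a single stuck write does not block the calling script indefinitely. This is a minimal sketch, not a tested Lustre tool. It assumes that GNU &lt;tt&gt;timeout&lt;/tt&gt; is available and that the stuck writer can be killed, as reported above. &lt;tt&gt;NS_ROOT&lt;/tt&gt;, &lt;tt&gt;PATTERN&lt;/tt&gt;, &lt;tt&gt;LIMIT&lt;/tt&gt; and &lt;tt&gt;clear_all_lru&lt;/tt&gt; are hypothetical names. The default pattern selects only OSC namespaces such as &lt;tt&gt;ls6-OST000a-osc-&amp;lt;UUID&amp;gt;&lt;/tt&gt;.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical workaround sketch: write "clear" to every matching
# lru_size file, but give up on any single write after LIMIT seconds.
# NS_ROOT, PATTERN and LIMIT are assumptions, not Lustre settings.
NS_ROOT=${NS_ROOT:-/proc/fs/lustre/ldlm/namespaces}
PATTERN=${PATTERN:-'*-osc-*'}
LIMIT=${LIMIT:-60}

clear_all_lru() {
    status=0
    # PATTERN is left unquoted on purpose so that the shell expands the glob.
    for f in "$NS_ROOT"/$PATTERN/lru_size; do
        # Skip the literal, unexpanded pattern when nothing matched.
        [ -e "$f" ] || continue
        # timeout(1) sends SIGKILL after LIMIT seconds. This only helps
        # if the stuck writer is actually killable.
        if ! timeout -s KILL "$LIMIT" sh -c 'echo clear > "$1"' sh "$f"; then
            echo "clear timed out or failed: $f"
            status=1
        fi
    done
    return $status
}
```

&lt;p&gt;The function prints each namespace whose write timed out or failed, and returns non-zero if there were any.&lt;/p&gt;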

&lt;p&gt;We are working on retiring our Lustre 2.5 systems, so a workaround is sufficient.  Our questions are:&lt;br/&gt;
1. Is it correct that the locks being purged are not protecting any dirty cache on the client?&lt;br/&gt;
2. Can we simply kill these stuck processes without data loss?&lt;/p&gt;</description>
                <environment></environment>
        <key id="51724">LU-10892</key>
            <summary>hang at &apos;echo clear &gt; /proc/fs/lustre/ldlm/namespaces/.../lru_size&apos; </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Mon, 9 Apr 2018 20:46:20 +0000</created>
                <updated>Mon, 16 Apr 2018 18:41:39 +0000</updated>
                            <resolved>Mon, 16 Apr 2018 18:41:29 +0000</resolved>
                                    <version>Lustre 2.5.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="225770" author="adilger" created="Wed, 11 Apr 2018 16:55:07 +0000"  >&lt;p&gt;Olaf,&lt;br/&gt;
the locks in the LRU may be protecting clean or dirty data pages (and metadata, for MDC locks), but a lock is not cancelled until all of the pages it covers have been written and dropped from cache. Running &quot;&lt;tt&gt;echo clear &amp;gt; ...&amp;lt;target&amp;gt;/lru_size&lt;/tt&gt;&quot; will drop all unused locks on the client, either for the specified target or for all targets if &apos;&lt;tt&gt;&amp;lt;target&amp;gt;=*&lt;/tt&gt;&apos; is used. It is preferable to use &quot;&lt;tt&gt;lctl set_param osc.&amp;lt;target&amp;gt;.lru_size=clear&lt;/tt&gt;&quot; instead, so that your scripts will not break as these files are moved from &lt;tt&gt;/proc&lt;/tt&gt; to &lt;tt&gt;/sys&lt;/tt&gt;. The only locks &lt;b&gt;not&lt;/b&gt; dropped will be those currently in use by an active system call.&lt;/p&gt;
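
&lt;p&gt;Below is a minimal sketch of a script wrapper around the &lt;tt&gt;lctl&lt;/tt&gt; form. It assumes the parameter name &lt;tt&gt;ldlm.namespaces.&amp;lt;namespace&amp;gt;.lru_size&lt;/tt&gt;, which is the dotted form of the &lt;tt&gt;/proc/fs/lustre/ldlm/namespaces/&amp;lt;namespace&amp;gt;/lru_size&lt;/tt&gt; path in the description. Confirm the exact name with &lt;tt&gt;lctl list_param&lt;/tt&gt; on your version. &lt;tt&gt;clear_lru&lt;/tt&gt; and the &lt;tt&gt;LCTL&lt;/tt&gt; override are hypothetical; set &lt;tt&gt;LCTL&lt;/tt&gt; to &lt;tt&gt;echo&lt;/tt&gt; for a dry run.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: drop unused DLM locks through lctl rather than a /proc path.
# The parameter name ldlm.namespaces.NAMESPACE.lru_size is an assumption
# mirroring /proc/fs/lustre/ldlm/namespaces/NAMESPACE/lru_size.
# LCTL is a hypothetical override (e.g. LCTL=echo for a dry run).
LCTL=${LCTL:-lctl}

clear_lru() {
    # $1: namespace name or glob (default '*osc*'). The glob is passed
    # quoted so that lctl, not the shell, does the matching.
    ns=${1:-'*osc*'}
    "$LCTL" set_param "ldlm.namespaces.${ns}.lru_size=clear"
}
```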

&lt;p&gt;At worst, killing the operation means that some of the DLM locks will not be cancelled immediately. Those locks will instead be expired by some other mechanism (lock age, number of locks, or server load).&lt;/p&gt;</comment>
                            <comment id="226100" author="ofaaland" created="Mon, 16 Apr 2018 18:41:29 +0000"  >&lt;p&gt;Thank you.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzvj3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>