<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:53:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12504] Lustre stalls with &quot;slow creates&quot; on disabled OST</title>
                <link>https://jira.whamcloud.com/browse/LU-12504</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Greetings,&lt;/p&gt;

&lt;p&gt;We had an OST that was physically damaged recently on our Lustre 2.5.5 system. We were able to deactivate new file creation on the OST from the MDS (using lctl --device data-OST0036-osc-MDT0000 deactivate), and lfs_migrate the data off, but there were still quota problems when contacting the damaged OST. So we tried to disable the OST from the client side as well.&lt;/p&gt;

&lt;p&gt;That worked, but now there are stray messages from our MDS warning of &#8220;slow creates&#8221; to this supposedly disabled OST, and filesystem creates are now very slow:&lt;/p&gt;

&lt;p&gt;Jul 2 08:40:21 mds1 kernel: Lustre: data-OST0036-osc-MDT0000: slow creates, last=&amp;#91;0x100360000:0xe4f61:0x0&amp;#93;, next=&amp;#91;0x100360000:0xe4f61:0x0&amp;#93;, reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-19&lt;/p&gt;

&lt;p&gt;All of the below have been tried to fix this on the MDS:&lt;/p&gt;

&lt;p&gt;lctl --device data-OST0036-osc-MDT0000 deactivate&lt;br/&gt;
 lctl conf_param data-OST0036-osc-MDT0000.osc.active=0&lt;br/&gt;
 lctl conf_param data-OST0036.osc.active=0&lt;br/&gt;
 lctl set_param osp.data-OST0036-osc-MDT0000.active=0&lt;br/&gt;
 lctl set_param osp.data-OST0036-*.max_create_count=0&lt;/p&gt;

&lt;p&gt;On clients, the OST is disabled, and the logs show &#8220;Lustre: setting import data-OST0036_UUID INACTIVE by administrator request&#8221;:&lt;/p&gt;

&lt;p&gt;client$ lctl get_param osc.*&lt;b&gt;-OST0036&lt;/b&gt;*.active&lt;br/&gt;
 osc.data-OST0036-osc-ffff882023331800.active=0&lt;/p&gt;

&lt;p&gt;The MDS also believes this OST is inactive:&lt;/p&gt;

&lt;p&gt;mds$ cat /proc/fs/lustre/osp/data-OST0036-osc-MDT0000/active &lt;br/&gt;
 0&lt;/p&gt;

&lt;p&gt;However, the slow creates message persists on the MDS, about one every 10 minutes, always with the same &#8220;last&#8221; and &#8220;next&#8221; ids. Is there something we have missed, or some other way this should have been resolved to permanently remove this OST?&lt;/p&gt;

&lt;p&gt;We have not yet tried standing up a new OST at the same index, or restarting the MDS.&lt;/p&gt;

&lt;p&gt;(&lt;b&gt;Update: Standing up a new blank OST to replace the defunct one, and setting it back to active, cleaned this up.&lt;/b&gt; It would still be nice to know the proper way to handle this situation, though.)&lt;/p&gt;
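For reference, replacing a failed OST at the same index is done by formatting a new target with an explicit --index; the block device, mount point, and MGS NID below are placeholders, not values from this system. A minimal sketch of the approach the update describes, not a verified procedure for 2.5.5:

```shell
# Format a replacement OST reusing index 54 (0x36, i.e. data-OST0036).
# The --mgsnode NID and block device are hypothetical placeholders.
mkfs.lustre --fsname=data --ost --index=54 \
    --mgsnode=mgs@tcp0 /dev/sdX

# Mount the new target so it registers with the MGS at the old index.
mount -t lustre /dev/sdX /mnt/ost0036

# On the MDS: re-enable the OSC for the replacement target.
lctl --device data-OST0036-osc-MDT0000 activate
```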

&lt;p&gt;Thanks for any advice you may have,&lt;/p&gt;

&lt;p&gt;Chris&lt;/p&gt;

</description>
                <environment></environment>
        <key id="56252">LU-12504</key>
            <summary>Lustre stalls with &quot;slow creates&quot; on disabled OST</summary>
                <type id="9" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/undefined.png">Question/Request</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hannac">Chris Hanna</assignee>
                                    <reporter username="hannac">Chris Hanna</reporter>
                        <labels>
                    </labels>
                <created>Tue, 2 Jul 2019 13:13:26 +0000</created>
                <updated>Wed, 3 Jul 2019 12:36:36 +0000</updated>
                            <resolved>Wed, 3 Jul 2019 12:36:36 +0000</resolved>
                                    <version>Lustre 2.5.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="250585" author="adilger" created="Wed, 3 Jul 2019 06:31:57 +0000"  >&lt;p&gt;The handling of deactivated OSTs in 2.5.x was definitely not ideal. There were a number of fixes to this code over the years, in particular the addition of &lt;tt&gt;osp.&amp;#42;.max_create_count=0&lt;/tt&gt; to disable &lt;b&gt;only&lt;/b&gt; object precreation on that OST, without also disabling the unlink of objects as the files are migrated, which setting &lt;tt&gt;active=0&lt;/tt&gt; does. There were also a number of other fixes to better handle object cleanup after disconnect and reconnect.&lt;/p&gt;
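On releases with the max_create_count support described above, the commonly documented drain-and-remove sequence looks roughly like the following; the target names follow this ticket's data-OST0036 example, while the client mount point is a placeholder. A sketch per the Lustre manual's OST-removal procedure, not commands verified on this system:

```shell
# On the MDS: stop new object precreation on this OST only,
# leaving it active so unlinks from the migration are still processed.
lctl set_param osp.data-OST0036-osc-MDT0000.max_create_count=0

# On a client: move file data off the OST (mount point is a placeholder).
lfs find --ost data-OST0036_UUID /mnt/data | lfs_migrate -y

# Once drained, permanently deactivate the OST in the configuration.
lctl conf_param data-OST0036.osc.active=0
```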

&lt;p&gt;It may be that the issue on your system was that the MDS thread was already in the loop trying to create objects on the failed OST, and it continued trying to create rather than checking/noticing that the OST was no longer available.&lt;/p&gt;

&lt;p&gt;The main patches to fix this in later releases were developed under:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4825&quot; title=&quot;lfs migrate not freeing space on OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4825&quot;&gt;&lt;del&gt;LU-4825&lt;/del&gt;&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7012&quot; title=&quot;files not being deleted from OST after being re-activated&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7012&quot;&gt;&lt;del&gt;LU-7012&lt;/del&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;with a couple of follow-on fixes:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11605&quot; title=&quot;create_count stuck in 0 after changeing max_create_count to 0 and back 20 000&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11605&quot;&gt;&lt;del&gt;LU-11605&lt;/del&gt;&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11115&quot; title=&quot;OST selection algorithm broken with max_create_count=0 or empty OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11115&quot;&gt;&lt;del&gt;LU-11115&lt;/del&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;These issues were (AFAIK) fully resolved in 2.10.7 and later releases.&lt;/p&gt;</comment>
                            <comment id="250599" author="hannac" created="Wed, 3 Jul 2019 12:35:58 +0000"  >&lt;p&gt;Thanks Andreas!&#160; Right now, we are tied to this old version. I&apos;ll keep this in mind when doing these replacements in the future. I&apos;m going to resolve this issue since we&apos;ve been able to fix it with the new OST.&lt;/p&gt;

&lt;p&gt;Chris&lt;/p&gt;

</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00j3b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>