<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:21:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1950] oss high load, dead ll_ost_io</title>
                <link>https://jira.whamcloud.com/browse/LU-1950</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have been experiencing a problem on our OSS servers. The server load shoots up to as high as 500 with dead IO threads, and the OSTs become unusable until the OSS is rebooted. This has happened 5 times on 4 different OSS servers in the last 48 hours. I&apos;ve attached the messages file from the server that was last affected.&lt;/p&gt;</description>
                <environment>Sun Fire X4540, kernel ver 2.6.18-308.4.1</environment>
        <key id="15993">LU-1950</key>
            <summary>oss high load, dead ll_ost_io</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="hellenn">Hellen</reporter>
                        <labels>
                    </labels>
                <created>Sun, 16 Sep 2012 12:47:36 +0000</created>
                <updated>Sat, 15 Mar 2014 01:08:27 +0000</updated>
                            <resolved>Sat, 15 Mar 2014 01:08:27 +0000</resolved>
                                    <version>Lustre 1.8.8</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="44964" author="cliffw" created="Sun, 16 Sep 2012 13:38:19 +0000"  >&lt;p&gt;Can you get a dump of all stacks with SysRq on the OSS when this is happening? If so, please attach it. Any indications of IO hardware problems?&lt;/p&gt;</comment>
                            <comment id="44965" author="green" created="Sun, 16 Sep 2012 13:41:49 +0000"  >&lt;p&gt;The logs indicate the back-end IO devices are extremely slow to process requests, taking more than 10 thousand seconds for a single request.&lt;br/&gt;
Typical reasons include RAID arrays going bad/degraded and lots of small IO workloads.&lt;br/&gt;
I see this is happening on several nodes, so I wonder if you started to run something recently with a very different IO footprint than before?&lt;br/&gt;
In particular, an application that does a lot of small IO (either reads or writes) or one that does a lot of unlinks from a lot of clients (how many clients do you have?)&lt;/p&gt;

&lt;p&gt;Small IO workloads in particular are pretty bothersome because to the disks they look like a stream of small random requests, and rotating disks hate such workloads: they mostly do nothing but move the actuator arm around (seeking) instead of writing.&lt;/p&gt;

&lt;p&gt;A more or less healthy workload for rotating media is about 1 request in flight per spindle. Unfortunately there&apos;s no way in Lustre 1.8 to limit it like that (neither is 2.x capable of this yet, though the NRS feature will eventually allow it).&lt;br/&gt;
If a lot of random small IO is an important workload for you, it might make sense to decrease the number of IO threads on OSTs from 500 to roughly the total number of spindles you have on that OST + 5.&lt;br/&gt;
Also, if you have mostly write activity going on, enabling async journals might help (&quot;lctl set_param -n obdfilter.${ost_svc}.sync_journal 0&quot; for every OST in the system).&lt;/p&gt;

&lt;p&gt;Sadly, Lustre is unable to write to the media any faster than the media is capable of, and this needs to be taken into account when looking at the IO patterns your applications generate.&lt;/p&gt;</comment>
                            <comment id="44966" author="hellenn" created="Sun, 16 Sep 2012 14:17:27 +0000"  >&lt;p&gt;I will try to get a dump of all stacks with SysRq the next time it happens.&lt;/p&gt;

&lt;p&gt;We have 672 clients. I haven&#8217;t noticed anything running with a different IO footprint, but I&apos;ll keep looking.&lt;br/&gt;
We do have a possible controller/driver bug (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1910&quot; title=&quot;OSS kernel panics after upgrade&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1910&quot;&gt;&lt;del&gt;LU-1910&lt;/del&gt;&lt;/a&gt;) and are planning to upgrade mptsas drivers ASAP, could a controller problem be a possible cause of this?&lt;/p&gt;

&lt;p&gt;Most of the activity going on is write activity, so I&apos;ll enable async journals and see if that helps. Some users do a lot of random small IO, so I&apos;ll also decrease the number of IO threads to see if that helps too.&lt;/p&gt;</comment>
                            <comment id="45001" author="green" created="Mon, 17 Sep 2012 02:36:15 +0000"  >&lt;p&gt;While a controller problem could be one of the possible causes, did you really change the controller driver recently? If so, definitely try to roll back to the previous version or otherwise try to change it.&lt;/p&gt;</comment>
                            <comment id="45002" author="hellenn" created="Mon, 17 Sep 2012 02:50:55 +0000"  >&lt;p&gt;Yes, we upgraded the OS and kernel recently. The controller driver in use now is a later version than what we had before the upgrade.&lt;/p&gt;</comment>
                            <comment id="45124" author="hellenn" created="Tue, 18 Sep 2012 04:52:59 +0000"  >&lt;p&gt;Please see the attached console messages with a dump of all stacks. This is from the last server affected, a few minutes ago.&lt;/p&gt;</comment>
                            <comment id="45125" author="hellenn" created="Tue, 18 Sep 2012 04:53:42 +0000"  >&lt;p&gt;SysRq dump of all stacks&lt;/p&gt;</comment>
                            <comment id="45146" author="green" created="Tue, 18 Sep 2012 12:06:15 +0000"  >&lt;p&gt;I see in the new log that requests are &quot;only&quot; taking ~2600 seconds now instead of 10000 in the previous one. Did you reduce the number of threads or make other changes?&lt;/p&gt;

&lt;p&gt;Also, one more option for you to try is disabling the read-only cache. It&apos;s enabled by default, so it&apos;s not something that changed during your upgrade, but there&apos;s a known problem that sometimes manifests as the cache on the OSS taking too much memory and forcing cached metadata out, resulting in a lot of small reads all the time. If your workload changed, that might have been triggered too, so you might want to try disabling the read-only cache, at least temporarily, to see if that helps.&lt;/p&gt;

&lt;p&gt;lctl set_param -n obdfilter.*.read_cache_enable=0&lt;br/&gt;
lctl set_param -n obdfilter.*.writethrough_cache_enable=0&lt;/p&gt;

&lt;p&gt;Please let us know if this has any effect.&lt;/p&gt;</comment>
                            <comment id="45151" author="hellenn" created="Tue, 18 Sep 2012 12:48:55 +0000"  >&lt;p&gt;sysrq-t&lt;/p&gt;</comment>
                            <comment id="45166" author="hellenn" created="Tue, 18 Sep 2012 15:31:27 +0000"  >&lt;p&gt;cat /proc/fs/jbd2/md*/history&lt;/p&gt;</comment>
                            <comment id="45322" author="hellenn" created="Fri, 21 Sep 2012 04:09:55 +0000"  >&lt;p&gt;We&apos;ve reduced IO threads from 512 to 256, read_cache and writethrough_cache are disabled, and we&apos;ve enabled async journals. We also upgraded the mptbase drivers from the Red Hat default and are now using an Oracle-flavored driver.&lt;br/&gt;
It&apos;s been about 2 days since we made the changes. This morning the load on one of the OSS servers shot up to over 255. There were no errors in the messages log prior to the load shooting up. The load stayed up for about an hour, and clients could not reach the OSTs on the server during this time. The load eventually came down and everything worked normally again. I&apos;ve attached the messages file from the server affected (check from Sep 21 06:10:5; the load went from 8 to 255 between 05:45 and 06:15) and jbd2-history from the md devices; I&apos;ve been collecting these every 30 minutes.&lt;/p&gt;</comment>
                            <comment id="45324" author="hellenn" created="Fri, 21 Sep 2012 04:10:47 +0000"  >&lt;p&gt;jbd2 stats from all md devices&lt;/p&gt;</comment>
                            <comment id="45325" author="hellenn" created="Fri, 21 Sep 2012 04:11:09 +0000"  >&lt;p&gt;messages file&lt;/p&gt;</comment>
                            <comment id="45337" author="green" created="Fri, 21 Sep 2012 08:50:46 +0000"  >&lt;p&gt;I only see blocking in the block layer (likely in the disk driver); possibly Johann will have some more comments.&lt;/p&gt;

&lt;p&gt;Any chance you can try reverting the mpt driver to the one used before your recent upgrade, when you did not see such problems?&lt;/p&gt;</comment>
                            <comment id="45375" author="hellenn" created="Fri, 21 Sep 2012 21:50:40 +0000"  >&lt;p&gt;Another server was affected, but this one did not recover like the last one, and we eventually had to reboot it. The problem started between 21:15 and 21:30 (Sep 21). I&apos;ll attach jbd stats, messages, and console messages with sysrq-t output.&lt;/p&gt;</comment>
                            <comment id="45569" author="hellenn" created="Wed, 26 Sep 2012 11:17:20 +0000"  >&lt;p&gt;Hi, have you managed to have a look at the sysrq-t output?&lt;br/&gt;
We&apos;ve been reverting every server affected back to the old kernel (with Lustre ver 1.8.5).&lt;/p&gt;</comment>
                            <comment id="45570" author="green" created="Wed, 26 Sep 2012 11:28:49 +0000"  >&lt;p&gt;What I find a bit bothersome is that in some mpt threads there&apos;s this on the stack:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt; [&amp;lt;ffffffff880e4dfb&amp;gt;] :mptbase:mpt_fault_reset_work+0x0/0x276
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The reset is seemingly broken in your mpt driver (though that does not explain how everything comes back alive once you stop Lustre, I guess. Other than perhaps Lustre abandons the stuck IO and only seems alive then, even though the disks remain inaccessible?)&lt;/p&gt;

&lt;p&gt;I wonder if you can try to compile the older mpt driver from the older kernel against the newer kernel, and try it that way with the newer Lustre?&lt;/p&gt;</comment>
                            <comment id="78788" author="jfc" created="Sat, 8 Mar 2014 01:07:11 +0000"  >&lt;p&gt;Hellen,&lt;br/&gt;
Is there anything further we can do to help you on this issue, or can we mark it as resolved?&lt;br/&gt;
Many thanks,&lt;br/&gt;
~ jfc.&lt;/p&gt;</comment>
                            <comment id="79391" author="jfc" created="Sat, 15 Mar 2014 01:08:27 +0000"  >&lt;p&gt;Looks like we will not make any further progress on this issue.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11875" name="jbd_history" size="25673" author="hellenn" created="Tue, 18 Sep 2012 15:31:27 +0000"/>
                            <attachment id="11865" name="lustre_log" size="7784643" author="hellenn" created="Sun, 16 Sep 2012 13:14:25 +0000"/>
                            <attachment id="11873" name="o1_console" size="2351059" author="hellenn" created="Tue, 18 Sep 2012 12:48:55 +0000"/>
                            <attachment id="11885" name="oss01-jbd2.tar" size="2426880" author="hellenn" created="Fri, 21 Sep 2012 04:10:47 +0000"/>
                            <attachment id="11887" name="oss01-lustre-log" size="923769" author="hellenn" created="Fri, 21 Sep 2012 04:14:43 +0000"/>
                            <attachment id="11886" name="oss01-messages" size="101751" author="hellenn" created="Fri, 21 Sep 2012 04:11:09 +0000"/>
                            <attachment id="11871" name="oss06_console" size="880300" author="hellenn" created="Tue, 18 Sep 2012 04:53:42 +0000"/>
                            <attachment id="11864" name="oss07_messages" size="364022" author="hellenn" created="Sun, 16 Sep 2012 12:47:36 +0000"/>
                            <attachment id="11894" name="oss2-jbd-stats.tar" size="3328000" author="hellenn" created="Fri, 21 Sep 2012 21:53:03 +0000"/>
                            <attachment id="11895" name="oss2-messages" size="1323228" author="hellenn" created="Fri, 21 Sep 2012 21:53:56 +0000"/>
                            <attachment id="11901" name="sysrq-t" size="1084470" author="hellenn" created="Sat, 22 Sep 2012 04:56:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv387:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4022</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>