<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9688] Stuck MDT in lod_qos_prep_create</title>
                <link>https://jira.whamcloud.com/browse/LU-9688</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Our MDT was stuck or barely usable twice in a row lately, and the second time we took a crash dump, which shows that several threads were blocked in lod_qos_prep_create...&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;PID: 291558  TASK: ffff88203c7b2f10  CPU: 9   COMMAND: &lt;span class=&quot;code-quote&quot;&gt;&quot;mdt01_030&quot;&lt;/span&gt;
 #0 [ffff881a157f7588] __schedule at ffffffff8168b6a5
 #1 [ffff881a157f75f0] schedule at ffffffff8168bcf9
 #2 [ffff881a157f7600] rwsem_down_write_failed at ffffffff8168d4a5
 #3 [ffff881a157f7688] call_rwsem_down_write_failed at ffffffff81327067
 #4 [ffff881a157f76d0] down_write at ffffffff8168aebd
 #5 [ffff881a157f76e8] lod_qos_prep_create at ffffffffa124031c [lod]
 #6 [ffff881a157f77a8] lod_declare_striped_object at ffffffffa1239a8c [lod]
 #7 [ffff881a157f77f0] lod_declare_object_create at ffffffffa123b0f1 [lod]
 #8 [ffff881a157f7838] mdd_declare_object_create_internal at ffffffffa129d21f [mdd]
 #9 [ffff881a157f7880] mdd_declare_create at ffffffffa1294133 [mdd]
#10 [ffff881a157f78f0] mdd_create at ffffffffa1295689 [mdd]
#11 [ffff881a157f79e8] mdt_reint_open at ffffffffa1176f05 [mdt]
#12 [ffff881a157f7ad8] mdt_reint_rec at ffffffffa116c4a0 [mdt]
#13 [ffff881a157f7b00] mdt_reint_internal at ffffffffa114edc2 [mdt]
#14 [ffff881a157f7b38] mdt_intent_reint at ffffffffa114f322 [mdt]
#15 [ffff881a157f7b78] mdt_intent_policy at ffffffffa1159b9c [mdt]
#16 [ffff881a157f7bd0] ldlm_lock_enqueue at ffffffffa0b461e7 [ptlrpc]
#17 [ffff881a157f7c28] ldlm_handle_enqueue0 at ffffffffa0b6f3a3 [ptlrpc]
#18 [ffff881a157f7cb8] tgt_enqueue at ffffffffa0befe12 [ptlrpc]
#19 [ffff881a157f7cd8] tgt_request_handle at ffffffffa0bf4275 [ptlrpc]
#20 [ffff881a157f7d20] ptlrpc_server_handle_request at ffffffffa0ba01fb [ptlrpc]
#21 [ffff881a157f7de8] ptlrpc_main at ffffffffa0ba42b0 [ptlrpc]
#22 [ffff881a157f7ec8] kthread at ffffffff810b06ff
#23 [ffff881a157f7f50] ret_from_fork at ffffffff81696b98



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The disk array (from Dell) that we use for the MDT doesn&apos;t report any issues. The load was not particularly high. `kmem -i` reports 76 GB of free memory (60% of TOTAL MEM).&lt;/p&gt;

&lt;p&gt;I&apos;m attaching the output of `foreach bt`; maybe somebody will have a clue.&lt;/p&gt;
&lt;p&gt;Each time, failing over the MDT resumed operations, but the recovery was a bit long and with a few evictions.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Lustre: oak-MDT0000: Recovery over after 13:39, of 1144 clients 1134 recovered and 10 were evicted.

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Thanks!&lt;br/&gt;
 Stephane&lt;/p&gt;</description>
                <environment>3.10.0-514.10.2.el7_lustre.x86_64, lustre-2.9.0_srcc6-1.el7.centos.x86_64</environment>
        <key id="46778">LU-9688</key>
            <summary>Stuck MDT in lod_qos_prep_create</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 19 Jun 2017 21:54:17 +0000</created>
                <updated>Tue, 18 Jul 2017 13:59:08 +0000</updated>
                            <resolved>Tue, 18 Jul 2017 13:59:08 +0000</resolved>
                                    <version>Lustre 2.9.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="199720" author="bzzz" created="Tue, 20 Jun 2017 14:51:36 +0000"  >&lt;p&gt;It&apos;s blocked by another thread waiting for OST objects. Please provide logs from the MDTs/OSTs if possible.&lt;/p&gt;</comment>
                            <comment id="199740" author="sthiell" created="Tue, 20 Jun 2017 17:21:26 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;Thanks for the quick reply. That makes sense, because we had some issues with the OSS oak-io1-s1: it became unresponsive, we rebooted it on Jun 19 11:49:32 (you can see that in the logs), and the OSTs were re-mounted at ~ Jun 19 12:00. Sorry I didn&apos;t mention that in the original ticket. So, I am attaching logs of the OSTs (OSS oak-io1-s1 and oak-io1-s2) and the MDT (which was mounted on MDS oak-md1-s1). While preparing the logs, I noticed that on the MDT (file oak-md1-s1.lustre.log) there are errors about object precreation on one OST; could that be the issue?&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Jun 19 11:47:23 oak-md1-s1 kernel: LustreError: 191781:0:(osp_precreate.c:615:osp_precreate_send()) oak-OST0016-osc-MDT0000: can&apos;t precreate: rc = -11
Jun 19 11:47:23 oak-md1-s1 kernel: LustreError: 191781:0:(osp_precreate.c:1243:osp_precreate_thread()) oak-OST0016-osc-MDT0000: cannot precreate objects: rc = -11


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;br/&gt;
 o2ib5 is the LNet network of the servers and a few clients&lt;br/&gt;
 o2ib, o2ib3, and o2ib4 are client-only networks&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="199865" author="pjones" created="Wed, 21 Jun 2017 17:25:46 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Can you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="199956" author="niu" created="Thu, 22 Jun 2017 13:10:48 +0000"  >&lt;p&gt;Yes, the error message you mentioned is related to this issue: because precreate failed instantly, all create threads were blocked waiting for objects to be created.&lt;br/&gt;
I checked the OST log and found that some md RAID threads were hung in md_update_sb() at that time; I think that could be the root cause. Has this problem disappeared?&lt;/p&gt;</comment>
                            <comment id="199964" author="sthiell" created="Thu, 22 Jun 2017 14:53:44 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;Thanks for looking at this. After making sure that the OSTs were OK and also failing over the MDT, the problem did not appear again. I&apos;m just a bit concerned that the MDT couldn&apos;t recover by itself in that specific case.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="201091" author="niu" created="Thu, 6 Jul 2017 02:00:24 +0000"  >&lt;p&gt;Hi, Stephane&lt;/p&gt;

&lt;p&gt;That&apos;s good news. If an OST fails to create objects due to a backend storage problem, creation on the MDT will be blocked; there is not much we can do in that situation except wait for the storage to recover. Can we close this ticket now? Thanks.&lt;/p&gt;</comment>
                            <comment id="202468" author="niu" created="Tue, 18 Jul 2017 13:59:08 +0000"  >&lt;p&gt;Bad disk, not a Lustre issue.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="27068" name="oak-io1-s1.lustre.log" size="1393789" author="sthiell" created="Tue, 20 Jun 2017 17:20:30 +0000"/>
                            <attachment id="27069" name="oak-io1-s2.lustre.log" size="45336" author="sthiell" created="Tue, 20 Jun 2017 17:20:32 +0000"/>
                            <attachment id="27060" name="oak-md1-s1.foreach_bt.txt" size="439390" author="sthiell" created="Mon, 19 Jun 2017 21:55:58 +0000"/>
                            <attachment id="27067" name="oak-md1-s1.lustre.log" size="280808" author="sthiell" created="Tue, 20 Jun 2017 17:20:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzfe7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>