<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4552] osc_cache.c:899:osc_extent_wait() timeout quite often</title>
                <link>https://jira.whamcloud.com/browse/LU-4552</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We hit client hangs quite often on all of the login nodes, and the following Lustre error messages are printed. The client cannot recover until it is rebooted.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 22 17:23:23 ff01 kernel: LustreError: 84026:0,(osc_cache.c:899:osc_extent_wait()) extent ffff8831a49b0678@{[0 &amp;gt; 0/255], [3|0|+|rpc|wihY|ffff88283005bc48], [4096|1|+||ffff8828fb76b228|256|ffff88319695e040]} home2-OST000b-osc-ffff883fdbbd8800: wait ext to 0 timedout, recovery in progress?
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>RHEL6</environment>
        <key id="22903">LU-4552</key>
            <summary>osc_cache.c:899:osc_extent_wait() timeout quite often</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="ihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Tue, 28 Jan 2014 11:58:58 +0000</created>
                <updated>Wed, 5 Mar 2014 16:13:42 +0000</updated>
                            <resolved>Fri, 7 Feb 2014 18:34:52 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                    <version>Lustre 2.4.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="75801" author="pjones" created="Tue, 28 Jan 2014 21:22:33 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please advise on this ticket?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="75808" author="ihara" created="Tue, 28 Jan 2014 22:17:47 +0000"  >&lt;p&gt;OK, we reproduced it many times. We confirmed that this hang happens when a software installer (installing binaries to Lustre) is running and a &quot;du&quot; command is also running as a background job on the same client.&lt;br/&gt;
We thought &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4509&quot; title=&quot;clio can be stuck in osc_extent_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4509&quot;&gt;&lt;del&gt;LU-4509&lt;/del&gt;&lt;/a&gt; might be related to this issue, so we applied that patch and tested it.&lt;/p&gt;

&lt;p&gt;In our tests, the same hang eventually happened (the running installer and du commands hung). However, the filesystem was still accessible (it was possible to do IO to Lustre, and it worked). So this is a different situation from before applying the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4509&quot; title=&quot;clio can be stuck in osc_extent_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4509&quot;&gt;&lt;del&gt;LU-4509&lt;/del&gt;&lt;/a&gt; patches (without the patches, we couldn&apos;t do anything after the problem happened).&lt;/p&gt;

&lt;p&gt;However, we are still hitting application hangs (rather than node hangs) even with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4509&quot; title=&quot;clio can be stuck in osc_extent_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4509&quot;&gt;&lt;del&gt;LU-4509&lt;/del&gt;&lt;/a&gt; patches applied.&lt;/p&gt;

&lt;p&gt;By the way, we saw the following call trace messages before the osc_extent_wait() timeout messages, and this happens whether or not the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4509&quot; title=&quot;clio can be stuck in osc_extent_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4509&quot;&gt;&lt;del&gt;LU-4509&lt;/del&gt;&lt;/a&gt; patch is applied.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan 29 03:30:51 ff04 kernel: INFO: task ldlm_bl_32:122671 blocked for more than 120 seconds.
Jan 29 03:30:51 ff04 kernel: &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
Jan 29 03:30:51 ff04 kernel: ldlm_bl_32    D 000000000000000b     0 122671      2 0x00000000
Jan 29 03:30:51 ff04 kernel: ffff883f1f083ce0 0000000000000046 0000000000000000 0000000100000020
Jan 29 03:30:51 ff04 kernel: 52e7f60b0000000b 0000000000052daf 0001df2f00000000 0000015f00000000
Jan 29 03:30:51 ff04 kernel: ffff883f1f081ab8 ffff883f1f083fd8 000000000000fb88 ffff883f1f081ab8
Jan 29 03:30:51 ff04 kernel: Call Trace:
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff8150ed3e&amp;gt;] __mutex_lock_slowpath+0x13e/0x180
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff8150ebdb&amp;gt;] mutex_lock+0x2b/0x50
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa0d4225f&amp;gt;] cl_lock_mutex_get+0x6f/0xd0 [obdclass]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa1286b9a&amp;gt;] osc_ldlm_blocking_ast+0x7a/0x350 [osc]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa0c75a81&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa0f06680&amp;gt;] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa0f06bb1&amp;gt;] ldlm_bl_thread_main+0x261/0x3c0 [ptlrpc]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff81063310&amp;gt;] ? default_wake_function+0x0/0x20
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffffa0f06950&amp;gt;] ? ldlm_bl_thread_main+0x0/0x3c0 [ptlrpc]
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff81096916&amp;gt;] kthread+0x96/0xa0
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff81096880&amp;gt;] ? kthread+0x0/0xa0
Jan 29 03:30:51 ff04 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="75891" author="jay" created="Wed, 29 Jan 2014 20:01:04 +0000"  >&lt;p&gt;Will you show us the backtrace of all processes on the node when this occurs?&lt;/p&gt;</comment>
                            <comment id="75906" author="ihara" created="Wed, 29 Jan 2014 23:47:06 +0000"  >&lt;p&gt;Hi Jinshan,&lt;br/&gt;
We don&apos;t have it yet, but we can reproduce the same problem and get the backtrace. I will collect it soon. In the meantime, I&apos;m uploading the Lustre debug logs that we got.&lt;/p&gt;</comment>
                            <comment id="75907" author="jay" created="Thu, 30 Jan 2014 00:27:40 +0000"  >&lt;p&gt;Is it easy to reproduce? If so, it would be a good idea to share the reproducer program with us.&lt;/p&gt;</comment>
                            <comment id="75910" author="hongchao.zhang" created="Thu, 30 Jan 2014 02:06:25 +0000"  >&lt;p&gt;Hi Ihara,&lt;/p&gt;

&lt;p&gt;Do you use rpm as the software installer?&lt;br/&gt;
I ran &quot; while [ true ]; do du -a /mnt/lustre &amp;gt;/dev/null 2&amp;gt;&amp;amp;1 ; done &amp;amp;&quot; in the background and continuously ran &quot;rpm -ivh&quot;, but couldn&apos;t reproduce it.&lt;br/&gt;
Both 2.5.0 and 2.4.2 were tested.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="75911" author="ihara" created="Thu, 30 Jan 2014 02:19:18 +0000"  >&lt;p&gt;Jinshan, Hongchao&lt;br/&gt;
No, this is not an RPM package, and it is not open source software, so we can&apos;t share it. It&apos;s a Java-based installer. I don&apos;t think it is doing anything very specific, but I couldn&apos;t reproduce this problem any other way in our lab either.&lt;br/&gt;
However, we can reproduce this problem 100% of the time with that Java installer at the customer site. If you want any additional information, please let me know. We will reproduce the problem and collect whatever information you want.&lt;/p&gt;</comment>
                            <comment id="75929" author="ihara" created="Thu, 30 Jan 2014 15:08:36 +0000"  >&lt;p&gt;Here are the backtraces taken 1) before the call trace dump, 2) after the call trace dump, and 3) after the osc_extent_wait() timeout messages were printed.&lt;/p&gt;</comment>
                            <comment id="75979" author="bfaccini" created="Fri, 31 Jan 2014 10:18:50 +0000"  >&lt;p&gt;Hello Shuichi,&lt;br/&gt;
After having a look at the back-traces (I still need to review the Lustre debug-logs!), your problem seems similar to the one reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4300&quot; title=&quot;ptlrpcd threads deadlocked in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4300&quot;&gt;&lt;del&gt;LU-4300&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
Also, could you try to run the same 100% reproducer on a node where ELC has been disabled? I think this can be set with &quot;echo 0 &amp;gt; /proc/fs/lustre/ldlm/namespaces/*/early_lock_cancel&quot;.&lt;/p&gt;</comment>
                            <comment id="75998" author="ihara" created="Fri, 31 Jan 2014 18:19:03 +0000"  >&lt;p&gt;Thanks, Bruno!&lt;br/&gt;
After setting &quot;echo 0 &amp;gt; /proc/fs/lustre/ldlm/namespaces/*/early_lock_cancel&quot;, the problem was not reproduced. So it looks like this is the same problem as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4300&quot; title=&quot;ptlrpcd threads deadlocked in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4300&quot;&gt;&lt;del&gt;LU-4300&lt;/del&gt;&lt;/a&gt;. We tried a couple of times, but nothing happened and the installer finished without errors.&lt;/p&gt;</comment>
                            <comment id="76466" author="bfaccini" created="Fri, 7 Feb 2014 14:35:52 +0000"  >&lt;p&gt;Hello Shuichi, do you agree that I close this ticket as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4300&quot; title=&quot;ptlrpcd threads deadlocked in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4300&quot;&gt;&lt;del&gt;LU-4300&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;</comment>
                            <comment id="76488" author="ihara" created="Fri, 7 Feb 2014 18:31:50 +0000"  >&lt;p&gt;Bruno, yes, as far as we tested, I think it is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4300&quot; title=&quot;ptlrpcd threads deadlocked in cl_lock_mutex_get&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4300&quot;&gt;&lt;del&gt;LU-4300&lt;/del&gt;&lt;/a&gt;. Please close this ticket, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4552&quot; title=&quot;osc_cache.c:899:osc_extent_wait() timeout quite often&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4552&quot;&gt;&lt;del&gt;LU-4552&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="76489" author="pjones" created="Fri, 7 Feb 2014 18:34:52 +0000"  >&lt;p&gt;Thanks Ihara&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="22219">LU-4300</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="14035" name="lctl.dk.23.17.tgz" size="1465725" author="ihara" created="Wed, 29 Jan 2014 23:47:06 +0000"/>
                            <attachment id="14036" name="lctl.dk.after.tgz" size="906503" author="ihara" created="Wed, 29 Jan 2014 23:47:06 +0000"/>
                            <attachment id="14037" name="lctl.dk1.tgz" size="219" author="ihara" created="Wed, 29 Jan 2014 23:47:06 +0000"/>
                            <attachment id="14040" name="messages.after_call_trace" size="1837143" author="ihara" created="Thu, 30 Jan 2014 15:08:36 +0000"/>
                            <attachment id="14041" name="messages.after_osc_msg" size="2627185" author="ihara" created="Thu, 30 Jan 2014 15:08:36 +0000"/>
                            <attachment id="14042" name="messages.before_call_trace" size="1053841" author="ihara" created="Thu, 30 Jan 2014 15:08:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwdrj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12438</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>