<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:24:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2404] Many threads stuck on cfs_down_read of lov-&gt;lo_type_guard</title>
                <link>https://jira.whamcloud.com/browse/LU-2404</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&apos;ve had reports on Sequoia of many clients getting stuck during reads. I had a chance to dump the stacks on a client in this state and saw many threads that appeared to be stuck on &lt;tt&gt;lov-&amp;gt;lo_type_guard&lt;/tt&gt; in a &lt;tt&gt;cfs_down_read&lt;/tt&gt;. Here&apos;s an example stack:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2012-11-28 09:33:05.886131 {DefaultControlEventListener} [mmcs]{716}.0.0: sysiod        D 00000fffa66863e0     0  8855   3105 0x00000002
2012-11-28 09:33:05.886181 {DefaultControlEventListener} [mmcs]{716}.0.0: Call Trace:
2012-11-28 09:33:05.886232 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7160] [c0000003ec8e71f0] 0xc0000003ec8e71f0 (unreliable)
2012-11-28 09:33:05.886282 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7330] [c000000000008de0] .__switch_to+0xc4/0x100
2012-11-28 09:33:05.886333 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e73c0] [c00000000042b0e0] .schedule+0x858/0x9c0
2012-11-28 09:33:05.886384 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7670] [c00000000042dcac] .rwsem_down_failed_common+0x270/0x2b8
2012-11-28 09:33:05.886435 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7730] [c00000000042dd60] .rwsem_down_read_failed+0x2c/0x44
2012-11-28 09:33:05.886486 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e77d0] [c00000000042cee8] .down_read+0x30/0x44
2012-11-28 09:33:05.886537 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7840] [80000000051460d8] .lov_lsm_addref+0x48/0x200 [lov]
2012-11-28 09:33:05.886587 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e78e0] [8000000005146e94] .lov_io_init+0x84/0x160 [lov]
2012-11-28 09:33:05.886638 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7980] [800000000247aea4] .cl_io_init0+0x104/0x260 [obdclass]
2012-11-28 09:33:05.886689 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7a30] [800000000695a09c] .ll_file_io_generic+0x11c/0x670 [lustre]
2012-11-28 09:33:05.886740 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7b30] [800000000695b134] .ll_file_aio_read+0x1d4/0x3a0 [lustre]
2012-11-28 09:33:05.886790 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7c00] [800000000695b450] .ll_file_read+0x150/0x320 [lustre]
2012-11-28 09:33:05.886841 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7ce0] [c0000000000d21a0] .vfs_read+0xd0/0x1c4
2012-11-28 09:33:05.886893 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7d80] [c0000000000d2390] .SyS_read+0x54/0x98
2012-11-28 09:33:05.886943 {DefaultControlEventListener} [mmcs]{716}.0.0: [c0000003ec8e7e30] [c000000000000580] syscall_exit+0x0/0x2c
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My initial guess is that another thread is holding a write lock on that semaphore, but I haven&apos;t been able to pin down a thread holding the write lock by looking at the stacks. We also don&apos;t have access to crash and don&apos;t have kdumps enabled, so I can&apos;t directly inspect the semaphore.&lt;/p&gt;

&lt;p&gt;I&apos;ve attached the stacks for all processes on the system (&lt;tt&gt;sysrq-t&lt;/tt&gt;). Unfortunately, I forgot to dump the running processes (&lt;tt&gt;sysrq-l&lt;/tt&gt;).&lt;/p&gt;</description>
                <environment></environment>
        <key id="16796">LU-2404</key>
            <summary>Many threads stuck on cfs_down_read of lov-&gt;lo_type_guard</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                            <label>sequoia</label>
                    </labels>
                <created>Wed, 28 Nov 2012 13:22:40 +0000</created>
                <updated>Mon, 3 Dec 2012 17:50:55 +0000</updated>
                            <resolved>Mon, 3 Dec 2012 12:28:54 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="48543" author="pjones" created="Thu, 29 Nov 2012 14:30:31 +0000"  >&lt;p&gt;Alex, what do you think about this one?&lt;/p&gt;</comment>
                            <comment id="48617" author="pjones" created="Fri, 30 Nov 2012 14:51:41 +0000"  >&lt;p&gt;Jinshan&lt;/p&gt;

&lt;p&gt;Andreas suggested that you should review this one.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="48622" author="jay" created="Fri, 30 Nov 2012 15:03:08 +0000"  >&lt;p&gt;This problem should have been fixed by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1876&quot; title=&quot;Layout Lock Server Patch Landings to Master&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1876&quot;&gt;&lt;del&gt;LU-1876&lt;/del&gt;&lt;/a&gt;; the commit is ecaba99677b28536f9c376b2b835b554a7792668.&lt;/p&gt;

&lt;p&gt;Let&apos;s leave this ticket open until the fix is verified in the next test.&lt;/p&gt;</comment>
                            <comment id="48671" author="prakash" created="Mon, 3 Dec 2012 12:14:50 +0000"  >&lt;p&gt;Thanks Jinshan. I&apos;ll pull that into our branch. I don&apos;t have a solid reproducer for this issue, so I&apos;m OK if you want to close it. It can always be reopened if it is seen with the fix applied.&lt;/p&gt;</comment>
                            <comment id="48674" author="pjones" created="Mon, 3 Dec 2012 12:28:54 +0000"  >&lt;p&gt;ok thanks Prakash!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="15863">LU-1876</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12066" name="RB2-ID-J04.log" size="2210554" author="prakash" created="Wed, 28 Nov 2012 13:22:40 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvczr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5698</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>