<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2492] MDT thread stuck: mdd_object_find -&gt; lu_object_find_at</title>
                <link>https://jira.whamcloud.com/browse/LU-2492</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I think this is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1640&quot; title=&quot;Test failure on test suite lustre-rsync-test, subtest test_2c&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1640&quot;&gt;&lt;del&gt;LU-1640&lt;/del&gt;&lt;/a&gt;, but in case it is not, I&apos;m opening this as a separate ticket.&lt;/p&gt;

&lt;p&gt;I&apos;ve found the following message on the console of our Grove MDS today, after upgrading to &lt;tt&gt;2.3.57-1chaos-1surya1&lt;/tt&gt; (&lt;tt&gt;2.3.57-1chaos&lt;/tt&gt; plus a couple of changes for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2109&quot; title=&quot;__llog_process_thread() GPF&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2109&quot;&gt;&lt;del&gt;LU-2109&lt;/del&gt;&lt;/a&gt;) and rebooting a few times.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                                                                         
2012-12-13 13:25:25 LNet: Service thread pid 33078 was inactive for 450.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
2012-12-13 13:25:25 Pid: 33078, comm: mdt01_020                                    
2012-12-13 13:25:25                                                                
2012-12-13 13:25:25 Call Trace:                                                    
2012-12-13 13:25:25  [&amp;lt;ffffffffa05b07de&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]         
2012-12-13 13:25:25  [&amp;lt;ffffffffa0764fe3&amp;gt;] lu_object_find_at+0xb3/0x460 [obdclass]
2012-12-13 13:25:25  [&amp;lt;ffffffff8105ea30&amp;gt;] ? default_wake_function+0x0/0x20         
2012-12-13 13:25:25  [&amp;lt;ffffffff8126c735&amp;gt;] ? _atomic_dec_and_lock+0x55/0x80         
2012-12-13 13:25:25  [&amp;lt;ffffffffa07653cf&amp;gt;] lu_object_find_slice+0x1f/0x80 [obdclass]
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c27310&amp;gt;] mdd_object_find+0x10/0x70 [mdd]          
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c2b89f&amp;gt;] mdd_path+0x35f/0x1060 [mdd]              
2012-12-13 13:25:25  [&amp;lt;ffffffffa08f30f1&amp;gt;] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c8dfd7&amp;gt;] mdt_get_info+0x567/0xbb0 [mdt]           
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c8941d&amp;gt;] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c91782&amp;gt;] mdt_handle_common+0x932/0x1760 [mdt]  
2012-12-13 13:25:25  [&amp;lt;ffffffffa0c92685&amp;gt;] mdt_regular_handle+0x15/0x20 [mdt]       
2012-12-13 13:25:25  [&amp;lt;ffffffffa09036ac&amp;gt;] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-12-13 13:25:25  [&amp;lt;ffffffffa05b06be&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]        
2012-12-13 13:25:25  [&amp;lt;ffffffffa05c214f&amp;gt;] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-12-13 13:25:25  [&amp;lt;ffffffffa08faa79&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
2012-12-13 13:25:25  [&amp;lt;ffffffff81051ba3&amp;gt;] ? __wake_up+0x53/0x70                    
2012-12-13 13:25:25  [&amp;lt;ffffffffa0904c45&amp;gt;] ptlrpc_main+0xbb5/0x1970 [ptlrpc]        
2012-12-13 13:25:25  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 13:25:25  [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20                       
2012-12-13 13:25:25  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 13:25:25  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 13:25:25  [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20                     
2012-12-13 13:25:25                                                                
2012-12-13 13:25:25 LustreError: dumping log to /tmp/lustre-log.1355433925.33078
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;                                                                         

&lt;p&gt;Nearly an hour later, the thread still appears to be stuck. The following message repeats every few minutes:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                                                                         
2012-12-13 14:22:12 INFO: task mdt01_020:33078 blocked for more than 120 seconds.
2012-12-13 14:22:12 &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
2012-12-13 14:22:12 mdt01_020     D 0000000000000004     0 33078      2 0x00000000
2012-12-13 14:22:12  ffff880f444d7ae0 0000000000000046 0000000000000000 ffff880f444d7b60
2012-12-13 14:22:12  ffff880f444d7a90 ffffc9007e22e02c 0000000000000246 0000000000000246
2012-12-13 14:22:12  ffff880f444d5ab8 ffff880f444d7fd8 000000000000f4e8 ffff880f444d5ab8
2012-12-13 14:22:12 Call Trace:                                                    
2012-12-13 14:22:12  [&amp;lt;ffffffffa05b07de&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]         
2012-12-13 14:22:12  [&amp;lt;ffffffffa0764fe3&amp;gt;] lu_object_find_at+0xb3/0x460 [obdclass]
2012-12-13 14:22:12  [&amp;lt;ffffffff8105ea30&amp;gt;] ? default_wake_function+0x0/0x20         
2012-12-13 14:22:12  [&amp;lt;ffffffff8126c735&amp;gt;] ? _atomic_dec_and_lock+0x55/0x80         
2012-12-13 14:22:12  [&amp;lt;ffffffffa07653cf&amp;gt;] lu_object_find_slice+0x1f/0x80 [obdclass]
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c27310&amp;gt;] mdd_object_find+0x10/0x70 [mdd]          
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c2b89f&amp;gt;] mdd_path+0x35f/0x1060 [mdd]              
2012-12-13 14:22:12  [&amp;lt;ffffffffa08f30f1&amp;gt;] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c8dfd7&amp;gt;] mdt_get_info+0x567/0xbb0 [mdt]           
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c8941d&amp;gt;] ? mdt_unpack_req_pack_rep+0x4d/0x4c0 [mdt]
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c91782&amp;gt;] mdt_handle_common+0x932/0x1760 [mdt]  
2012-12-13 14:22:12  [&amp;lt;ffffffffa0c92685&amp;gt;] mdt_regular_handle+0x15/0x20 [mdt]       
2012-12-13 14:22:12  [&amp;lt;ffffffffa09036ac&amp;gt;] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-12-13 14:22:12  [&amp;lt;ffffffffa05b06be&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]        
2012-12-13 14:22:12  [&amp;lt;ffffffffa05c214f&amp;gt;] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-12-13 14:22:12  [&amp;lt;ffffffffa08faa79&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
2012-12-13 14:22:12  [&amp;lt;ffffffff81051ba3&amp;gt;] ? __wake_up+0x53/0x70                    
2012-12-13 14:22:12  [&amp;lt;ffffffffa0904c45&amp;gt;] ptlrpc_main+0xbb5/0x1970 [ptlrpc]        
2012-12-13 14:22:12  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 14:22:12  [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20                       
2012-12-13 14:22:12  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 14:22:12  [&amp;lt;ffffffffa0904090&amp;gt;] ? ptlrpc_main+0x0/0x1970 [ptlrpc]        
2012-12-13 14:22:12  [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20                     
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;                                                                         

&lt;p&gt;I&apos;ve attached the Lustre log file that was dumped along with the LNet stack trace above:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                                                                         
2012-12-13 13:25:25 LustreError: dumping log to /tmp/lustre-log.1355433925.33078
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="16929">LU-2492</key>
            <summary>MDT thread stuck: mdd_object_find -&gt; lu_object_find_at</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                            <label>MB</label>
                            <label>sequoia</label>
                    </labels>
                <created>Thu, 13 Dec 2012 17:52:03 +0000</created>
                <updated>Wed, 19 Mar 2014 17:39:31 +0000</updated>
                            <resolved>Tue, 8 Jan 2013 00:10:29 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="49223" author="pjones" created="Thu, 13 Dec 2012 19:15:22 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please comment as to whether this issue is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1640&quot; title=&quot;Test failure on test suite lustre-rsync-test, subtest test_2c&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1640&quot;&gt;&lt;del&gt;LU-1640&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="49350" author="bobijam" created="Mon, 17 Dec 2012 23:12:32 +0000"  >&lt;p&gt;patch tracking at &lt;a href=&quot;http://review.whamcloud.com/3439&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3439&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;commit message&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LU-2492 obdclass: lu_object_find_at() waits forever

In lu_object_put(), cfs_hash_bd_dec_and_lock() can count the reference
temporarily added by htable_lookup(). If the thread on this code path
is the last object holder, it should proceed to free the object, but
because htable_lookup() has raised the refcount, the last holder
misses its chance to free it. htable_lookup() will then always find
the zero-referenced object in the hash table; no one will remove it
from the hash table or free it any longer, and lu_object_find_at()
keeps waiting forever.

In this case, do the unhash and free in htable_lookup().

&lt;/pre&gt;
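&lt;p&gt;The interleaving described in the commit message above can be replayed with a toy, single-threaded model. This is only a sketch under stated assumptions, not Lustre code: the names &lt;tt&gt;Obj&lt;/tt&gt;, &lt;tt&gt;Table&lt;/tt&gt;, &lt;tt&gt;lookup_get&lt;/tt&gt;, &lt;tt&gt;lookup_finish&lt;/tt&gt;, &lt;tt&gt;put&lt;/tt&gt; and &lt;tt&gt;replay&lt;/tt&gt; are hypothetical, locking is omitted, and the race is reduced to a fixed order of steps.&lt;/p&gt;
&lt;pre&gt;
```python
# Toy, single-threaded model of the interleaving described in the commit
# message. This is not Lustre code: every class and function name here is
# hypothetical, and the "race" is replayed as a fixed sequence of steps.

class Obj:
    def __init__(self, key):
        self.key = key
        self.refs = 1        # one reference held by the last real holder
        self.dying = False   # set once the object must not be reused


class Table:
    def __init__(self, fixed):
        self.fixed = fixed   # True: lookup frees a dying object it drops to 0
        self.slots = {}

    def lookup_get(self, key):
        """First half of a lookup: take a temporary reference."""
        obj = self.slots[key]
        obj.refs += 1
        return obj

    def lookup_finish(self, obj):
        """Second half: a dying object is not returned; drop the reference."""
        if not obj.dying:
            return obj
        obj.refs -= 1
        if self.fixed and obj.refs == 0:
            del self.slots[obj.key]   # unhash (and, in real code, free)
        return None                   # caller must wait and retry

    def put(self, obj):
        """Holder drops its reference; only the drop to zero unhashes."""
        obj.refs -= 1
        if obj.refs == 0:
            del self.slots[obj.key]


def replay(fixed):
    table = Table(fixed)
    obj = Obj("fid-1")
    table.slots[obj.key] = obj
    obj.dying = True                   # object is being torn down
    seen = table.lookup_get(obj.key)   # lookup bumps refs from 1 to 2 ...
    table.put(obj)                     # ... so the last holder sees 1, not 0
    table.lookup_finish(seen)          # lookup drops its temporary reference
    return obj.key in table.slots      # True means a stale entry is stuck


print("unfixed, stale entry left:", replay(fixed=False))
print("fixed, stale entry left:", replay(fixed=True))
```
&lt;/pre&gt;
&lt;p&gt;In this model the unfixed replay leaves a dying entry with zero references in the table, which is the state every later lookup would keep waiting on. With the fix enabled, the lookup that drops the last reference removes the entry itself.&lt;/p&gt;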
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="50093" author="bobijam" created="Tue, 8 Jan 2013 00:10:29 +0000"  >&lt;p&gt;landed on master for 2.4.0&lt;/p&gt;</comment>
                            <comment id="50132" author="prakash" created="Tue, 8 Jan 2013 10:10:21 +0000"  >&lt;p&gt;Awesome! Thanks Bobijam.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="20742">LU-3870</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="15250">LU-1640</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12097" name="lustre-log.1355433925.33078" size="821018" author="prakash" created="Thu, 13 Dec 2012 17:52:04 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdtj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5846</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>