<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:11:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14611] racer test 1 hangs in ls/locking</title>
                <link>https://jira.whamcloud.com/browse/LU-14611</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;There are a variety of racer test_1 hangs that look similar to existing tickets, but the call traces don&#8217;t match. I&#8217;m opening this ticket to capture racer hangs that don&#8217;t match existing tickets but share similar call traces on the MDS. The root cause may be the same as in one of the existing tickets, but the traces look different.&lt;/p&gt;

&lt;p&gt;All of these tests have similar call traces to the following tickets:&lt;br/&gt;
Not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11751&quot; title=&quot;racer deadlocks due to DOM glimpse request&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11751&quot;&gt;&lt;del&gt;LU-11751&lt;/del&gt;&lt;/a&gt; because there are no DOM locks in the traces&lt;br/&gt;
Not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12037&quot; title=&quot;Possible DNE issue leading to hung filesystem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12037&quot;&gt;&lt;del&gt;LU-12037&lt;/del&gt;&lt;/a&gt; because DNE is not in use and there are no osp_md locks in the traces&lt;br/&gt;
Not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10852&quot; title=&quot;racer test 1 hangs in locking&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10852&quot;&gt;LU-10852&lt;/a&gt; because there are no lod_object or osp_md locks in the traces&lt;br/&gt;
Also similar to, but ruled out:&lt;br/&gt;
Not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11359&quot; title=&quot;racer test 1 times out with client hung in dir_create.sh, ls, &#8230; and MDS in ldlm_completion_ast()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11359&quot;&gt;&lt;del&gt;LU-11359&lt;/del&gt;&lt;/a&gt; because there is no mdt_dom_discard_data in the call traces&lt;br/&gt;
Not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11358&quot; title=&quot;racer test 1 hangs in locking with DNE&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11358&quot;&gt;LU-11358&lt;/a&gt; because DNE is not in use&lt;/p&gt;

&lt;p&gt;For Lustre 2.14.51 CentOS 8.3 client/server no DNE at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/205c9572-7e2a-4a12-ac67-0e717b44ee7c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/205c9572-7e2a-4a12-ac67-0e717b44ee7c&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[66773.230283] Lustre: mdt00_030: service thread pid 3250671 was inactive for 63.465 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[66773.230306] Pid: 3250667, comm: mdt00_027 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Tue Mar 23 05:51:49 UTC 2021
[66773.233745] Lustre: Skipped 1 previous similar message
[66773.235770] Call Trace TBD:
[66773.236421] [&amp;lt;0&amp;gt;] ldlm_completion_ast+0x789/0x8e0 [ptlrpc]
[66773.238637] [&amp;lt;0&amp;gt;] ldlm_cli_enqueue_local+0x2f9/0x830 [ptlrpc]
[66773.239982] [&amp;lt;0&amp;gt;] mdt_object_local_lock+0x506/0xb00 [mdt]
[66773.241078] [&amp;lt;0&amp;gt;] mdt_object_lock_internal+0x183/0x430 [mdt]
[66773.242236] [&amp;lt;0&amp;gt;] mdt_getattr_name_lock+0x843/0x1a00 [mdt]
[66773.243362] [&amp;lt;0&amp;gt;] mdt_intent_getattr+0x260/0x430 [mdt]
[66773.244433] [&amp;lt;0&amp;gt;] mdt_intent_opc+0x44d/0xa80 [mdt]
[66773.245423] [&amp;lt;0&amp;gt;] mdt_intent_policy+0x1f6/0x380 [mdt]
[66773.246510] [&amp;lt;0&amp;gt;] ldlm_lock_enqueue+0x4c1/0x9f0 [ptlrpc]
[66773.247761] [&amp;lt;0&amp;gt;] ldlm_handle_enqueue0+0x61a/0x16d0 [ptlrpc]
[66773.248962] [&amp;lt;0&amp;gt;] tgt_enqueue+0xa4/0x1f0 [ptlrpc]
[66773.250007] [&amp;lt;0&amp;gt;] tgt_request_handle+0xc78/0x1910 [ptlrpc]
[66773.251146] [&amp;lt;0&amp;gt;] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc]
[66773.252430] [&amp;lt;0&amp;gt;] ptlrpc_main+0xba2/0x14a0 [ptlrpc]
[66773.253459] [&amp;lt;0&amp;gt;] kthread+0x112/0x130
[66773.254221] [&amp;lt;0&amp;gt;] ret_from_fork+0x35/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the OSS console&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[66664.337262] Pid: 3425654, comm: ll_ost_io00_007 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Tue Mar 23 05:51:49 UTC 2021
[66664.340289] Call Trace TBD:
[66664.341777] [&amp;lt;0&amp;gt;] libcfs_call_trace+0x6f/0x90 [libcfs]
[66664.343472] [&amp;lt;0&amp;gt;] osd_trans_start+0x50c/0x530 [osd_ldiskfs]
[66664.345227] [&amp;lt;0&amp;gt;] ofd_commitrw_write+0x5bf/0x1990 [ofd]
[66664.346282] [&amp;lt;0&amp;gt;] ofd_commitrw+0x30e/0x970 [ofd]
[66664.348215] [&amp;lt;0&amp;gt;] tgt_brw_write+0x11f6/0x21b0 [ptlrpc]
[66664.349686] [&amp;lt;0&amp;gt;] tgt_request_handle+0xc78/0x1910 [ptlrpc]
[66664.351150] [&amp;lt;0&amp;gt;] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc]
[66664.353326] [&amp;lt;0&amp;gt;] ptlrpc_main+0xba2/0x14a0 [ptlrpc]
[66664.354606] [&amp;lt;0&amp;gt;] kthread+0x112/0x130
[66664.355857] [&amp;lt;0&amp;gt;] ret_from_fork+0x35/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;We see something similar in interop testing:&lt;br/&gt;
2.14.51.42 clients/2.13.0 servers at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/b709fcd0-f46b-4347-b768-dfc01efd3131&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/b709fcd0-f46b-4347-b768-dfc01efd3131&lt;/a&gt;&lt;br/&gt;
MDS console&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  959.417633] format at mdt_io.c:215:mdt_rw_hpreq_check doesn&apos;t end in newline
[ 1026.377566] Lustre: mdt00_030: service thread pid 14229 was inactive for 64.034 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[ 1026.377589] Lustre: mdt00_018: service thread pid 14206 was inactive for 64.037 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1026.377609] Pid: 14206, comm: mdt00_018 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Thu Dec 5 10:35:21 UTC 2019
[ 1026.377610] Call Trace:
[ 1026.379313]  [&amp;lt;ffffffffc0e20bd0&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[ 1026.379445]  [&amp;lt;ffffffffc0e21e0c&amp;gt;] ldlm_cli_enqueue_local+0x25c/0x850 [ptlrpc]
[ 1026.379939]  [&amp;lt;ffffffffc124e833&amp;gt;] mdt_object_local_lock+0x523/0xb50 [mdt]
[ 1026.379991]  [&amp;lt;ffffffffc124eed0&amp;gt;] mdt_object_lock_internal+0x70/0x360 [mdt]
[ 1026.380005]  [&amp;lt;ffffffffc125084a&amp;gt;] mdt_getattr_name_lock+0x92a/0x1c90 [mdt]
[ 1026.380017]  [&amp;lt;ffffffffc1257fe5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[ 1026.380050]  [&amp;lt;ffffffffc124ccfa&amp;gt;] mdt_intent_opc+0x1ba/0xb40 [mdt]
[ 1026.380063]  [&amp;lt;ffffffffc12554b4&amp;gt;] mdt_intent_policy+0x1a4/0x360 [mdt]
[ 1026.380116]  [&amp;lt;ffffffffc0e07e16&amp;gt;] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[ 1026.380190]  [&amp;lt;ffffffffc0e30476&amp;gt;] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[ 1026.380676]  [&amp;lt;ffffffffc0eba032&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 1026.380773]  [&amp;lt;ffffffffc0ec282a&amp;gt;] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[ 1026.380822]  [&amp;lt;ffffffffc0e64a86&amp;gt;] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 1026.380885]  [&amp;lt;ffffffffc0e685bc&amp;gt;] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[ 1026.381081]  [&amp;lt;ffffffffa32c50d1&amp;gt;] kthread+0xd1/0xe0
[ 1026.381185]  [&amp;lt;ffffffffa398cd37&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[ 1026.382196]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[ 1026.382222] Pid: 14204, comm: mdt00_016 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Thu Dec 5 10:35:21 UTC 2019
[ 1026.382222] Call Trace:
[ 1026.382265]  [&amp;lt;ffffffffc0e20bd0&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[ 1026.382297]  [&amp;lt;ffffffffc0e21e0c&amp;gt;] ldlm_cli_enqueue_local+0x25c/0x850 [ptlrpc]
[ 1026.382311]  [&amp;lt;ffffffffc124e833&amp;gt;] mdt_object_local_lock+0x523/0xb50 [mdt]
[ 1026.382323]  [&amp;lt;ffffffffc124eed0&amp;gt;] mdt_object_lock_internal+0x70/0x360 [mdt]
[ 1026.382336]  [&amp;lt;ffffffffc125084a&amp;gt;] mdt_getattr_name_lock+0x92a/0x1c90 [mdt]
[ 1026.382348]  [&amp;lt;ffffffffc1257fe5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[ 1026.382360]  [&amp;lt;ffffffffc124ccfa&amp;gt;] mdt_intent_opc+0x1ba/0xb40 [mdt]
[ 1026.382372]  [&amp;lt;ffffffffc12554b4&amp;gt;] mdt_intent_policy+0x1a4/0x360 [mdt]
[ 1026.382403]  [&amp;lt;ffffffffc0e07e16&amp;gt;] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[ 1026.382437]  [&amp;lt;ffffffffc0e30476&amp;gt;] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[ 1026.382483]  [&amp;lt;ffffffffc0eba032&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 1026.382533]  [&amp;lt;ffffffffc0ec282a&amp;gt;] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[ 1026.382573]  [&amp;lt;ffffffffc0e64a86&amp;gt;] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 1026.382610]  [&amp;lt;ffffffffc0e685bc&amp;gt;] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[ 1026.382614]  [&amp;lt;ffffffffa32c50d1&amp;gt;] kthread+0xd1/0xe0
[ 1026.382616]  [&amp;lt;ffffffffa398cd37&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[ 1026.382624]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[ 1026.382631] Pid: 14198, comm: mdt00_010 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Thu Dec 5 10:35:21 UTC 2019
[ 1026.382631] Call Trace:
[ 1026.382671]  [&amp;lt;ffffffffc0e20bd0&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[ 1026.382704]  [&amp;lt;ffffffffc0e21e0c&amp;gt;] ldlm_cli_enqueue_local+0x25c/0x850 [ptlrpc]
[ 1026.382717]  [&amp;lt;ffffffffc124e833&amp;gt;] mdt_object_local_lock+0x523/0xb50 [mdt]
[ 1026.382730]  [&amp;lt;ffffffffc124eed0&amp;gt;] mdt_object_lock_internal+0x70/0x360 [mdt]
[ 1026.382742]  [&amp;lt;ffffffffc125084a&amp;gt;] mdt_getattr_name_lock+0x92a/0x1c90 [mdt]
[ 1026.382754]  [&amp;lt;ffffffffc1257fe5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[ 1026.382767]  [&amp;lt;ffffffffc124ccfa&amp;gt;] mdt_intent_opc+0x1ba/0xb40 [mdt]
[ 1026.382778]  [&amp;lt;ffffffffc12554b4&amp;gt;] mdt_intent_policy+0x1a4/0x360 [mdt]
[ 1026.382810]  [&amp;lt;ffffffffc0e07e16&amp;gt;] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[ 1026.382852]  [&amp;lt;ffffffffc0e30476&amp;gt;] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[ 1026.382893]  [&amp;lt;ffffffffc0eba032&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 1026.382935]  [&amp;lt;ffffffffc0ec282a&amp;gt;] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[ 1026.382972]  [&amp;lt;ffffffffc0e64a86&amp;gt;] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 1026.383031]  [&amp;lt;ffffffffc0e685bc&amp;gt;] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[ 1026.383036]  [&amp;lt;ffffffffa32c50d1&amp;gt;] kthread+0xd1/0xe0
[ 1026.383039]  [&amp;lt;ffffffffa398cd37&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[ 1026.383045]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="63772">LU-14611</key>
            <summary>racer test 1 hangs in ls/locking</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="simmonsja">James A Simmons</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>ORNL</label>
                    </labels>
                <created>Tue, 13 Apr 2021 19:34:57 +0000</created>
                <updated>Thu, 16 Mar 2023 18:20:41 +0000</updated>
                                            <version>Lustre 2.12.7</version>
                    <version>Lustre 2.15.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="364979" author="simmonsja" created="Mon, 6 Mar 2023 15:47:12 +0000"  >&lt;p&gt;We also have a production system, not using DNE or DoM, that is now showing this issue.&#160; The patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16389&quot; title=&quot;Lustre 2.12.9 ksocklnd crash with 100+GB ethernet&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16389&quot;&gt;LU-16389&lt;/a&gt; was recently added to our clients, which brought this problem to the surface. I don&apos;t blame the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16389&quot; title=&quot;Lustre 2.12.9 ksocklnd crash with 100+GB ethernet&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16389&quot;&gt;LU-16389&lt;/a&gt; patch as the source, since this problem has been around for a long time. Our settings are:&lt;/p&gt;

&lt;p&gt;at_max = 600&lt;/p&gt;

&lt;p&gt;timeout = 100&lt;/p&gt;

&lt;p&gt;Any suggestions to work around this?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="53268">LU-11358</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55781">LU-12354</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01s2v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>