<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:06:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
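
For instance, a field-restricted fetch could look like the following curl
invocation (the 'si/jira.issueviews:issue-xml' path follows JIRA's standard
XML export URL pattern; it is shown here as an illustration, not taken from
this export):

    curl 'https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-14112/LU-14112.xml?field=key&field=summary'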
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14112] parallel-scale-nfsv3 test racer_on_nfs hangs in locking</title>
                <link>https://jira.whamcloud.com/browse/LU-14112</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;parallel-scale-nfsv3 test_racer_on_nfs hangs in locking. Looking at the failure at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/be1b9c8d-3095-43fa-aaca-e43c7cbb84ce&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/be1b9c8d-3095-43fa-aaca-e43c7cbb84ce&lt;/a&gt;, we see several hung dd and cp processes and &quot;nfs: server not responding&quot; messages in both client console logs.&lt;br/&gt;
From the client1 (vm5) console log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[57611.728321] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 21:34:59 (1604093699)
[57611.932785] Lustre: DEBUG MARKER: MDSCOUNT=4 OSTCOUNT=8 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
[57614.148227] 8[14798]: segfault at 8 ip 00007f76d2c73798 sp 00007fff8d8b79f0 error 4 in ld-2.17.so[7f76d2c68000+22000]
[57639.708629] 19[3351]: segfault at 8 ip 00007f1847a8b798 sp 00007ffd4c308b80 error 4 in ld-2.17.so[7f1847a80000+22000]
[57641.608140] 6[7340]: segfault at 0 ip           (null) sp 00007ffc2bb99258 error 14 in 6[400000+6000]
[57734.427404] 1[458]: segfault at 8 ip 00007f8013852798 sp 00007fff1727cc30 error 4 in ld-2.17.so[7f8013847000+22000]
[57740.031404] 16[11741]: segfault at 8 ip 00007f3ad2163798 sp 00007ffdce934730 error 4 in ld-2.17.so[7f3ad2158000+22000]
[58019.646614] nfs: server trevis-57vm8 not responding, still trying
[58019.647966] nfs: server trevis-57vm8 not responding, still trying
[58019.649123] nfs: server trevis-57vm8 not responding, still trying
[58020.038570] nfs: server trevis-57vm8 not responding, still trying
[58020.467569] nfs: server trevis-57vm8 not responding, still trying
[58020.911526] nfs: server trevis-57vm8 not responding, still trying
[58020.912834] nfs: server trevis-57vm8 not responding, still trying
[58020.916520] nfs: server trevis-57vm8 not responding, still trying
[58078.437386] INFO: task dd:19266 blocked for more than 120 seconds.
[58078.438624] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[58078.439924] dd              D ffff971669d362a0     0 19266      1 0x00000084
[58078.441291] Call Trace:
[58078.441816]  [&amp;lt;ffffffffa797f340&amp;gt;] ? bit_wait+0x50/0x50
[58078.442797]  [&amp;lt;ffffffffa79811f9&amp;gt;] schedule+0x29/0x70
[58078.443700]  [&amp;lt;ffffffffa797ed01&amp;gt;] schedule_timeout+0x221/0x2d0
[58078.444914]  [&amp;lt;ffffffffc032f7e6&amp;gt;] ? rpc_run_task+0xf6/0x150 [sunrpc]
[58078.445993]  [&amp;lt;ffffffffc033e5d0&amp;gt;] ? rpc_put_task+0x10/0x20 [sunrpc]
[58078.447264]  [&amp;lt;ffffffffa726d35e&amp;gt;] ? kvm_clock_get_cycles+0x1e/0x20
[58078.448360]  [&amp;lt;ffffffffa7306392&amp;gt;] ? ktime_get_ts64+0x52/0xf0
[58078.449348]  [&amp;lt;ffffffffa797f340&amp;gt;] ? bit_wait+0x50/0x50
[58078.450237]  [&amp;lt;ffffffffa79808ed&amp;gt;] io_schedule_timeout+0xad/0x130
[58078.451240]  [&amp;lt;ffffffffa7980988&amp;gt;] io_schedule+0x18/0x20
[58078.452200]  [&amp;lt;ffffffffa797f351&amp;gt;] bit_wait_io+0x11/0x50
[58078.453120]  [&amp;lt;ffffffffa797ee77&amp;gt;] __wait_on_bit+0x67/0x90
[58078.454100]  [&amp;lt;ffffffffa73bcb91&amp;gt;] wait_on_page_bit+0x81/0xa0
[58078.455117]  [&amp;lt;ffffffffa72c74d0&amp;gt;] ? wake_bit_function+0x40/0x40
[58078.456155]  [&amp;lt;ffffffffa73bccc1&amp;gt;] __filemap_fdatawait_range+0x111/0x190
[58078.457363]  [&amp;lt;ffffffffa73bcd54&amp;gt;] filemap_fdatawait_range+0x14/0x30
[58078.458460]  [&amp;lt;ffffffffa73bf236&amp;gt;] filemap_write_and_wait_range+0x56/0x90
[58078.459672]  [&amp;lt;ffffffffc05dc746&amp;gt;] nfs_file_fsync+0x86/0x110 [nfs]
[58078.460758]  [&amp;lt;ffffffffa747fc2e&amp;gt;] vfs_fsync+0x2e/0x40
[58078.461674]  [&amp;lt;ffffffffc05dcb86&amp;gt;] nfs_file_flush+0x46/0x60 [nfs]
[58078.462737]  [&amp;lt;ffffffffa7447f87&amp;gt;] filp_close+0x37/0x90
[58078.463666]  [&amp;lt;ffffffffa746b76c&amp;gt;] __close_fd+0x8c/0xb0
[58078.464578]  [&amp;lt;ffffffffa7449aa3&amp;gt;] SyS_close+0x23/0x50
[58078.465473]  [&amp;lt;ffffffffa798dede&amp;gt;] system_call_fastpath+0x25/0x2a
[58078.466528]  [&amp;lt;ffffffffa798de21&amp;gt;] ? system_call_after_swapgs+0xae/0x146
[58092.571814] nfs: server trevis-57vm8 not responding, still trying
[58097.755630] nfs: server trevis-57vm8 not responding, still trying
[58198.462827] INFO: task dd:19266 blocked for more than 120 seconds.
[58198.464059] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[58198.465353] dd              D ffff971669d362a0     0 19266      1 0x00000084
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the console log for MDS1,3 (vm8), we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[57623.525925] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 21:34:59 (1604093699)
[57631.121275] LustreError: 14916:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x313e:0x0] get parent: rc = -2
[57636.101177] LustreError: 14918:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3267:0x0] get parent: rc = -2
[57636.103551] LustreError: 14918:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 6 previous similar messages
[57637.950585] LustreError: 14913:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3288:0x0] get parent: rc = -2
[57647.690933] LustreError: 14918:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x334a:0x0] get parent: rc = -2
[57656.183969] LustreError: 14919:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x33ce:0x0] get parent: rc = -2
[57656.186407] LustreError: 14919:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 25 previous similar messages
[57671.270437] LustreError: 14916:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3542:0x0] get parent: rc = -2
[57671.273037] LustreError: 14916:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 12 previous similar messages
[57692.317861] LustreError: 14913:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3752:0x0] get parent: rc = -2
[57692.320930] LustreError: 14913:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 46 previous similar messages
[57725.517726] LustreError: 14918:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3ac4:0x0] get parent: rc = -2
[57725.521666] LustreError: 14918:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 128 previous similar messages
[57754.052394] reconnect_path: npd != pd
[57790.136336] LustreError: 14913:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20004a783:0x3fe7:0x0] get parent: rc = -2
[57790.138707] LustreError: 14913:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 207 previous similar messages
[57913.388418] LNet: Service thread pid 22912 was inactive for 62.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[57913.391801] Pid: 22912, comm: mdt00_009 3.10.0-1062.18.1.el7_lustre.x86_64 #1 SMP Mon Jun 8 13:47:48 UTC 2020
[57913.393552] Call Trace:
[57913.394117]  [&amp;lt;ffffffffc0c80150&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[57913.395890]  [&amp;lt;ffffffffc0c82181&amp;gt;] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[57913.397261]  [&amp;lt;ffffffffc10b369b&amp;gt;] mdt_object_local_lock+0x50b/0xb20 [mdt]
[57913.398722]  [&amp;lt;ffffffffc10b3d20&amp;gt;] mdt_object_lock_internal+0x70/0x360 [mdt]
[57913.400030]  [&amp;lt;ffffffffc10b40c7&amp;gt;] mdt_object_lock_try+0x27/0xb0 [mdt]
[57913.401320]  [&amp;lt;ffffffffc10dab68&amp;gt;] mdt_object_open_lock+0x688/0xac0 [mdt]
[57913.402637]  [&amp;lt;ffffffffc10df3b7&amp;gt;] mdt_reint_open+0x15b7/0x3240 [mdt]
[57913.404007]  [&amp;lt;ffffffffc10d31b3&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
[57913.405289]  [&amp;lt;ffffffffc10af383&amp;gt;] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[57913.406619]  [&amp;lt;ffffffffc10bc622&amp;gt;] mdt_intent_open+0x82/0x3a0 [mdt]
[57913.407807]  [&amp;lt;ffffffffc10ba745&amp;gt;] mdt_intent_policy+0x435/0xd80 [mdt]
[57913.409088]  [&amp;lt;ffffffffc0c66e06&amp;gt;] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[57913.410443]  [&amp;lt;ffffffffc0c8eba6&amp;gt;] ldlm_handle_enqueue0+0xa86/0x1620 [ptlrpc]
[57913.411785]  [&amp;lt;ffffffffc0d181a2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[57913.413151]  [&amp;lt;ffffffffc0d1efba&amp;gt;] tgt_request_handle+0xada/0x1570 [ptlrpc]
[57913.414553]  [&amp;lt;ffffffffc0cc392b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[57913.415991]  [&amp;lt;ffffffffc0cc7294&amp;gt;] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[57913.417401]  [&amp;lt;ffffffffaf4c6321&amp;gt;] kthread+0xd1/0xe0
[57913.418379]  [&amp;lt;ffffffffafb8ed37&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[57913.419653]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[57913.420913] LustreError: dumping log to /tmp/lustre-log.1604093989.22912
[57913.858305] Pid: 19858, comm: mdt00_007 3.10.0-1062.18.1.el7_lustre.x86_64 #1 SMP Mon Jun 8 13:47:48 UTC 2020
[57913.860079] Call Trace:
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This failure occurred in RHEL7.7, DNE, ldiskfs testing.&lt;/p&gt;

&lt;p&gt;This hang looks very similar to several other tickets, but in each case a distinguishing call is missing from the logs (see the search sketch after the list):&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;looks very similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11359&quot; title=&quot;racer test 1 times out with client hung in dir_create.sh, ls, &#8230; and MDS in ldlm_completion_ast()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11359&quot;&gt;&lt;del&gt;LU-11359&lt;/del&gt;&lt;/a&gt;, but I don&#8217;t see &#8220;mdt_dom_discard_data&#8221; in the MDS console logs/call traces.&lt;/li&gt;
	&lt;li&gt;looks very similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11358&quot; title=&quot;racer test 1 hangs in locking with DNE&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11358&quot;&gt;LU-11358&lt;/a&gt;, but I don&#8217;t see &#8220;mdt_brw_enqueue&#8221; in the MDS console logs/call traces.&lt;/li&gt;
&lt;/ul&gt;
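&lt;p&gt;For reference, the absence of those frames can be confirmed by searching the collected console logs directly. A minimal sketch, assuming the MDS console log has been saved locally (the file name below is illustrative, not an actual artifact name from this test session):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# hypothetical file name; substitute the MDS console log downloaded from the test session
grep -c mdt_dom_discard_data mds-console.log   # a nonzero count would point at LU-11359
grep -c mdt_brw_enqueue mds-console.log        # a nonzero count would point at LU-11358
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;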
</description>
        <environment></environment>
        <key id="61505">LU-14112</key>
        <summary>parallel-scale-nfsv3 test racer_on_nfs hangs in locking</summary>
        <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
        <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
        <statusCategory id="2" key="new" colorName="default"/>
        <resolution id="-1">Unresolved</resolution>
        <assignee username="wc-triage">WC Triage</assignee>
        <reporter username="jamesanunez">James Nunez</reporter>
        <labels>
        </labels>
        <created>Tue, 3 Nov 2020 18:54:23 +0000</created>
        <updated>Wed, 4 Nov 2020 18:20:54 +0000</updated>
        <version>Lustre 2.12.6</version>
        <due></due>
        <votes>1</votes>
        <watches>2</watches>
        <attachments>
        </attachments>
        <subtasks>
        </subtasks>
        <customfields>
            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                <customfieldname>Development</customfieldname>
                <customfieldvalues>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>1|i01e67:</customfieldvalue>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank (Obsolete)</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>9223372036854775807</customfieldvalue>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                <customfieldname>Severity</customfieldname>
                <customfieldvalues>
                    <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                </customfieldvalues>
            </customfield>
        </customfields>
    </item>
</channel>
</rss>