<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:39:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10904] racer hangs on umount</title>
                <link>https://jira.whamcloud.com/browse/LU-10904</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;racer completes all tests, but hangs while unmounting the file system. We see this problem frequently.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;On the client console, we see the umount command hung&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[41678.588785] umount&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; D 0000000000000000&#160;&#160;&#160;&#160; 0 13439&#160; 13432 0x00000000
[41678.588786]&#160; ffff880058147da8 ffff880058147de0 ffff880063b84cc0 ffff880058148000
[41678.588788]&#160; ffff880058147de0 00000001009dda10 ffff88007fc10840 0000000000000000
[41678.588789]&#160; ffff880058147dc0 ffffffff81611eb5 ffff88007fc10840 ffff880058147e68
[41678.588789] Call Trace:
[41678.588792]&#160; [&amp;lt;ffffffff81611eb5&amp;gt;] schedule+0x35/0x80
[41678.588794]&#160; [&amp;lt;ffffffff81614c71&amp;gt;] schedule_timeout+0x161/0x2d0
[41678.589164]&#160; [&amp;lt;ffffffffa0d34837&amp;gt;] ll_kill_super+0x77/0x150 [lustre]
[41678.589219]&#160; [&amp;lt;ffffffffa0838054&amp;gt;] lustre_kill_super+0x34/0x40 [obdclass]
[41678.589247]&#160; [&amp;lt;ffffffff8120d4ff&amp;gt;] deactivate_locked_super+0x3f/0x70
[41678.589260]&#160; [&amp;lt;ffffffff812289cb&amp;gt;] cleanup_mnt+0x3b/0x80
[41678.589264]&#160; [&amp;lt;ffffffff8109d198&amp;gt;] task_work_run+0x78/0x90
[41678.589267]&#160; [&amp;lt;ffffffff8107b60f&amp;gt;] exit_to_usermode_loop+0x91/0xc2
[41678.589304]&#160; [&amp;lt;ffffffff81003ae5&amp;gt;] syscall_return_slowpath+0x85/0xa0
[41678.589327]&#160; [&amp;lt;ffffffff816160a8&amp;gt;] int_ret_from_sys_call+0x25/0x9f
[41678.591515] DWARF2 unwinder stuck at int_ret_from_sys_call+0x25/0x9f

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The stack trace for the hung umount on the client is common to all these failures. What varies between failures is the content of the other console logs.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;In some cases, there is little information in the console logs as to why umount is hung. The following failures are examples of this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/1a8349fe-3780-11e8-960d-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/1a8349fe-3780-11e8-960d-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/49efa63a-2e87-11e8-b74b-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/49efa63a-2e87-11e8-b74b-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Other racer hangs on unmount have more information in the client and MDS console logs.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;On the MDS console log, we see two hung processes&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[41561.767660] jbd2/vda1-8&#160;&#160;&#160;&#160; D ffff880036476eb0&#160;&#160;&#160;&#160; 0&#160;&#160; 263&#160;&#160;&#160;&#160;&#160; 2 0x00000000
[41561.768587] Call Trace:
[41561.768925]&#160; [&amp;lt;ffffffff816b2060&amp;gt;] ? bit_wait+0x50/0x50
[41561.769652]&#160; [&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[41561.770284]&#160; [&amp;lt;ffffffff816b1a49&amp;gt;] schedule_timeout+0x239/0x2c0
[41561.771136]&#160; [&amp;lt;ffffffff812fd2d0&amp;gt;] ? generic_make_request_checks+0x1a0/0x3a0
[41561.772022]&#160; [&amp;lt;ffffffff81063f5e&amp;gt;] ? kvm_clock_get_cycles+0x1e/0x20
[41561.772875]&#160; [&amp;lt;ffffffff816b2060&amp;gt;] ? bit_wait+0x50/0x50
[41561.773511]&#160; [&amp;lt;ffffffff816b35ed&amp;gt;] io_schedule_timeout+0xad/0x130
[41561.774276]&#160; [&amp;lt;ffffffff816b3688&amp;gt;] io_schedule+0x18/0x20
[41561.775031]&#160; [&amp;lt;ffffffff816b2071&amp;gt;] bit_wait_io+0x11/0x50
[41561.775682]&#160; [&amp;lt;ffffffff816b1b97&amp;gt;] __wait_on_bit+0x67/0x90
[41561.776361]&#160; [&amp;lt;ffffffff816b2060&amp;gt;] ? bit_wait+0x50/0x50
[41561.777112]&#160; [&amp;lt;ffffffff816b1c41&amp;gt;] out_of_line_wait_on_bit+0x81/0xb0
[41561.777910]&#160; [&amp;lt;ffffffff810b5080&amp;gt;] ? wake_bit_function+0x40/0x40
[41561.778756]&#160; [&amp;lt;ffffffff8123b3fa&amp;gt;] __wait_on_buffer+0x2a/0x30
[41561.779893]&#160; [&amp;lt;ffffffffc00b87f1&amp;gt;] jbd2_journal_commit_transaction+0x1781/0x19b0 [jbd2]
[41561.780881]&#160; [&amp;lt;ffffffff810c28a0&amp;gt;] ? finish_task_switch+0x50/0x170
[41561.781650]&#160; [&amp;lt;ffffffffc00bdac9&amp;gt;] kjournald2+0xc9/0x260 [jbd2]
[41561.782385]&#160; [&amp;lt;ffffffff810b4fc0&amp;gt;] ? wake_up_atomic_t+0x30/0x30
[41561.783226]&#160; [&amp;lt;ffffffffc00bda00&amp;gt;] ? commit_timeout+0x10/0x10 [jbd2]
[41561.784024]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[41561.784710]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[41561.785478]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[41561.786238]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
&#8230;
[41561.874288] auditd&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; D ffff88007911bf40&#160;&#160;&#160;&#160; 0&#160;&#160; 464&#160;&#160;&#160;&#160;&#160; 1 0x00000000
[41561.875280] Call Trace:
[41561.875598] &#160;[&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[41561.876257]&#160; [&amp;lt;ffffffffc00bd4a5&amp;gt;] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[41561.877183]&#160; [&amp;lt;ffffffff810b4fc0&amp;gt;] ? wake_up_atomic_t+0x30/0x30
[41561.877931]&#160; [&amp;lt;ffffffffc00bea92&amp;gt;] jbd2_complete_transaction+0x52/0xa0 [jbd2]
[41561.879009]&#160; [&amp;lt;ffffffffc00e1d52&amp;gt;] ext4_sync_file+0x292/0x320 [ext4]
[41561.879805]&#160; [&amp;lt;ffffffff81238477&amp;gt;] do_fsync+0x67/0xb0
[41561.880476]&#160; [&amp;lt;ffffffff816c0655&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[41561.881309]&#160; [&amp;lt;ffffffff81238760&amp;gt;] SyS_fsync+0x10/0x20
[41561.882029]&#160; [&amp;lt;ffffffff816c0715&amp;gt;] system_call_fastpath+0x1c/0x21
[41561.882794]&#160; [&amp;lt;ffffffff816c0661&amp;gt;] ? system_call_after_swapgs+0xae/0x146
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;An example of this is&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/28935b90-2e9f-11e8-9e0e-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/28935b90-2e9f-11e8-9e0e-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Other test hangs have additional information on a client. We see a couple of errors before racer completes&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[38788.150606] Lustre: lustre-MDT0000-mdc-ffff880000015000: Connection restored to 10.2.8.165@tcp (at 10.2.8.165@tcp)
[38887.196337] LustreError: 8603:0:(statahead.c:1591:start_statahead_thread()) can&apos;t start ll_sa thread, rc: -4
[38920.523725] Lustre: 25748:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1523359053/real 1523359053]&#160; req@ffff8800639b96c0 x1597353644599184/t0(0) o36-&amp;gt;lustre-MDT0000-mdc-ffff88006424a000@10.2.8.165@tcp:12/10 lens 608/4768 e 0 to 1 dl 1523359107 ref 2 fl Rpc:IX/0/ffffffff rc 0/-1
[38920.523749] Lustre: 25748:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[38920.523799] Lustre: lustre-MDT0000-mdc-ffff88006424a000: Connection to lustre-MDT0000 (at 10.2.8.165@tcp) was lost; in progress operations using this service will wait for recovery to complete
[38920.523800] Lustre: Skipped 2 previous similar messages
[38920.523857] LustreError: 25748:0:(llite_lib.c:1511:ll_md_setattr()) md_setattr fails: rc = -4
[38920.531836] Lustre: lustre-MDT0000-mdc-ffff88006424a000: Connection restored to 10.2.8.165@tcp (at 10.2.8.165@tcp)

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;An example of this is at&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/8209f4b2-3d46-11e8-8f8a-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/8209f4b2-3d46-11e8-8f8a-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Other clients have different information in their console logs. Some have many of the following messages:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[10286.777823] ll_sa_21479&#160;&#160;&#160;&#160; D ffff880078c30000&#160;&#160;&#160;&#160; 0 21498&#160;&#160;&#160;&#160;&#160; 2 0x00000080
[10286.779544] Call Trace:
[10286.780839]&#160; [&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[10286.782475]&#160; [&amp;lt;ffffffffc0cfa6d1&amp;gt;] ll_statahead_thread+0x651/0xbf0 [lustre]
[10286.784173]&#160; [&amp;lt;ffffffffc0cfa080&amp;gt;] ? ll_agl_thread+0x3e0/0x3e0 [lustre]
[10286.785825]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[10286.787342]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[10286.788946]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[10286.790502]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[10286.792104] ll_agl_21479&#160;&#160;&#160; D ffff880078c35ee0&#160;&#160;&#160;&#160; 0 21499&#160;&#160;&#160;&#160;&#160; 2 0x00000080
[10286.793798] Call Trace:
[10286.795131]&#160; [&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[10286.796658]&#160; [&amp;lt;ffffffffc0cf9f6b&amp;gt;] ll_agl_thread+0x2cb/0x3e0 [lustre]
[10286.798302]&#160; [&amp;lt;ffffffffc0cf9ca0&amp;gt;] ? ll_agl_trigger+0x520/0x520 [lustre]
[10286.799953]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[10286.801459]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[10286.803059]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[10286.804593]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Logs for these types of failures are at&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/847d5946-3152-11e8-b6a0-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/847d5946-3152-11e8-b6a0-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="51781">LU-10904</key>
            <summary>racer hangs on umount</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Wed, 11 Apr 2018 18:31:48 +0000</created>
                <updated>Thu, 23 Nov 2023 22:10:47 +0000</updated>
                            <resolved>Thu, 23 Nov 2023 22:10:47 +0000</resolved>
                                    <version>Lustre 2.11.0</version>
                    <version>Lustre 2.10.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="50334">LU-10543</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzvp3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>