<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:30:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
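For instance, this issue's XML view could be requested as (illustrative URL; the standard JIRA issue-XML view path is assumed):
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-9935/LU-9935.xml?field=key&field=summary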
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9935] Failover mount hangs on DNE MDT </title>
                <link>https://jira.whamcloud.com/browse/LU-9935</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Not entirely certain what is going on here; however, this is very repeatable, and we have verified that the array is quite healthy. &lt;br/&gt;
Steps: &lt;br/&gt;
Soak-11 is power-cycled for a failover test. &lt;br/&gt;
Soak-10 attempts to mount the failed MDT (MDT0003). &lt;br/&gt;
Soak-10 reports a few odd errors and dumps a Lustre log.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Aug 31 20:01:27 soak-10 kernel: LNet: Service thread pid 3916 was inactive &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 200.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; debugging purposes:
Aug 31 20:01:27 soak-10 kernel: Pid: 3916, comm: mdt01_027
Aug 31 20:01:27 soak-10 kernel: #012Call Trace:
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffff8168c969&amp;gt;] schedule+0x29/0x70
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffffa1165b82&amp;gt;] top_trans_wait_result+0xa3/0x14e [ptlrpc]
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffff810c54e0&amp;gt;] ? default_wake_function+0x0/0x20
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffffa114814b&amp;gt;] top_trans_stop+0x46b/0x970 [ptlrpc]
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffffa15eb569&amp;gt;] lod_trans_stop+0x259/0x340 [lod]
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffffa166f4d5&amp;gt;] ? mdd_changelog_ns_store+0x2e5/0x650 [mdd]
Aug 31 20:01:27 soak-10 kernel: [&amp;lt;ffffffffa1687b94&amp;gt;] mdd_trans_stop+0x24/0x40 [mdd]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa1672753&amp;gt;] mdd_create+0x10b3/0x1330 [mdd]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa1538d86&amp;gt;] mdt_create+0x846/0xbb0 [mdt]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa153925b&amp;gt;] mdt_reint_create+0x16b/0x350 [mdt]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa153a760&amp;gt;] mdt_reint_rec+0x80/0x210 [mdt]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa151c2fb&amp;gt;] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa1527e37&amp;gt;] mdt_reint+0x67/0x140 [mdt]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa11348f5&amp;gt;] tgt_request_handle+0x925/0x1370 [ptlrpc]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa10dd2c6&amp;gt;] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa0d6db97&amp;gt;] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa10e12a0&amp;gt;] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffffa10e0800&amp;gt;] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffff810b0a4f&amp;gt;] kthread+0xcf/0xe0
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread+0x0/0xe0
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffff816978d8&amp;gt;] ret_from_fork+0x58/0x90
Aug 31 20:01:28 soak-10 kernel: [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread+0x0/0xe0
Aug 31 20:01:28 soak-10 kernel:
Aug 31 20:01:28 soak-10 kernel: LustreError: dumping log to /tmp/lustre-log.1504209688.3916
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
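&lt;p&gt;(Reading the trace: the MDT service thread is blocked in top_trans_wait_result(), i.e. a distributed DNE transaction waiting for an update reply from another MDT, presumably the MDT0003 target that is being failed over.)&lt;/p&gt;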
&lt;p&gt;The mount then hangs with -114 -&amp;gt; EALREADY /* Operation already in progress */&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Aug 31 20:01:39 soak-10 kernel: LustreError: 11348:0:(obd_mount_server.c:1832:server_fill_super()) Unable to start osd on /dev/mapper/360080e50001fedb80000015752012949: -114
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
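&lt;p&gt;For reference, -114 is the negated Linux errno for EALREADY. A minimal user-space sketch (illustration only, not Lustre code) of the convention behind the rc value above:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Illustration only, not Lustre code: kernel functions such as
 * server_fill_super() report failure as a negated errno, and
 * EALREADY is 114 on Linux, hence the rc = -114 logged above. */
int main(void)
{
        printf(&quot;EALREADY = %d, kernel-style rc = %d\n&quot;, EALREADY, -EALREADY);
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;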
&lt;p&gt;At this point the mount process is unkillable. Attempts to unmount the other MDT also fail.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Aug 31 20:58:14 soak-10 kernel: Lustre: soaked-MDT0002: Client 040f43a4-0589-a1a1-1a89-accb11c79d99 (at 192.168.1.116@o2ib) reconnecting
Aug 31 20:58:14 soak-10 kernel: Lustre: soaked-MDT0002: Connection restored to 040f43a4-0589-a1a1-1a89-accb11c79d99 (at 192.168.1.116@o2ib)
Aug 31 20:58:15 soak-10 kernel: Lustre: Failing over soaked-MDT0002
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1094:ldlm_resource_complain()) soaked-MDT0000-osp-MDT0002: namespace resource [0x200000bd2:0x6:0x0].0x0 (ffff8808011aad80) refcount nonzero (1) after lock cleanup; forcing cleanup.
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000bd2:0x6:0x0].0x0 (ffff8808011aad80) refcount = 2
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1682:ldlm_resource_dump()) ### ### ns: soaked-MDT0000-osp-MDT0002 lock: ffff88078ded8c00/0xdf81cb0d8be12866 lrc: 2/0,1 mode: EX/EX res: [0x200000bd2:0x6:0x0].0x0 bits 0x2 rrc: 3 type: IBT flags: 0x1106401000000 nid: local remote: 0x5fb81c9b7e5be5fe expref: -99 pid: 3916 timeout: 0 lvb_type: 0
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000bd2:0x6:0x0].0x0 (ffff8808011aad80) refcount = 2
Aug 31 20:58:16 soak-10 kernel: LustreError: 21348:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
Aug 31 20:58:18 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 192.168.1.136@o2ib (stopping)
Aug 31 20:58:19 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 192.168.1.130@o2ib (stopping)
Aug 31 20:58:19 soak-10 kernel: Lustre: Skipped 1 previous similar message
Aug 31 20:58:22 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 172.16.1.48@o2ib1 (stopping)
Aug 31 20:58:22 soak-10 kernel: Lustre: Skipped 3 previous similar messages
Aug 31 20:58:24 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 192.168.1.138@o2ib (stopping)
Aug 31 20:58:24 soak-10 kernel: Lustre: Skipped 3 previous similar messages
Aug 31 20:58:28 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 172.16.1.46@o2ib1 (stopping)
Aug 31 20:58:28 soak-10 kernel: Lustre: Skipped 5 previous similar messages
Aug 31 20:58:38 soak-10 kernel: Lustre: soaked-MDT0002: Not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 192.168.1.131@o2ib (stopping)
Aug 31 20:58:38 soak-10 kernel: Lustre: Skipped 28 previous similar messages
Aug 31 20:58:41 soak-10 kernel: LustreError: 0-0: Forced cleanup waiting &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; soaked-MDT0000-osp-MDT0002 namespace with 1 resources in use, (rc=-110)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
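&lt;p&gt;(For reference, the rc=-110 in the forced-cleanup message is the negated Linux errno for ETIMEDOUT: the cleanup timed out waiting for the namespace resource. Note that the granted EX lock dumped above is held by pid 3916, the same service thread reported hung earlier.)&lt;/p&gt;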
&lt;p&gt;Dumped the Lustre log both before and after the unmount attempt, and crash-dumped the system.&lt;br/&gt;
When the system returned from the crash dump/reboot, all devices mounted fine.&lt;br/&gt;
Ran for a while with only the mds_restart test, so devices were unmounted and remounted on the same node; this worked fine. The wedged state only seems to occur during failover. &lt;br/&gt;
Multiple Lustre logs, the system log, and full stack traces are attached.&lt;/p&gt;</description>
                <environment>Soak stress cluster Lustre version=2.10.0_61_g6aabd4a</environment>
        <key id="48090">LU-9935</key>
            <summary>Failover mount hangs on DNE MDT </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Thu, 31 Aug 2017 21:39:58 +0000</created>
                <updated>Thu, 26 Oct 2017 18:18:06 +0000</updated>
                <version>Lustre 2.10.1</version>
                <due></due>
                <votes>0</votes>
                <watches>3</watches>
                <comments>
                            <comment id="207273" author="pjones" created="Fri, 1 Sep 2017 17:37:51 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="207274" author="cliffw" created="Fri, 1 Sep 2017 17:42:35 +0000"  >&lt;p&gt;There may have been a configuration issue, let me re-test to confirm&lt;/p&gt;</comment>
                            <comment id="207722" author="sarah" created="Wed, 6 Sep 2017 23:07:17 +0000"  >&lt;p&gt;I don&apos;t see this error after Cliff restarted. Instead soak.log shows on 9/4, when trying to fail back from soak-9 to soak-8,  soak-9 hang during umount.  Don&apos;t find core dump for soak-9 though.&lt;/p&gt;

&lt;p&gt;soak.log&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2017-09-04 07:15:19,080:fsmgmt.fsmgmt:INFO     soak-8 is up!!!
2017-09-04 07:15:30,092:fsmgmt.fsmgmt:INFO     Failing over soaked-MDT0000 ...
2017-09-04 07:15:30,092:fsmgmt.fsmgmt:INFO     Mounting soaked-MDT0000 on soak-9 ...
2017-09-04 07:15:44,501:sched.sched :ERROR    soak-44: client still faulty. Please check node. Next check in 180s.
2017-09-04 07:16:22,822:fsmgmt.fsmgmt:INFO     ... soaked-MDT0000 mounted successfully on soak-9
2017-09-04 07:16:22,823:fsmgmt.fsmgmt:INFO     ... soaked-MDT0000 failed over
2017-09-04 07:16:22,823:fsmgmt.fsmgmt:INFO     Wait for recovery to complete
2017-09-04 07:16:23,113:fsmgmt.fsmgmt:DEBUG    Recovery Result Record: {&apos;soak-9&apos;: {&apos;soaked-MDT0001&apos;: &apos;COMPLETE&apos;, &apos;soaked-MDT0000&apos;: &apos;RECOVERING&apos;}}
2017-09-04 07:16:23,113:fsmgmt.fsmgmt:INFO     soaked-MDT0000 in status &apos;RECOVERING&apos;.
...

2017-09-04 07:18:09,934:fsmgmt.fsmgmt:DEBUG    Recovery Result Record: {&apos;soak-9&apos;: {&apos;soaked-MDT0001&apos;: &apos;COMPLETE&apos;, &apos;soaked-MDT0000&apos;: &apos;COMPLETE&apos;}}
2017-09-04 07:18:09,934:fsmgmt.fsmgmt:INFO     Node soak-9: &apos;soaked-MDT0000&apos; recovery completed
2017-09-04 07:18:09,934:fsmgmt.fsmgmt:INFO     Failing back soaked-MDT0000 ...
2017-09-04 07:18:09,935:fsmgmt.fsmgmt:INFO     Unmounting soaked-MDT0000 on soak-9 ...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;soak-9 syslog&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  4 07:18:10 soak-9 kernel: Lustre: Failing over soaked-MDT0000
Sep  4 07:18:10 soak-9 kernel: LustreError: 11-0: soaked-MDT0000-osp-MDT0001: operation out_update to node 0@lo failed: rc = -19
Sep  4 07:18:10 soak-9 kernel: LustreError: Skipped 3 previous similar messages
Sep  4 07:18:10 soak-9 kernel: Lustre: soaked-MDT0000-osp-MDT0001: Connection to soaked-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Sep  4 07:18:10 soak-9 kernel: Lustre: Skipped 1 previous similar message
Sep  4 07:18:10 soak-9 kernel: LustreError: 4464:0:(ldlm_lockd.c:1415:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8803f1172c00 ns: mdt-soaked-MDT0000_UUID lock: ffff880332d2a000/0x72b6987a94ebf754 lrc: 3/0,0 mode: CR/CR res: [0x20000a814:0xc695:0x0].0x0 bits 0x9 rrc: 2 type: IBT flags: 0x50200000000000 nid: 192.168.1.120@o2ib remote: 0x4d92350181a4cd1e expref: 4 pid: 4464 timeout: 0 lvb_type: 0
Sep  4 07:18:11 soak-9 kernel: Lustre: soaked-MDT0000: Not available for connect from 192.168.1.120@o2ib (stopping)
Sep  4 07:18:12 soak-9 kernel: Lustre: soaked-MDT0000: Not available for connect from 172.16.1.45@o2ib1 (stopping)
Sep  4 07:18:13 soak-9 kernel: LustreError: 4257:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88008e50b000 x1577570293501872/t0(0) o13-&amp;gt;soaked-OST000a-osc-MDT0000@192.168.1.106@o2ib:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Sep  4 07:18:13 soak-9 kernel: LustreError: 4257:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 19 previous similar messages
Sep  4 07:18:13 soak-9 kernel: Lustre: soaked-MDT0000: Not available for connect from 192.168.1.107@o2ib (stopping)
Sep  4 07:18:13 soak-9 kernel: Lustre: Skipped 1 previous similar message
Sep  6 22:49:04 soak-9 rsyslogd: [origin software=&quot;rsyslogd&quot; swVersion=&quot;7.4.7&quot; x-pid=&quot;1763&quot; x-info=&quot;http://www.rsyslog.com&quot;] start
Sep  6 22:48:28 soak-9 kernel: microcode: microcode updated early to revision 0x710, date = 2013-06-17
Sep  6 22:48:28 soak-9 kernel: Initializing cgroup subsys cpuset
Sep  6 22:48:28 soak-9 kernel: Initializing cgroup subsys cpu
Sep  6 22:48:28 soak-9 kernel: Initializing cgroup subsys cpuacct
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="28151" name="lustre-log.1504209688.3916.txt.gz" size="6137982" author="cliffw" created="Thu, 31 Aug 2017 21:39:16 +0000"/>
                            <attachment id="28150" name="soak-10-syslog.txt.gz" size="4228" author="cliffw" created="Thu, 31 Aug 2017 21:39:05 +0000"/>
                            <attachment id="28149" name="soak-10.afterumount.txt.gz" size="3450041" author="cliffw" created="Thu, 31 Aug 2017 21:39:13 +0000"/>
                            <attachment id="28148" name="soak-10.dmesg.gz" size="33721" author="cliffw" created="Thu, 31 Aug 2017 21:39:05 +0000"/>
                            <attachment id="28147" name="soak-10.dump.txt.gz" size="4723112" author="cliffw" created="Thu, 31 Aug 2017 21:39:15 +0000"/>
                            <attachment id="28146" name="soak-10.stack-dump.txt.gz" size="167860" author="cliffw" created="Thu, 31 Aug 2017 21:39:06 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzje7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>