<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:48:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
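For instance (assuming the standard JIRA XML issue view path), a trimmed version of this document could be requested with:
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-11936/LU-11936.xml?field=key&field=summary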
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11936] High ldlm load, slow/unusable filesystem</title>
                <link>https://jira.whamcloud.com/browse/LU-11936</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We upgraded our cluster Sherlock to Lustre 2.12 and put Fir (Lustre 2.12 servers) into production yesterday, but this morning the filesystem is unusable due to a very high load of ldlm threads on the MDS servers.&lt;/p&gt;

&lt;p&gt;I can see plenty of these on the MDS:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[Wed Feb  6 09:19:25 2019][1695265.110496] LNet: Service thread pid 35530 was inactive for 350.85s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[Wed Feb  6 09:19:25 2019][1695265.127609] Pid: 35530, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[Wed Feb  6 09:19:25 2019][1695265.137522] Call Trace:
[Wed Feb  6 09:19:25 2019][1695265.140184]  [&amp;lt;ffffffffc0e3a0bd&amp;gt;] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.147339]  [&amp;lt;ffffffffc0e3adcc&amp;gt;] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.154723]  [&amp;lt;ffffffffc15164ab&amp;gt;] mdt_object_local_lock+0x50b/0xb20 [mdt]
[Wed Feb  6 09:19:25 2019][1695265.161731]  [&amp;lt;ffffffffc1516b30&amp;gt;] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[Wed Feb  6 09:19:25 2019][1695265.168906]  [&amp;lt;ffffffffc1517d1a&amp;gt;] mdt_getattr_name_lock+0x90a/0x1c30 [mdt]
[Wed Feb  6 09:19:25 2019][1695265.176002]  [&amp;lt;ffffffffc151fbb5&amp;gt;] mdt_intent_getattr+0x2b5/0x480 [mdt]
[Wed Feb  6 09:19:25 2019][1695265.182750]  [&amp;lt;ffffffffc151ca18&amp;gt;] mdt_intent_policy+0x2e8/0xd00 [mdt]
[Wed Feb  6 09:19:25 2019][1695265.189403]  [&amp;lt;ffffffffc0e20ec6&amp;gt;] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.196349]  [&amp;lt;ffffffffc0e498a7&amp;gt;] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.203643]  [&amp;lt;ffffffffc0ed0302&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.210001]  [&amp;lt;ffffffffc0ed735a&amp;gt;] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.217133]  [&amp;lt;ffffffffc0e7b92b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.225045]  [&amp;lt;ffffffffc0e7f25c&amp;gt;] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Wed Feb  6 09:19:25 2019][1695265.231563]  [&amp;lt;ffffffffadcc1c31&amp;gt;] kthread+0xd1/0xe0
[Wed Feb  6 09:19:25 2019][1695265.236657]  [&amp;lt;ffffffffae374c24&amp;gt;] ret_from_fork_nospec_begin+0xe/0x21
[Wed Feb  6 09:19:25 2019][1695265.243306]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[Wed Feb  6 09:19:25 2019][1695265.248518] LustreError: dumping log to /tmp/lustre-log.1549473603.35530
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
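
&lt;p&gt;For reference, the &lt;tt&gt;/tmp/lustre-log.*&lt;/tt&gt; file mentioned above is a binary Lustre debug dump; assuming &lt;tt&gt;lctl&lt;/tt&gt; is available on the MDS, it can be converted to text with something like:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# convert the binary debug dump into a readable text log (output path is just a suggestion)
lctl debug_file /tmp/lustre-log.1549473603.35530 /tmp/lustre-log.1549473603.35530.txt
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;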

&lt;p&gt;On Fir, we have two Lustre 2.12 MDS, &lt;tt&gt;fir-md1-s1&lt;/tt&gt; and &lt;tt&gt;fir-md1-s2&lt;/tt&gt;, each with 2 MDTs. I dumped the current tasks to the console using sysrq and I&apos;m attaching the full console log for both MDS servers. The servers don&apos;t crash, but clients become unusable or blocked; from time to time we can still access the filesystem. Any help would be appreciated. Thanks!&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</description>
                <environment>CentOS 7.6, Lustre 2.12.0, 3.10.0-957.1.3.el7_lustre.x86_64</environment>
        <key id="54800">LU-11936</key>
            <summary>High ldlm load, slow/unusable filesystem</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="ssmirnov">Serguei Smirnov</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Wed, 6 Feb 2019 17:54:01 +0000</created>
                <updated>Wed, 27 Nov 2019 16:06:52 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="241473" author="sthiell" created="Wed, 6 Feb 2019 18:02:17 +0000"  >&lt;p&gt;Ah! Just after submitting this ticket and attaching the different files, I just noticed that a possible issue could be that we do have (again) one client having a NID using @tcp leading to unusable servers. &lt;/p&gt;

&lt;p&gt;For example, from lustre-log.1549473603.35530:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;@@@ Request sent has failed due to network error: [sent 1549473603/real 1549473603]  req@ffff912521032d00 x1623682053684704/t0(0) o106-&amp;gt;fir-MDT0002@10.10.114.10@tcp:15/16 lens 296/280 e 0 to 1 dl 1549473614 ref 1 fl Rpc:eX/2/ffffffff rc -11/-1
lib-move.c lnet_handle_find_routed_path: no route to 10.10.114.10@tcp from &amp;lt;?&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All our Lustre servers are exclusively IB. If this is the cause of the issue, this could be a duplicate of my previous ticket &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11888&quot; title=&quot;Unreachable client NID confusing Lustre 2.12&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11888&quot;&gt;LU-11888&lt;/a&gt;.&lt;br/&gt;
I&apos;ll investigate and get rid of this bad node, then confirm whether that was the cause (see the quick check below).&lt;br/&gt;
Best,&lt;br/&gt;
Stephane&lt;/p&gt;
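
&lt;p&gt;PS: a quick way to check for such clients from the MDS (paths are illustrative, assuming the usual MDT proc layout) is to list exports with a @tcp NID:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# list client exports on this MDT whose NID is on @tcp (hypothetical path)
ls /proc/fs/lustre/mdt/fir-MDT0000/exports/ | grep @tcp
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;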
</comment>
                            <comment id="241486" author="pjones" created="Wed, 6 Feb 2019 19:00:23 +0000"  >&lt;p&gt;Seems related to a ticket you already have&lt;/p&gt;</comment>
                            <comment id="241517" author="sthiell" created="Thu, 7 Feb 2019 01:26:23 +0000"  >&lt;p&gt;Yes I confirm that when we get rid of all bogus clients and restart all Lustre servers, everything is back to normal. The servers don&apos;t recover by themselves so a full restart is required apparently. Other Lustre servers running 2.8 or 2.10 are not affected. Please feel free to mark this one as duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11888&quot; title=&quot;Unreachable client NID confusing Lustre 2.12&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11888&quot;&gt;LU-11888&lt;/a&gt;. I also opened &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11937&quot; title=&quot;lnet.service randomly load tcp NIDs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11937&quot;&gt;&lt;del&gt;LU-11937&lt;/del&gt;&lt;/a&gt; to find out why in some case a NID with tcp0 is loaded on the clients.&lt;br/&gt;
Thanks!&lt;br/&gt;
Stephane&lt;/p&gt;</comment>
                            <comment id="241518" author="pjones" created="Thu, 7 Feb 2019 01:56:40 +0000"  >&lt;p&gt;ok - thanks Stephane&lt;/p&gt;</comment>
                            <comment id="256781" author="sthiell" created="Mon, 21 Oct 2019 22:49:46 +0000"  >&lt;p&gt;This issue should probably be reopen because we just seen it with 2.12.3 RC1. After a client announced itself with a @tcp NID, it was very hard to remove it from Lustre. We had to evict the bogus tcp nid + use &lt;tt&gt;lnetctl peer del --prim_nid&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;And it didn&apos;t even work the first time:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-md1-s1 fir-MDT0000]# lnetctl peer show --nid 10.10.117.42@tcp
peer:
    - primary nid: 10.10.117.42@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.117.42@o2ib4
          state: NA
        - nid: 10.10.117.42@tcp
          state: NA

[root@fir-md1-s1 fir-MDT0000]# lnetctl peer del --prim_nid 10.10.117.42@tcp

[root@fir-md1-s1 fir-MDT0000]# lnetctl peer show --nid 10.10.117.42@tcp
peer:
    - primary nid: 10.10.117.42@tcp
      Multi-Rail: False
      peer ni:
        - nid: 10.10.117.42@tcp
          state: NA
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;See, it removed the good one (o2ib4)!&lt;/p&gt;


&lt;p&gt;With a combination of eviction and peer deletion, it worked:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-md1-s1 fir-MDT0000]# cat exports/10.10.117.42@tcp/uuid
2f0c7362-5db5-a6c6-08eb-fa16109cc0f9

[root@fir-md1-s1 fir-MDT0000]# echo 2f0c7362-5db5-a6c6-08eb-fa16109cc0f9 &amp;gt; evict_client 


[root@fir-md1-s1 fir-MDT0000]# lnetctl peer del --prim_nid 10.10.117.42@tcp


[root@fir-md1-s1 fir-MDT0000]# lnetctl peer show --nid 10.9.117.42@o2ib4
peer:
    - primary nid: 10.9.117.42@o2ib4
      Multi-Rail: True
      peer ni:
        - nid: 10.9.117.42@o2ib4
          state: NA
[root@fir-md1-s1 fir-MDT0000]# ls exports| grep tcp
[root@fir-md1-s1 fir-MDT0000]# 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is a problem: we have IB-only routers and no tcp network/route defined on the servers, so I don&apos;t know why Lustre accepts it at all. The ldlm_bl thread at 100% is also still a problem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ top
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 40268 root      20   0       0      0      0 S 100.0  0.0 114:45.76 ldlm_bl_06
 16505 root      20   0  164284   4424   1548 R  22.2  0.0   0:00.04 top
 35331 root      20   0       0      0      0 S   5.6  0.0 100:37.82 kiblnd_sd_03_03
 35333 root      20   0       0      0      0 S   5.6  0.0   0:48.41 lnet_discovery
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I&apos;m hesitant to dump the tasks right now, as the filesystem is heavily used.&lt;/p&gt;</comment>
                            <comment id="256864" author="sthiell" created="Tue, 22 Oct 2019 18:28:55 +0000"  >&lt;p&gt;This is the task that was in an endless loop:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[Tue Oct 22 06:46:06 2019][502582.891745] ldlm_bl_06      R  running task        0 40268      2 0x00000080
[Tue Oct 22 06:46:06 2019][502582.898941] Call Trace:
[Tue Oct 22 06:46:07 2019][502582.901568]  [&amp;lt;ffffffffc10018d9&amp;gt;] ? ptlrpc_expire_one_request+0xf9/0x520 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.909199]  [&amp;lt;ffffffffc1003da8&amp;gt;] ? ptlrpc_check_set.part.23+0x378/0x1df0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.916913]  [&amp;lt;ffffffffc100587b&amp;gt;] ? ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.923673]  [&amp;lt;ffffffffc1005dea&amp;gt;] ? ptlrpc_set_wait+0x4ea/0x790 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.930470]  [&amp;lt;ffffffffaf6d7c40&amp;gt;] ? wake_up_state+0x20/0x20
[Tue Oct 22 06:46:07 2019][502582.936184]  [&amp;lt;ffffffffc0fc3055&amp;gt;] ? ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.943118]  [&amp;lt;ffffffffc0fc452f&amp;gt;] ? __ldlm_reprocess_all+0x11f/0x360 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.950395]  [&amp;lt;ffffffffc0fc4783&amp;gt;] ? ldlm_reprocess_all+0x13/0x20 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.957331]  [&amp;lt;ffffffffc0fdc01e&amp;gt;] ? ldlm_cli_cancel_local+0x29e/0x3f0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.964708]  [&amp;lt;ffffffffc0fe1b67&amp;gt;] ? ldlm_cli_cancel+0x157/0x620 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.971541]  [&amp;lt;ffffffffaf81bd89&amp;gt;] ? ___slab_alloc+0x209/0x4f0
[Tue Oct 22 06:46:07 2019][502582.977440]  [&amp;lt;ffffffffc0fe20d4&amp;gt;] ? ldlm_blocking_ast_nocheck+0xa4/0x310 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.985094]  [&amp;lt;ffffffffc0fe247a&amp;gt;] ? ldlm_blocking_ast+0x13a/0x170 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.992129]  [&amp;lt;ffffffffc0fbc02c&amp;gt;] ? lock_res_and_lock+0x2c/0x50 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502582.999006]  [&amp;lt;ffffffffc0fedcd8&amp;gt;] ? ldlm_handle_bl_callback+0xf8/0x4f0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.006463]  [&amp;lt;ffffffffc0fc00e0&amp;gt;] ? ldlm_lock_decref_internal+0x1a0/0xa30 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.014202]  [&amp;lt;ffffffffc0d693b9&amp;gt;] ? class_handle2object+0xb9/0x1c0 [obdclass]
[Tue Oct 22 06:46:07 2019][502583.021483]  [&amp;lt;ffffffffc0fc0a70&amp;gt;] ? ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.029249]  [&amp;lt;ffffffffc1447835&amp;gt;] ? mgs_completion_ast_generic+0x125/0x200 [mgs]
[Tue Oct 22 06:46:07 2019][502583.036765]  [&amp;lt;ffffffffc1447930&amp;gt;] ? mgs_completion_ast_barrier+0x20/0x20 [mgs]
[Tue Oct 22 06:46:07 2019][502583.044126]  [&amp;lt;ffffffffc1447943&amp;gt;] ? mgs_completion_ast_ir+0x13/0x20 [mgs]
[Tue Oct 22 06:46:07 2019][502583.051106]  [&amp;lt;ffffffffc0fbdc58&amp;gt;] ? ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.058392]  [&amp;lt;ffffffffc1005972&amp;gt;] ? ptlrpc_set_wait+0x72/0x790 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.065122]  [&amp;lt;ffffffffaf81e41d&amp;gt;] ? kmem_cache_alloc_node_trace+0x11d/0x210
[Tue Oct 22 06:46:07 2019][502583.072240]  [&amp;lt;ffffffffc0d68a69&amp;gt;] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[Tue Oct 22 06:46:07 2019][502583.079528]  [&amp;lt;ffffffffc0fbdbb0&amp;gt;] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.086915]  [&amp;lt;ffffffffc0ffc202&amp;gt;] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.093672]  [&amp;lt;ffffffffc0fc3055&amp;gt;] ? ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.100649]  [&amp;lt;ffffffffc0fc452f&amp;gt;] ? __ldlm_reprocess_all+0x11f/0x360 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.107927]  [&amp;lt;ffffffffc0fc52a5&amp;gt;] ? ldlm_cancel_lock_for_export.isra.27+0x195/0x360 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.116509]  [&amp;lt;ffffffffc0fc54ac&amp;gt;] ? ldlm_cancel_locks_for_export_cb+0x3c/0x50 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.124535]  [&amp;lt;ffffffffc0a8efb0&amp;gt;] ? cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[Tue Oct 22 06:46:07 2019][502583.132112]  [&amp;lt;ffffffffc0fc5470&amp;gt;] ? ldlm_cancel_lock_for_export.isra.27+0x360/0x360 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.140690]  [&amp;lt;ffffffffc0fc5470&amp;gt;] ? ldlm_cancel_lock_for_export.isra.27+0x360/0x360 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.149260]  [&amp;lt;ffffffffc0a92510&amp;gt;] ? cfs_hash_for_each_empty+0x80/0x1d0 [libcfs]
[Tue Oct 22 06:46:07 2019][502583.156714]  [&amp;lt;ffffffffc0fc57ba&amp;gt;] ? ldlm_export_cancel_locks+0xaa/0x180 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.164257]  [&amp;lt;ffffffffc0fee888&amp;gt;] ? ldlm_bl_thread_main+0x7b8/0xa40 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.171408]  [&amp;lt;ffffffffaf6d7c40&amp;gt;] ? wake_up_state+0x20/0x20
[Tue Oct 22 06:46:07 2019][502583.177127]  [&amp;lt;ffffffffc0fee0d0&amp;gt;] ? ldlm_handle_bl_callback+0x4f0/0x4f0 [ptlrpc]
[Tue Oct 22 06:46:07 2019][502583.184618]  [&amp;lt;ffffffffaf6c2e81&amp;gt;] ? kthread+0xd1/0xe0
[Tue Oct 22 06:46:07 2019][502583.189804]  [&amp;lt;ffffffffaf6c2db0&amp;gt;] ? insert_kthread_work+0x40/0x40
[Tue Oct 22 06:46:07 2019][502583.195988]  [&amp;lt;ffffffffafd77c24&amp;gt;] ? ret_from_fork_nospec_begin+0xe/0x21
[Tue Oct 22 06:46:07 2019][502583.202696]  [&amp;lt;ffffffffaf6c2db0&amp;gt;] ? insert_kthread_work+0x40/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;m attaching the full task dump from when this was happening as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/33701/33701_fir-md1-s1_20191022.log&quot; title=&quot;fir-md1-s1_20191022.log attached to LU-11936&quot;&gt;fir-md1-s1_20191022.log&lt;/a&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;We have restarted the MGS. Unmounting it made the server unresponsive.&lt;/p&gt;</comment>
                            <comment id="256867" author="ssmirnov" created="Tue, 22 Oct 2019 19:19:31 +0000"  >&lt;p&gt;Hi Stephane,&lt;/p&gt;

&lt;p&gt;If the @tcp NID seen on the server is the result of a misconfiguration on any of the clients, then the reported behaviour when removing the primary NID without evicting first may be expected, as the server will keep trying to rebuild its representation of the client.&lt;/p&gt;

&lt;p&gt;Would it be possible to get configuration details from the server/router/client (via lnetctl export)?&lt;/p&gt;
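
&lt;p&gt;For example, something like the following on each node (the output file name is just a suggestion):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# dump the running LNet configuration (YAML) so it can be attached to this ticket
lnetctl export &amp;gt; lnet-config-$(hostname -s).yaml
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;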

&lt;p&gt;It would make it easier to understand what is causing the issue and how ldlm is getting affected.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Serguei.&lt;/p&gt;</comment>
                            <comment id="258918" author="jstroik" created="Wed, 27 Nov 2019 16:06:52 +0000"  >&lt;p&gt;Hi Serguei,&lt;/p&gt;

&lt;p&gt;We&apos;re seeing a similar situation and I can reproduce it and provide you details. I added a comment to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11989&quot; title=&quot;Global filesystem hangs in 2.12&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11989&quot;&gt;&lt;del&gt;LU-11989&lt;/del&gt;&lt;/a&gt; which includes a relevant snippet from our MDS logs: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11989?focusedCommentId=258916&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-258916&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.whamcloud.com/browse/LU-11989?focusedCommentId=258916&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-258916&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our example, we have four cluster nodes that initially connected to one of our 2.12 file systems using the wrong NID, one behind a NAT. The NIDs trapped behind the NAT are 10.23.x.x@tcp.&lt;/p&gt;

&lt;p&gt;Let me know if I can provide you with any other information.&lt;/p&gt;

&lt;p&gt;Jesse Stroik&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="54666">LU-11888</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="31949" name="fir-md1-s1-console.log" size="4428910" author="sthiell" created="Wed, 6 Feb 2019 17:52:57 +0000"/>
                            <attachment id="33701" name="fir-md1-s1_20191022.log" size="3078044" author="sthiell" created="Tue, 22 Oct 2019 18:28:49 +0000"/>
                            <attachment id="31948" name="fir-md1-s2-console.log" size="2991074" author="sthiell" created="Wed, 6 Feb 2019 17:53:02 +0000"/>
                            <attachment id="31947" name="lustre-log.1549473603.35530.gz" size="1130334" author="sthiell" created="Wed, 6 Feb 2019 17:53:05 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00b4n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>