<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:34:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10390] MGS crashes in ldlm_reprocess_queue() when stopping</title>
                <link>https://jira.whamcloud.com/browse/LU-10390</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Never seen before but already twice with the latest 2.10.2 version: the MGS is crashing when stopping:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[77225.855547] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[77225.864304] IP: [&amp;lt;ffffffffc0ba48ce&amp;gt;] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77225.872614] PGD 0 
[77225.874864] Oops: 0000 [#1] SMP 
[77225.878480] Modules linked in: mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat uas usb_storage mpt2sas mptctl mptbase dell_rbu rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core intel_powerclamp dm_service_time coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper iTCO_wdt ablk_helper iTCO_vendor_support cryptd dm_round_robin pcspkr mxm_wmi dcdbas sg ipmi_si ipmi_devintf ipmi_msghandler mei_me mei lpc_ich shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc dm_mod ip_tables ext4 mbcache
[77225.957975]  jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core drm tg3 ahci libahci crct10dif_pclmul mpt3sas crct10dif_common raid_class ptp crc32c_intel libata megaraid_sas i2c_core devlink scsi_transport_sas pps_core
[77225.986851] CPU: 23 PID: 27105 Comm: ldlm_bl_14 Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1
[77225.999661] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.3.4 11/08/2016
[77226.008010] task: ffff88103323cf10 ti: ffff881012ab8000 task.ti: ffff881012ab8000
[77226.016359] RIP: 0010:[&amp;lt;ffffffffc0ba48ce&amp;gt;]  [&amp;lt;ffffffffc0ba48ce&amp;gt;] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77226.027355] RSP: 0018:ffff881012abbbe0  EFLAGS: 00010287
[77226.033280] RAX: 0000000000000000 RBX: ffff881011f7d400 RCX: ffff881012abbc7c
[77226.041240] RDX: 0000000000000002 RSI: ffff881012abbc80 RDI: ffff881011f7d400
[77226.049201] RBP: ffff881012abbc58 R08: ffff881012abbcd0 R09: ffff88103d0d7880
[77226.057162] R10: ffff881011f7d400 R11: 7fffffffffffffff R12: ffff880168287540
[77226.065123] R13: 0000000000000002 R14: ffff881012abbcd0 R15: ffff881011f7d460
[77226.073085] FS:  0000000000000000(0000) GS:ffff88203c8c0000(0000) knlGS:0000000000000000
[77226.082111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[77226.088521] CR2: 000000000000001c CR3: 00000000019f2000 CR4: 00000000001407e0
[77226.096482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[77226.104443] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[77226.112403] Stack:
[77226.114642]  ffff881012abbc7c ffff881012abbcd0 ffff881012abbc80 0000000000000000
[77226.122930]  ffff880168287520 0000001000000001 ffff880100000010 ffff881012abbc18
[77226.131219]  ffff881012abbc18 00000000bd734aeb 0000000000000002 ffff880168287540
[77226.139507] Call Trace:
[77226.142252]  [&amp;lt;ffffffffc0ba4860&amp;gt;] ? ldlm_errno2error+0x60/0x60 [ptlrpc]
[77226.149649]  [&amp;lt;ffffffffc0b8f9db&amp;gt;] ldlm_reprocess_queue+0x13b/0x2a0 [ptlrpc]
[77226.157434]  [&amp;lt;ffffffffc0b9057d&amp;gt;] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
[77226.165220]  [&amp;lt;ffffffffc0b90b30&amp;gt;] ldlm_reprocess_res+0x20/0x30 [ptlrpc]
[77226.172611]  [&amp;lt;ffffffffc0866bef&amp;gt;] cfs_hash_for_each_relax+0x21f/0x400 [libcfs]
[77226.180687]  [&amp;lt;ffffffffc0b90b10&amp;gt;] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[77226.188571]  [&amp;lt;ffffffffc0b90b10&amp;gt;] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[77226.196441]  [&amp;lt;ffffffffc0869d95&amp;gt;] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[77226.204518]  [&amp;lt;ffffffffc0b90b7c&amp;gt;] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[77226.212983]  [&amp;lt;ffffffffc0b917bc&amp;gt;] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
[77226.221162]  [&amp;lt;ffffffffc0bbada8&amp;gt;] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
[77226.228836]  [&amp;lt;ffffffff816a8fad&amp;gt;] ? __schedule+0x39d/0x8b0
[77226.234977]  [&amp;lt;ffffffffc0bba8e0&amp;gt;] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
[77226.243232]  [&amp;lt;ffffffff810b098f&amp;gt;] kthread+0xcf/0xe0
[77226.248672]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
[77226.255472]  [&amp;lt;ffffffff816b4f58&amp;gt;] ret_from_fork+0x58/0x90
[77226.261494]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
[77226.268292] Code: 89 45 a0 74 0d f6 05 b3 ac cd ff 01 0f 85 34 06 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 89 45 b8 0f 84 57 09 00 00 48 8b 45 a0 &amp;lt;8b&amp;gt; 40 1c 85 c0 0f 84 7a 09 00 00 48 8b 4d a0 48 89 c8 48 83 c0 
[77226.289871] RIP  [&amp;lt;ffffffffc0ba48ce&amp;gt;] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77226.298248]  RSP &amp;lt;ffff881012abbbe0&amp;gt;
[77226.302135] CR2: 000000000000001c
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Best,&lt;br/&gt;
Stephane&lt;/p&gt;</description>
                <environment>CentOS 7.4</environment>
        <key id="49743">LU-10390</key>
            <summary>MGS crashes in ldlm_reprocess_queue() when stopping</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Thu, 14 Dec 2017 19:57:46 +0000</created>
                <updated>Mon, 2 Aug 2021 08:12:46 +0000</updated>
                                            <version>Lustre 2.10.2</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="216415" author="pjones" created="Fri, 15 Dec 2017 18:16:49 +0000"  >&lt;p&gt;Thanks Stephane&lt;/p&gt;</comment>
                            <comment id="221221" author="sthiell" created="Fri, 16 Feb 2018 22:55:11 +0000"  >&lt;p&gt;Crash dump for this issue, with Lustre 2.10.2 on all targets.&lt;/p&gt;

&lt;p&gt;vmcore-oak-md1-s1-2017-12-14-11:47:46.gz:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://stanford.box.com/s/unzo08zycbafl5t62cwtkubl7hsr3y1n&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://stanford.box.com/s/unzo08zycbafl5t62cwtkubl7hsr3y1n&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;vmlinux-3.10.0-693.2.2.el7_lustre.pl1.x86_64.gz:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://stanford.box.com/s/r6u6xjzmsgys2kzcq26562bqo6p45xcp&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://stanford.box.com/s/r6u6xjzmsgys2kzcq26562bqo6p45xcp&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="222017" author="mhaakddn" created="Thu, 1 Mar 2018 04:59:15 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Quick question, are your clients at a matching version of lustre? Or are they running an earlier version? (Even 2.10.1?)&lt;/p&gt;</comment>
                            <comment id="222026" author="sthiell" created="Thu, 1 Mar 2018 05:49:28 +0000"  >&lt;p&gt;Hi Malcolm,&lt;/p&gt;

&lt;p&gt;At the time of this issue, we had a mix of 2.10.0, 2.10.1 and 2.10.2 clients.&lt;/p&gt;

&lt;p&gt;I couldn&apos;t reproduce this problem with an MGS running 2.10.3, though.&lt;/p&gt;
</comment>
                            <comment id="222033" author="mhaakddn" created="Thu, 1 Mar 2018 06:43:46 +0000"  >&lt;p&gt;Thank you so much. I was going to git bisect this, but if 2.10.3 fixes it I&apos;ll move on to testing that.&lt;/p&gt;

&lt;p&gt;Do you still have a mix of clients? I had an idea about which patches I was going to pick for my testing.&lt;/p&gt;</comment>
                            <comment id="222035" author="sthiell" created="Thu, 1 Mar 2018 06:59:08 +0000"  >&lt;p&gt;Sure. And yes, with the servers running 2.10.3 we had a mix of 2.10.x (x = 0,1,2,3) clients. I say &quot;had&quot; because we&apos;re not running 2.10.3 anymore, as we downgraded all servers to 2.10.2 for other reasons. But I do confirm that we couldn&apos;t reproduce the MGS panic when unmounting the MGS while the servers were running version 2.10.3, and we did it several times. That reminds me that we haven&apos;t tried to stop the MGS now that we&apos;re back on 2.10.2.&lt;/p&gt;

&lt;p&gt;Just saw &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10635&quot; title=&quot;MGS kernel panic when configuring nodemaps and filesets&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10635&quot;&gt;&lt;del&gt;LU-10635&lt;/del&gt;&lt;/a&gt;, note that in our case we&apos;re also using the nodemap feature but without filesets.&lt;/p&gt;
</comment>
                            <comment id="222101" author="mhaakddn" created="Thu, 1 Mar 2018 23:47:29 +0000"  >&lt;p&gt;Thanks for that info also.&lt;/p&gt;

&lt;p&gt;It at least points me in the direction I need for some testing. Thanks again &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/biggrin.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="222287" author="emoly.liu" created="Mon, 5 Mar 2018 01:44:37 +0000"  >&lt;p&gt;Thanks for Stephane&apos;s information and thanks for Malcolm&apos;s testing!&lt;br/&gt;
Malcolm, could you please update the status after you do some testing? Thanks.&lt;/p&gt;</comment>
                            <comment id="222291" author="mhaakddn" created="Mon, 5 Mar 2018 02:23:52 +0000"  >&lt;p&gt;My testing was limited. I installed 2.10.1, 2.10.2 and 2.10.3. None of them had sane output in /proc/fs/lustre/ldlm/namespaces/MGS/pool/stats.&lt;/p&gt;

&lt;p&gt;At this point I decided something is horridly wrong since nobody answered my questions in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10635&quot; title=&quot;MGS kernel panic when configuring nodemaps and filesets&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10635&quot;&gt;&lt;del&gt;LU-10635&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;m going to start with 2.9.1 and see if it&apos;s also whacky in 2.9, and once I figure out which point release it all went bad in, I&apos;ll start doing a git bisect.&lt;/p&gt;

&lt;p&gt;This has all come to a head for NCI because this non-granting or half-granting of locks is starting to really cause issues with clients.&lt;/p&gt;

&lt;p&gt;LDLM is broken. It looks like it&apos;s still broken in 2.10.3, despite your lack of shutdown crashes post-upgrade.&lt;/p&gt;

&lt;p&gt;I couldn&apos;t get my test environment to crash, for the record. But it&apos;s tiny, so I don&apos;t think it&apos;s a valid test.&lt;/p&gt;

&lt;p&gt;I do think having a look at the crash dumps will be enlightening.&lt;/p&gt;

&lt;p&gt;EDIT: Other issue is related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9704&quot; title=&quot;ofd_grant_check() claims GRANT, real grant 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9704&quot;&gt;&lt;del&gt;LU-9704&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="223840" author="sthiell" created="Fri, 16 Mar 2018 14:09:18 +0000"  >&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;This issue just happened in 2.10.3 when we tried to unmount the MGS. In this case, the server didn&apos;t crash but we saw a lot of soft lockup/CPU stuck stack traces on the MGS like this one:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[83347.066299] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [ldlm_bl_11:16013]
[83347.066322] Modules linked in: mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat uas usb_storage mpt2sas mptctl mptbase rpcsec_gss_krb5 dell_rbu nfsv4 dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi dm_service_time kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt dcdbas iTCO_vendor_support ipmi_si ipmi_devintf mxm_wmi dm_round_robin pcspkr sg ipmi_msghandler acpi_power_meter wmi mei_me mei shpchp lpc_ich nfsd auth_rpcgss dm_multipath dm_mod nfs_acl lockd grace sunrpc ip_tables ext4 mbcache
[83347.066330]  jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm tg3 ahci crct10dif_pclmul crct10dif_common mlx4_core mpt3sas drm libahci crc32c_intel ptp raid_class libata megaraid_sas devlink i2c_core scsi_transport_sas pps_core
[83347.066331] CPU: 5 PID: 16013 Comm: ldlm_bl_11 Tainted: G           OEL ------------   3.10.0-693.2.2.el7_lustre.pl2.x86_64 #1
[83347.066332] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.6.0 10/26/2017
[83347.066333] task: ffff88203ad88000 ti: ffff88201c804000 task.ti: ffff88201c804000
[83347.066335] RIP: 0010:[&amp;lt;ffffffff810fa326&amp;gt;]  [&amp;lt;ffffffff810fa326&amp;gt;] native_queued_spin_lock_slowpath+0x116/0x1e0
[83347.066335] RSP: 0018:ffff88201c807b70  EFLAGS: 00000246
[83347.066336] RAX: 0000000000000000 RBX: 000000013c796cc0 RCX: 0000000000290000
[83347.066337] RDX: ffff88103d017880 RSI: 0000000000810001 RDI: ffff88102137981c
[83347.066337] RBP: ffff88201c807b70 R08: ffff88203c697880 R09: 0000000000000000
[83347.066338] R10: ffff88202b216a00 R11: 0000000000000000 R12: ffff88201c807c58
[83347.066338] R13: 0000000000000001 R14: ffff88201c807b28 R15: ffffffff81322c35
[83347.066339] FS:  0000000000000000(0000) GS:ffff88203c680000(0000) knlGS:0000000000000000
[83347.066340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83347.066340] CR2: 00007fd7ddcd8090 CR3: 00000000019f2000 CR4: 00000000001407e0
[83347.066341] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[83347.066341] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[83347.066341] Stack:
[83347.066343]  ffff88201c807b80 ffffffff8169e61f ffff88201c807b90 ffffffff816abb70
[83347.066344]  ffff88201c807bd0 ffffffffc0b72198 0000000000000000 ffff88202b216a00
[83347.066345]  ffff88201c807c18 ffff88202b216a60 ffff88202b216a00 ffff8810175b8000
[83347.066345] Call Trace:
[83347.066347]  [&amp;lt;ffffffff8169e61f&amp;gt;] queued_spin_lock_slowpath+0xb/0xf
[83347.066349]  [&amp;lt;ffffffff816abb70&amp;gt;] _raw_spin_lock+0x20/0x30
[83347.066367]  [&amp;lt;ffffffffc0b72198&amp;gt;] ldlm_handle_conflict_lock+0xd8/0x330 [ptlrpc]
[83347.066388]  [&amp;lt;ffffffffc0b86755&amp;gt;] ldlm_process_plain_lock+0x435/0xb30 [ptlrpc]
[83347.066407]  [&amp;lt;ffffffffc0b86320&amp;gt;] ? ldlm_errno2error+0x60/0x60 [ptlrpc]
[83347.066425]  [&amp;lt;ffffffffc0b7199b&amp;gt;] ldlm_reprocess_queue+0x13b/0x2a0 [ptlrpc]
[83347.066443]  [&amp;lt;ffffffffc0b7253d&amp;gt;] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
[83347.066460]  [&amp;lt;ffffffffc0b72af0&amp;gt;] ldlm_reprocess_res+0x20/0x30 [ptlrpc]
[83347.066466]  [&amp;lt;ffffffffc0847bef&amp;gt;] cfs_hash_for_each_relax+0x21f/0x400 [libcfs]
[83347.066483]  [&amp;lt;ffffffffc0b72ad0&amp;gt;] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[83347.066501]  [&amp;lt;ffffffffc0b72ad0&amp;gt;] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[83347.066506]  [&amp;lt;ffffffffc084ad95&amp;gt;] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[83347.066526]  [&amp;lt;ffffffffc0b72b3c&amp;gt;] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[83347.066544]  [&amp;lt;ffffffffc0b737cc&amp;gt;] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
[83347.066566]  [&amp;lt;ffffffffc0b9c9a8&amp;gt;] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
[83347.066567]  [&amp;lt;ffffffff810c4810&amp;gt;] ? wake_up_state+0x20/0x20
[83347.066588]  [&amp;lt;ffffffffc0b9c4e0&amp;gt;] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
[83347.066589]  [&amp;lt;ffffffff810b098f&amp;gt;] kthread+0xcf/0xe0
[83347.066590]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
[83347.066592]  [&amp;lt;ffffffff816b4f58&amp;gt;] ret_from_fork+0x58/0x90
[83347.066593]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After a few minutes, I decided to take a crash dump. It is available upon request. Attaching the associated vmcore-dmesg.txt as vmcore-dmesg-oak-md1-s1-2018-03-16-06-47-23.txt&lt;/p&gt;

&lt;p&gt;Prior to un-mounting the MGS, I also noticed tons of &apos;lock timed out&apos; errors on clients for the MGS (at 10.0.2.51@o2ib5):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[716108.759689] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1521202950, 300s ago), entering recovery for MGS@MGC10.0.2.51@o2ib5_0 ns: MGC10.0.2.51@o2ib5 lock: ffff8803e8b51800/0x509cd0f71fde08cc lrc: 4/1,0 mode: --/CR res: [0x6b616f:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x2e0a3ddb63e79ade expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[716108.760126] LustreError: 5988:0:(ldlm_resource.c:1100:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x6b616f:0x2:0x0].0x0 (ffff880402be2600) refcount nonzero (2) after lock cleanup; forcing cleanup.
[716108.760128] LustreError: 5988:0:(ldlm_resource.c:1100:ldlm_resource_complain()) Skipped 1 previous similar message
[716108.760130] LustreError: 5988:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x6b616f:0x2:0x0].0x0 (ffff880402be2600) refcount = 3
[716108.760132] LustreError: 5988:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:
[716108.760137] LustreError: 5988:0:(ldlm_resource.c:1705:ldlm_resource_dump()) ### ### ns: MGC10.0.2.51@o2ib5 lock: ffff8803e8b51800/0x509cd0f71fde08cc lrc: 4/1,0 mode: --/CR res: [0x6b616f:0x2:0x0].0x0 rrc: 4 type: PLN flags: 0x1106400000000 nid: local remote: 0x2e0a3ddb63e79ade expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[716108.760138] LustreError: 5988:0:(ldlm_resource.c:1705:ldlm_resource_dump()) Skipped 1 previous similar message
[716108.778087] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[716415.536505] LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 10.0.2.51@o2ib5) was lost; in progress operations using this service will fail
[716415.539195] LustreError: Skipped 1 previous similar message
[716415.540823] LustreError: 5999:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x6b616f:0x2:0x0].0x0 (ffff8800a2fb46c0) refcount = 2
[716415.543552] LustreError: 5999:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:
[716415.544914] Lustre: MGC10.0.2.51@o2ib5: Connection restored to MGC10.0.2.51@o2ib5_0 (at 10.0.2.51@o2ib5)
[716415.546536] Lustre: Skipped 1 previous similar message
[716723.374375] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1521203565, 300s ago), entering recovery for MGS@MGC10.0.2.51@o2ib5_0 ns: MGC10.0.2.51@o2ib5 lock: ffff8802383c6c00/0x509cd0f71fde2a67 lrc: 4/1,0 mode: --/CR res: [0x6b616f:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x2e0a3ddb63e8a325 expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[716723.380318] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[716723.381110] LustreError: 6003:0:(ldlm_resource.c:1100:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x6b616f:0x2:0x0].0x0 (ffff8803aafc4900) refcount nonzero (2) after lock cleanup; forcing cleanup.
[716723.381112] LustreError: 6003:0:(ldlm_resource.c:1100:ldlm_resource_complain()) Skipped 1 previous similar message
[716723.381115] LustreError: 6003:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x6b616f:0x2:0x0].0x0 (ffff8803aafc4900) refcount = 3
[716723.381117] LustreError: 6003:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:
[716723.381123] LustreError: 6003:0:(ldlm_resource.c:1705:ldlm_resource_dump()) ### ### ns: MGC10.0.2.51@o2ib5 lock: ffff8802383c6c00/0x509cd0f71fde2a67 lrc: 4/1,0 mode: --/CR res: [0x6b616f:0x2:0x0].0x0 rrc: 4 type: PLN flags: 0x1106400000000 nid: local remote: 0x2e0a3ddb63e8a325 expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[716723.381124] LustreError: 6003:0:(ldlm_resource.c:1705:ldlm_resource_dump()) Skipped 1 previous similar message
[717033.536366] LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 10.0.2.51@o2ib5) was lost; in progress operations using this service will fail
[717033.539206] LustreError: Skipped 1 previous similar message
[717033.541034] Lustre: MGC10.0.2.51@o2ib5: Connection restored to MGC10.0.2.51@o2ib5_0 (at 10.0.2.51@o2ib5)
[717033.542492] Lustre: Skipped 1 previous similar message
[717340.609290] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1521204182, 300s ago), entering recovery for MGS@MGC10.0.2.51@o2ib5_0 ns: MGC10.0.2.51@o2ib5 lock: ffff8803fd41c000/0x509cd0f71fde4c02 lrc: 4/1,0 mode: --/CR res: [0x6b616f:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x2e0a3ddb63e9adcd expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[717340.615144] LustreError: 1392:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[717340.617285] LustreError: 6017:0:(ldlm_resource.c:1100:ldlm_resource_complain()) MGC10.0.2.51@o2ib5: namespace resource [0x6b616f:0x2:0x0].0x0 (ffff88039bd87540) refcount nonzero (1) after lock cleanup; forcing cleanup.
[717340.619996] LustreError: 6017:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x6b616f:0x2:0x0].0x0 (ffff88039bd87540) refcount = 2
[717340.621862] LustreError: 6017:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:
[717340.623052] LustreError: 6017:0:(ldlm_resource.c:1705:ldlm_resource_dump()) ### ### ns: ?? lock: ffff8803fd41c000/0x509cd0f71fde4c02 lrc: 4/1,0 mode: --/CR res: ?? rrc=?? type: ??? flags: 0x1106400000000 nid: local remote: 0x2e0a3ddb63e9adcd expref: -99 pid: 1392 timeout: 0 lvb_type: 0
[717648.431191] LustreError: 166-1: MGC10.0.2.51@o2ib5: Connection to MGS (at 10.0.2.51@o2ib5) was lost; in progress operations using this service will fail
[717648.433742] LustreError: Skipped 1 previous similar message
[717648.434877] LustreError: 6028:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource: [0x6b616f:0x2:0x0].0x0 (ffff8803be8ce000) refcount = 2
[717648.436695] LustreError: 6028:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:
[717648.438034] Lustre: MGC10.0.2.51@o2ib5: Connection restored to MGC10.0.2.51@o2ib5_0 (at 10.0.2.51@o2ib5)
[717648.439395] Lustre: Skipped 1 previous similar message&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Thanks!&lt;br/&gt;
 Stephane&lt;/p&gt;</comment>
                            <comment id="224087" author="emoly.liu" created="Wed, 21 Mar 2018 06:41:47 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=sthiell&quot; class=&quot;user-hover&quot; rel=&quot;sthiell&quot;&gt;sthiell&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;I&apos;m checking the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8287&quot; title=&quot;sanity-sec test_16: mgs and c0 idmap mismatch, 10 attempts&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8287&quot;&gt;&lt;del&gt;LU-8287&lt;/del&gt;&lt;/a&gt; that Malcolm mentioned in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10635&quot; title=&quot;MGS kernel panic when configuring nodemaps and filesets&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10635&quot;&gt;&lt;del&gt;LU-10635&lt;/del&gt;&lt;/a&gt;. Are you using ZFS? Is this issue reproducible for you? And what operations did you do before you hit this issue?&lt;br/&gt;
 Any help to reproduce this issue will be appreciated.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
 Emoly&lt;/p&gt;</comment>
                            <comment id="224124" author="sthiell" created="Wed, 21 Mar 2018 14:31:53 +0000"  >&lt;p&gt;Hi Emoly,&lt;/p&gt;

&lt;p&gt;We&apos;re using ldiskfs, not ZFS. The crash occurs almost every time I stop the MGS, usually after at least a few days of production. But because it&apos;s a system in production, my actions are limited and I cannot test much right now. We&apos;re almost constantly redeploying clients for rolling updates, and the MGS has to be available. Still, I&apos;ll see what I can do. Also, what kind of information would help you (e.g. collected before unmounting)? Perhaps the output of /proc/fs/lustre/ldlm/namespaces/MGS/pool/stats, as mentioned by Malcolm?&lt;/p&gt;

&lt;p&gt;Also, as I said, I often see these &apos;lock timed out&apos; errors on the clients (see my previous comment), which are related to the MGS.&lt;/p&gt;

&lt;p&gt;Thanks!!&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="228411" author="lixi" created="Wed, 23 May 2018 12:06:49 +0000"  >&lt;p&gt;I hit a similar issue when running the replay-dual:26 test on Lustre 2.10.4.&lt;/p&gt;

&lt;p&gt;4853.863850] Lustre: DEBUG MARKER: == replay-dual test 26: dbench and tar with mds failover ============================================= 19:10:39 (1527073839)&lt;br/&gt;
[ 4857.258739] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 509 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4857.265541] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/0, destroy: 0/0/0&lt;br/&gt;
[ 4857.270705] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4857.276600] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4857.285310] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/3, delete: 0/0/0&lt;br/&gt;
[ 4857.290653] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4858.049927] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 509 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4858.056424] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4858.061225] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/0, destroy: 0/0/0&lt;br/&gt;
[ 4858.066306] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4858.071319] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4858.077706] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4858.083901] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4858.090166] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4858.096507] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/3, delete: 0/0/0&lt;br/&gt;
[ 4858.108592] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4858.114122] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4858.120097] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 7 previous similar messages&lt;br/&gt;
[ 4859.276388] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 504 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4859.282904] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4859.297126] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/4, destroy: 0/0/0&lt;br/&gt;
[ 4859.303500] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4859.308834] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4859.314310] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4859.319758] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4859.322635] Lustre: DEBUG MARKER: sync; sync; sync&lt;br/&gt;
[ 4859.332972] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4859.338367] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/4, delete: 0/0/0&lt;br/&gt;
[ 4859.343583] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4859.349626] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4859.357229] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 87 previous similar messages&lt;br/&gt;
[ 4861.591545] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 502 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4861.598048] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4861.603329] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/5, destroy: 0/0/0&lt;br/&gt;
[ 4861.608250] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4861.613206] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4861.618462] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4861.623794] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4861.629265] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4861.634206] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/5, delete: 0/0/0&lt;br/&gt;
[ 4861.638943] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4861.643917] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4861.648812] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 55 previous similar messages&lt;br/&gt;
[ 4862.383599] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno&lt;br/&gt;
[ 4863.153800] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly&lt;br/&gt;
[ 4864.185346] Turning device vdb (0xfd00010) read-only&lt;br/&gt;
[ 4864.654871] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000&lt;br/&gt;
[ 4865.082331] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000&lt;br/&gt;
[ 4865.611961] Lustre: 6652:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 505 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4865.618851] Lustre: 6652:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4865.624480] Lustre: 6652:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/3, destroy: 0/0/0&lt;br/&gt;
[ 4865.629503] Lustre: 6652:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4865.634719] Lustre: 6652:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4865.639975] Lustre: 6652:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4865.644902] Lustre: 6652:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4865.650149] Lustre: 6652:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4865.655072] Lustre: 6652:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/4, delete: 0/0/0&lt;br/&gt;
[ 4865.660397] Lustre: 6652:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4865.665815] Lustre: 6652:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4865.671025] Lustre: 6652:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 31 previous similar messages&lt;br/&gt;
[ 4867.620546] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times&lt;br/&gt;
[ 4868.162595] Lustre: DEBUG MARKER: test_26 fail mds1 1 times&lt;br/&gt;
[ 4868.587024] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1&apos; &apos; /proc/mounts || true&lt;br/&gt;
[ 4869.337772] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1&lt;br/&gt;
[ 4873.832032] Lustre: 3863:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 508 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4873.839128] Lustre: 3863:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4873.844748] Lustre: 3863:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/1, destroy: 0/0/0&lt;br/&gt;
[ 4873.852092] Lustre: 3863:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4873.857492] Lustre: 3863:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4873.880971] Lustre: 3863:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4873.888084] Lustre: 3863:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4873.893552] Lustre: 3863:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4873.898667] Lustre: 3863:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/3, delete: 0/0/0&lt;br/&gt;
[ 4873.903637] Lustre: 3863:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4873.908995] Lustre: 3863:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4873.914058] Lustre: 3863:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 151 previous similar messages&lt;br/&gt;
[ 4900.065841] Lustre: 4511:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 504 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4900.072969] Lustre: 4511:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4900.088107] Lustre: 4511:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/4, destroy: 0/0/0&lt;br/&gt;
[ 4900.094712] Lustre: 4511:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4900.104952] Lustre: 4511:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4900.124421] Lustre: 4511:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4900.138081] Lustre: 4511:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4900.145762] Lustre: 4511:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4900.151332] Lustre: 4511:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/4, delete: 0/0/0&lt;br/&gt;
[ 4900.156757] Lustre: 4511:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4900.163104] Lustre: 4511:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4900.168387] Lustre: 4511:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 279 previous similar messages&lt;br/&gt;
[ 4935.827897] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 501 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 4935.837586] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 239 previous similar messages&lt;br/&gt;
[ 4935.845497] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/6, destroy: 0/0/0&lt;br/&gt;
[ 4935.852328] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 239 previous similar messages&lt;br/&gt;
[ 4935.859506] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 4935.866531] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 239 previous similar messages&lt;br/&gt;
[ 4935.872820] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 4935.879939] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 239 previous similar messages&lt;br/&gt;
[ 4935.886349] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/5, delete: 0/0/0&lt;br/&gt;
[ 4935.892222] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 239 previous similar messages&lt;br/&gt;
[ 4935.898171] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 4935.903051] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 239 previous similar messages&lt;br/&gt;
[ 5001.767877] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 508 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 5001.774742] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5001.780504] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/2, destroy: 0/0/0&lt;br/&gt;
[ 5001.785460] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5001.790992] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 5001.796465] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5001.802010] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 5001.808416] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5001.814112] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/2, delete: 0/0/0&lt;br/&gt;
[ 5001.819500] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5001.825194] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 5001.830458] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 695 previous similar messages&lt;br/&gt;
[ 5043.531844] INFO: task umount:7051 blocked for more than 120 seconds.&lt;br/&gt;
[ 5043.559145] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.&lt;br/&gt;
[ 5043.571856] umount D ffff965a5b45eeb0 0 7051 7050 0x00000000&lt;br/&gt;
[ 5043.588676] Call Trace:&lt;br/&gt;
[ 5043.590253] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82f13e99&amp;gt;&amp;#93;&lt;/span&gt; schedule_preempt_disabled+0x29/0x70&lt;br/&gt;
[ 5043.594280] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82f11c57&amp;gt;&amp;#93;&lt;/span&gt; __mutex_lock_slowpath+0xc7/0x1d0&lt;br/&gt;
[ 5043.598409] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82f1103f&amp;gt;&amp;#93;&lt;/span&gt; mutex_lock+0x1f/0x2f&lt;br/&gt;
[ 5043.601442] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc07b93a7&amp;gt;&amp;#93;&lt;/span&gt; mgc_process_config+0x207/0x13f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;mgc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.605151] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc0829d78&amp;gt;&amp;#93;&lt;/span&gt; obd_process_config.constprop.13+0x88/0x2e0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.612032] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc04bdcc7&amp;gt;&amp;#93;&lt;/span&gt; ? libcfs_debug_msg+0x57/0x80 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.621123] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc0816cf9&amp;gt;&amp;#93;&lt;/span&gt; ? lprocfs_counter_add+0xf9/0x160 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.627421] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc082b0af&amp;gt;&amp;#93;&lt;/span&gt; lustre_end_log+0x1ff/0x550 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.630929] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc0853d1e&amp;gt;&amp;#93;&lt;/span&gt; server_put_super+0x7de/0xcd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.634737] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a1d69d&amp;gt;&amp;#93;&lt;/span&gt; generic_shutdown_super+0x6d/0x100&lt;br/&gt;
[ 5043.638215] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a1da82&amp;gt;&amp;#93;&lt;/span&gt; kill_anon_super+0x12/0x20&lt;br/&gt;
[ 5043.653558] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc0829402&amp;gt;&amp;#93;&lt;/span&gt; lustre_kill_super+0x32/0x50 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5043.657368] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a1de3e&amp;gt;&amp;#93;&lt;/span&gt; deactivate_locked_super+0x4e/0x70&lt;br/&gt;
[ 5043.660755] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a1e5c6&amp;gt;&amp;#93;&lt;/span&gt; deactivate_super+0x46/0x60&lt;br/&gt;
[ 5043.663775] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a3c57f&amp;gt;&amp;#93;&lt;/span&gt; cleanup_mnt+0x3f/0x80&lt;br/&gt;
[ 5043.666571] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82a3c612&amp;gt;&amp;#93;&lt;/span&gt; __cleanup_mnt+0x12/0x20&lt;br/&gt;
[ 5043.679438] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff828b784b&amp;gt;&amp;#93;&lt;/span&gt; task_work_run+0xbb/0xe0&lt;br/&gt;
[ 5043.682215] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8282ac55&amp;gt;&amp;#93;&lt;/span&gt; do_notify_resume+0xa5/0xc0&lt;br/&gt;
[ 5043.685702] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82f1fada&amp;gt;&amp;#93;&lt;/span&gt; int_signal+0x12/0x17&lt;br/&gt;
[ 5130.557643] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) lustre-MDT0000: opcode 2: before 502 &amp;lt; left 638, rollback = 2&lt;br/&gt;
[ 5130.566402] Lustre: 3864:0:(osd_internal.h:1139:osd_trans_exec_op()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5130.572039] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) create: 3/24/5, destroy: 0/0/0&lt;br/&gt;
[ 5130.577152] Lustre: 3864:0:(osd_handler.c:1673:osd_trans_dump_creds()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5130.582341] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 9/638/0&lt;br/&gt;
[ 5130.587512] Lustre: 3864:0:(osd_handler.c:1680:osd_trans_dump_creds()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5130.592725] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) write: 5/136/0, punch: 0/0/0, quota 3/3/0&lt;br/&gt;
[ 5130.600231] Lustre: 3864:0:(osd_handler.c:1690:osd_trans_dump_creds()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5130.605633] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) insert: 11/199/5, delete: 0/0/0&lt;br/&gt;
[ 5130.610566] Lustre: 3864:0:(osd_handler.c:1697:osd_trans_dump_creds()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5130.615660] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) ref_add: 5/5/0, ref_del: 0/0/0&lt;br/&gt;
[ 5130.620572] Lustre: 3864:0:(osd_handler.c:1704:osd_trans_dump_creds()) Skipped 1463 previous similar messages&lt;br/&gt;
[ 5136.751418] LustreError: 3878:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1527073823, 300s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff965a5c79c200/0xc539b8615eb9c76 lrc: 3/0,1 mode: --/EX res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x65727473756c:0x2:0x0&amp;#93;&lt;/span&gt;.0x0 rrc: 14 type: PLN flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 3878 timeout: 0 lvb_type: 0&lt;br/&gt;
[ 5136.786410] LustreError: dumping log to /tmp/lustre-log.1527074123.3878&lt;br/&gt;
[ 5143.791753] Lustre: MGS: Received new LWP connection from 10.0.2.38@tcp, removing former export from same NID&lt;br/&gt;
[ 5143.797868] Lustre: MGS: Connection restored to bc8f32af-75b1-d99f-ee7c-157fd1d1987f (at 10.0.2.38@tcp)&lt;br/&gt;
[ 5143.803193] Lustre: Skipped 113 previous similar messages&lt;br/&gt;
[ 5151.286402] LustreError: 166-1: MGC10.0.2.48@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail&lt;br/&gt;
[ 5151.293159] LustreError: 3862:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1527073837, 300s ago), entering recovery for MGS@10.0.2.48@tcp ns: MGC10.0.2.48@tcp lock: ffff965a5d519a00/0xc539b8615eb9d25 lrc: 4/1,0 mode: --/CR res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x65727473756c:0x2:0x0&amp;#93;&lt;/span&gt;.0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0xc539b8615eb9d2c expref: -99 pid: 3862 timeout: 0 lvb_type: 0&lt;br/&gt;
[ 5151.293512] Lustre: MGS: Received new LWP connection from 0@lo, removing former export from same NID&lt;br/&gt;
[ 5151.293514] Lustre: Skipped 2 previous similar messages&lt;br/&gt;
[ 5151.294777] LustreError: 9613:0:(ldlm_resource.c:1100:ldlm_resource_complain()) MGC10.0.2.48@tcp: namespace resource &lt;span class=&quot;error&quot;&gt;&amp;#91;0x65727473756c:0x2:0x0&amp;#93;&lt;/span&gt;.0x0 (ffff965a7ac16840) refcount nonzero (2) after lock cleanup; forcing cleanup.&lt;br/&gt;
[ 5151.294782] LustreError: 9613:0:(ldlm_resource.c:1682:ldlm_resource_dump()) &amp;#8212; Resource: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x65727473756c:0x2:0x0&amp;#93;&lt;/span&gt;.0x0 (ffff965a7ac16840) refcount = 3&lt;br/&gt;
[ 5151.294785] LustreError: 9613:0:(ldlm_resource.c:1703:ldlm_resource_dump()) Waiting locks:&lt;br/&gt;
[ 5151.294793] LustreError: 9613:0:(ldlm_resource.c:1705:ldlm_resource_dump()) ### ### ns: MGC10.0.2.48@tcp lock: ffff965a5d519a00/0xc539b8615eb9d25 lrc: 4/1,0 mode: --/CR res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x65727473756c:0x2:0x0&amp;#93;&lt;/span&gt;.0x0 rrc: 4 type: PLN flags: 0x1106400000000 nid: local remote: 0xc539b8615eb9d2c expref: -99 pid: 3862 timeout: 0 lvb_type: 0&lt;br/&gt;
[ 5151.358516] Lustre: Failing over lustre-MDT0000&lt;br/&gt;
[ 5151.360959] Lustre: Skipped 8 previous similar messages&lt;br/&gt;
[ 5151.410257] Lustre: lustre-MDT0000: Not available for connect from 10.0.2.51@tcp (stopping)&lt;br/&gt;
[ 5151.416502] Lustre: Skipped 1 previous similar message&lt;br/&gt;
[ 5153.740590] LustreError: 726:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff965a5ffdc000 x1601249971853200/t0(0) o13-&amp;gt;lustre-OST0001-osc-MDT0000@10.0.2.32@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1&lt;br/&gt;
[ 5153.759784] LustreError: 726:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 1 previous similar message&lt;br/&gt;
[ 5154.332546] LustreError: 727:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff965a5ffdc4c0 x1601249971853216/t0(0) o13-&amp;gt;lustre-OST0000-osc-MDT0000@10.0.2.39@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1&lt;br/&gt;
[ 5157.129226] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c&lt;br/&gt;
[ 5157.134863] IP: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc11676ce&amp;gt;&amp;#93;&lt;/span&gt; ldlm_process_plain_lock+0x6e/0xb30 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.139884] PGD 0 &lt;br/&gt;
[ 5157.141267] Oops: 0000 &amp;#91;#1&amp;#93; SMP &lt;br/&gt;
[ 5157.143341] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) mbcache jbd2 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev pcspkr sg joydev i2c_piix4 virtio_balloon parport_pc i2c_core parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_scsi virtio_blk virtio_net ata_piix serio_raw virtio_pci libata virtio_ring virtio floppy &lt;span class=&quot;error&quot;&gt;&amp;#91;last unloaded: libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.212014] CPU: 0 PID: 9614 Comm: ldlm_bl_03 Kdump: loaded Tainted: P OE ------------ 3.10.0-862.2.3.el7_lustre.2.10.4_33_gd942416.ddn0.x86_64 #1&lt;br/&gt;
[ 5157.226568] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011&lt;br/&gt;
[ 5157.244021] task: ffff965a79d25ee0 ti: ffff965a5d4f4000 task.ti: ffff965a5d4f4000&lt;br/&gt;
[ 5157.248225] RIP: 0010:&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc11676ce&amp;gt;&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc11676ce&amp;gt;&amp;#93;&lt;/span&gt; ldlm_process_plain_lock+0x6e/0xb30 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.253914] RSP: 0018:ffff965a5d4f7be0 EFLAGS: 00010287&lt;br/&gt;
[ 5157.256926] RAX: 0000000000000000 RBX: ffff965a5c79d000 RCX: 0000000000000000&lt;br/&gt;
[ 5157.260710] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff965a7a78e200&lt;br/&gt;
[ 5157.264452] RBP: ffff965a5d4f7c58 R08: 0000000000000001 R09: 0000000000000000&lt;br/&gt;
[ 5157.268412] R10: ffff965a7cc5a628 R11: 000000000000000f R12: ffff965a5888de80&lt;br/&gt;
[ 5157.272403] R13: 0000000000000002 R14: ffff965a5d4f7cd0 R15: ffff965a5c79d060&lt;br/&gt;
[ 5157.286857] FS: 0000000000000000(0000) GS:ffff965a7fc00000(0000) knlGS:0000000000000000&lt;br/&gt;
[ 5157.291044] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b&lt;br/&gt;
[ 5157.294048] CR2: 000000000000001c CR3: 000000000a20e000 CR4: 00000000000006f0&lt;br/&gt;
[ 5157.299570] Call Trace:&lt;br/&gt;
[ 5157.301199] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc04bdcc7&amp;gt;&amp;#93;&lt;/span&gt; ? libcfs_debug_msg+0x57/0x80 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.305503] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc1167660&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_errno2error+0x60/0x60 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.309668] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc11529ed&amp;gt;&amp;#93;&lt;/span&gt; ldlm_reprocess_queue+0x13d/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.313748] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc115358d&amp;gt;&amp;#93;&lt;/span&gt; __ldlm_reprocess_all+0x14d/0x3a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.336428] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc1153b46&amp;gt;&amp;#93;&lt;/span&gt; ldlm_reprocess_res+0x26/0x30 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.339825] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc04c3d70&amp;gt;&amp;#93;&lt;/span&gt; cfs_hash_for_each_relax+0x250/0x450 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.343689] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc1153b20&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_downgrade+0x320/0x320 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.347679] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc1153b20&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_downgrade+0x320/0x320 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.351716] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc04c7105&amp;gt;&amp;#93;&lt;/span&gt; cfs_hash_for_each_nolock+0x75/0x1c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.355477] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc1153b8c&amp;gt;&amp;#93;&lt;/span&gt; ldlm_reprocess_recovery_done+0x3c/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.359889] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc115483c&amp;gt;&amp;#93;&lt;/span&gt; ldlm_export_cancel_locks+0x11c/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.363807] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc117dda8&amp;gt;&amp;#93;&lt;/span&gt; ldlm_bl_thread_main+0x4c8/0x700 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.367533] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff828cee80&amp;gt;&amp;#93;&lt;/span&gt; ? wake_up_state+0x20/0x20&lt;br/&gt;
[ 5157.370635] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc117d8e0&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_handle_bl_callback+0x410/0x410 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.375198] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff828bae31&amp;gt;&amp;#93;&lt;/span&gt; kthread+0xd1/0xe0&lt;br/&gt;
[ 5157.378054] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff828bad60&amp;gt;&amp;#93;&lt;/span&gt; ? insert_kthread_work+0x40/0x40&lt;br/&gt;
[ 5157.381485] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff82f1f5f7&amp;gt;&amp;#93;&lt;/span&gt; ret_from_fork_nospec_begin+0x21/0x21&lt;br/&gt;
[ 5157.385004] &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff828bad60&amp;gt;&amp;#93;&lt;/span&gt; ? insert_kthread_work+0x40/0x40&lt;br/&gt;
[ 5157.388502] Code: 89 45 a0 74 0d f6 05 33 5f 37 ff 01 0f 85 34 06 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 89 45 b8 0f 84 57 09 00 00 48 8b 45 a0 &amp;lt;8b&amp;gt; 40 1c 85 c0 0f 84 7a 09 00 00 48 8b 4d a0 48 89 c8 48 83 c0 &lt;br/&gt;
[ 5157.400499] RIP &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffc11676ce&amp;gt;&amp;#93;&lt;/span&gt; ldlm_process_plain_lock+0x6e/0xb30 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
[ 5157.404838] RSP &amp;lt;ffff965a5d4f7be0&amp;gt;&lt;br/&gt;
[ 5157.410691] CR2: 000000000000001c&lt;/p&gt;</comment>
                            <comment id="283109" author="javed" created="Fri, 23 Oct 2020 07:17:31 +0000"  >&lt;p&gt;just for info, NCI is no longer following this ticket. ok to close from our perspective.&lt;/p&gt;</comment>
                            <comment id="308734" author="haaknci" created="Thu, 29 Jul 2021 03:54:37 +0000"  >&lt;p&gt;Actually, it&apos;s not. We&apos;re having this issue again.&lt;/p&gt;</comment>
                            <comment id="308746" author="emoly.liu" created="Thu, 29 Jul 2021 08:26:04 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=haaknci&quot; class=&quot;user-hover&quot; rel=&quot;haaknci&quot;&gt;haaknci&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Could you please provide the Lustre version you&apos;re using this time, along with any information about the testing environment and the operations that hit this issue?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="308869" author="haaknci" created="Fri, 30 Jul 2021 02:15:27 +0000"  >&lt;p&gt;Lustre 2.10.8 Server. &lt;br/&gt;
Lustre 2.12 clients&lt;/p&gt;

&lt;p&gt;Currently we see this issue on the MGS when unmounting. We are unmounting the MGS because it has become unstable and is throwing the following errors:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Jul 26 11:47:54 g4-mds01 kernel: LNet: 40488:0:(o2iblnd_cb.c:3202:kiblnd_check_conns()) Timed out tx &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 10.141.1.87@o2ib4: 18 seconds
Jul 26 11:48:36 g4-mds01 kernel: LustreError: 166-1: MGC10.141.1.41@o2ib4: Connection to MGS (at 0@lo) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will fail
Jul 26 11:48:36 g4-mds01 kernel: LustreError: 208265:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1627263816, 300s ago), entering recovery &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; MGS@MGC10.141.1.41@o2ib4_0 ns: MGC10.141.1.41@o2ib4 lock: ffff9f7c79309c00/0xa65add00b40c1243 lrc: 4/1,0 mode: --/CR res: [0x346174616467:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0xa65add00b40c124a expref: -99 pid: 208265 timeout: 0 lvb_type: 0
Jul 26 11:48:36 g4-mds01 kernel: LustreError: 219839:0:(ldlm_resource.c:1101:ldlm_resource_complain()) MGC10.141.1.41@o2ib4: namespace resource [0x346174616467:0x2:0x0].0x0 (ffff9f745c79e3c0) refcount nonzero (2) after lock cleanup; forcing cleanup.Jul 26 11:48:36 g4-mds01 kernel: LustreError: 219839:0:(ldlm_resource.c:1683:ldlm_resource_dump()) --- Resource: [0x346174616467:0x2:0x0].0x0 (ffff9f745c79e3c0) refcount = 3
Jul 26 11:48:36 g4-mds01 kernel: LustreError: 219839:0:(ldlm_resource.c:1704:ldlm_resource_dump()) Waiting locks:
Jul 26 11:48:36 g4-mds01 kernel: LustreError: 219839:0:(ldlm_resource.c:1706:ldlm_resource_dump()) ### ### ns: MGC10.141.1.41@o2ib4 lock: ffff9f7c79309c00/0xa65add00b40c1243 lrc: 4/1,0 mode: --/CR res: [0x346174616467:0x2:0x0].0x0 rrc: 4 type: PLN flags: 0x1106400000000 nid: local remote: 0xa65add00b40c124a expref: -99 pid: 208265 timeout: 0 lvb_type: 0&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The workaround for the above issue is to unmount/remount the MGS. Currently, unmounting the MGS leads to a livelocked machine.&lt;/p&gt;</comment>
                            <comment id="309047" author="emoly.liu" created="Mon, 2 Aug 2021 08:12:46 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=haaknci&quot; class=&quot;user-hover&quot; rel=&quot;haaknci&quot;&gt;haaknci&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Could you please collect some lustre logs on MGS by the following steps and&#160;upload here?&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;lctl set_param debug=-1 debug_mb=1024 (or another value in MB)&lt;/li&gt;
	&lt;li&gt;lctl dk &amp;gt; /dev/null&lt;/li&gt;
	&lt;li&gt;reproduce this issue&lt;/li&gt;
	&lt;li&gt;lctl dk &amp;gt; mgs_panic.log&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="50704">LU-10635</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="29846" name="vmcore-dmesg-oak-md1-s1-2018-03-16-06-47-23.txt" size="1026754" author="sthiell" created="Fri, 16 Mar 2018 14:09:44 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzpfr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>