<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9725] Mount commands don&apos;t return for targets in LFS with DNE and 3 MDTs </title>
                <link>https://jira.whamcloud.com/browse/LU-9725</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;kernel version: 3.10.0-514.21.1.el7_lustre.x86_64&lt;br/&gt;
lustre version: 2.10.0_RC1-1.el7&lt;br/&gt;
OS: CentOS Linux release 7.3.1611 (Core)&lt;/p&gt;

&lt;p&gt;The failure consistently occurs in test_filesystem_dne.py test_md0_undeleteable() during IML SSI automated test runs against Lustre b2_10.&lt;/p&gt;

&lt;p&gt;This is the only test we have that creates a filesystem with 3 MDTs.&lt;/p&gt;

&lt;p&gt;When recreating the LFS (outside of the test infrastructure) through IML in a similar configuration with an MGS, 3 MDTs and 1 OST, the mount commands for all other targets return successfully, but the OST mount command never returns.&lt;/p&gt;

&lt;p&gt;While the MDT mount commands are being issued, there is a lot of activity in the kernel messages log, including multiple LustreErrors and stack traces, warnings of high CPU usage, and then:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [lwp_notify_fs1-:13630]&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is on an ldiskfs-only LFS with DNE enabled. The OST mount command used is shown below; the MDT mount commands follow a similar format:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk5 /mnt/fs1-OST0000&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following gists show excerpts from the /var/log/messages log during instances of this type of failure (MDT mounting in DNE):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://gist.github.com/tanabarr/1adb35a7e7da2581be79df8f45417411&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/tanabarr/1adb35a7e7da2581be79df8f45417411&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://gist.github.com/tanabarr/70d3bfa66c4fc474b82c7c02adcda511&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/tanabarr/70d3bfa66c4fc474b82c7c02adcda511&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://gist.github.com/tanabarr/9f54584621aacfdeb3899f59687cb918&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/tanabarr/9f54584621aacfdeb3899f59687cb918&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The last gist link is an extended excerpt giving more contextual log information regarding the attempted mounting of the MDTs and the subsequent CPU load warnings. The entire logfile for that failure instance (in addition to other IML related log files) is attached to this ticket.&lt;/p&gt;

&lt;p&gt;original IML ticket: &lt;a href=&quot;https://github.com/intel-hpdd/intel-manager-for-lustre/issues/108&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/intel-hpdd/intel-manager-for-lustre/issues/108&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="46982">LU-9725</key>
            <summary>Mount commands don&apos;t return for targets in LFS with DNE and 3 MDTs </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="tanabarr">Tom Nabarro</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Jun 2017 11:07:13 +0000</created>
                <updated>Wed, 19 Dec 2018 16:43:15 +0000</updated>
                            <resolved>Mon, 14 Aug 2017 05:03:07 +0000</resolved>
                                    <version>Lustre 2.10.0</version>
                                    <fixVersion>Lustre 2.10.1</fixVersion>
                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="200738" author="pjones" created="Fri, 30 Jun 2017 17:32:58 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Can you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="201513" author="brian" created="Mon, 10 Jul 2017 13:13:21 +0000"  >&lt;p&gt;Here&apos;s the stuck thread from a full&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/27555/27555_sysrq-t&quot; title=&quot;sysrq-t attached to LU-9725&quot;&gt;sysrq-t&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; log from a node in such a state:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[65093.076421] ll_mgs_0002     D ffff880023613a30     0 28707      2 0x00000080
[65093.077960]  ffff8800236139e8 0000000000000046 ffff880079ef4e70 ffff880023613fd8
[65093.079596]  ffff880023613fd8 ffff880023613fd8 ffff880079ef4e70 ffff8800366b8800
[65093.081208]  000000000000000e ffff8800366b8890 ffff8800366b8828 ffff880023613a30
[65093.082782] Call Trace:
[65093.083839]  [&amp;lt;ffffffff8168c849&amp;gt;] schedule+0x29/0x70
[65093.085105]  [&amp;lt;ffffffffa0f01875&amp;gt;] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[65093.086556]  [&amp;lt;ffffffff810b1b20&amp;gt;] ? wake_up_atomic_t+0x30/0x30
[65093.087929]  [&amp;lt;ffffffffa0efa4d3&amp;gt;] jbd2_journal_stop+0x343/0x3d0 [jbd2]
[65093.089347]  [&amp;lt;ffffffffa0f9210b&amp;gt;] ? __ldiskfs_handle_dirty_metadata+0x8b/0x220 [ldiskfs]
[65093.090905]  [&amp;lt;ffffffffa0efaf02&amp;gt;] ? jbd2_journal_get_write_access+0x32/0x40 [jbd2]
[65093.092447]  [&amp;lt;ffffffffa0f91c5c&amp;gt;] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
[65093.093918]  [&amp;lt;ffffffffa0ab532e&amp;gt;] osd_trans_stop+0x18e/0x830 [osd_ldiskfs]
[65093.095407]  [&amp;lt;ffffffffa0acddfb&amp;gt;] ? osd_write+0x15b/0x5b0 [osd_ldiskfs]
[65093.096956]  [&amp;lt;ffffffffa087ec73&amp;gt;] ? lu_context_init+0xd3/0x1f0 [obdclass]
[65093.098633]  [&amp;lt;ffffffffa0b41262&amp;gt;] mgs_ir_update+0x2e2/0xb70 [mgs]
[65093.100328]  [&amp;lt;ffffffffa0b21d6f&amp;gt;] mgs_target_reg+0x77f/0x1370 [mgs]
[65093.102179]  [&amp;lt;ffffffffa107ce7f&amp;gt;] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[65093.104399]  [&amp;lt;ffffffffa107d001&amp;gt;] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[65093.106275]  [&amp;lt;ffffffffa10de895&amp;gt;] tgt_request_handle+0x915/0x1360 [ptlrpc]
[65093.108552]  [&amp;lt;ffffffffa1088133&amp;gt;] ptlrpc_server_handle_request+0x233/0xa90 [ptlrpc]
[65093.110875]  [&amp;lt;ffffffffa1085928&amp;gt;] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[65093.112320]  [&amp;lt;ffffffff810c54f2&amp;gt;] ? default_wake_function+0x12/0x20
[65093.113727]  [&amp;lt;ffffffff810ba628&amp;gt;] ? __wake_up_common+0x58/0x90
[65093.115111]  [&amp;lt;ffffffffa108c110&amp;gt;] ptlrpc_main+0xaa0/0x1dd0 [ptlrpc]
[65093.116585]  [&amp;lt;ffffffffa108b670&amp;gt;] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[65093.118178]  [&amp;lt;ffffffff810b0a4f&amp;gt;] kthread+0xcf/0xe0
[65093.119471]  [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140
[65093.120843]  [&amp;lt;ffffffff81697798&amp;gt;] ret_from_fork+0x58/0x90
[65093.122169]  [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="201530" author="niu" created="Mon, 10 Jul 2017 15:27:52 +0000"  >&lt;p&gt;The stack trace posted by Brian indicates that an MGS thread is waiting for a journal commit; that is not necessarily a problem as long as it doesn&apos;t wait forever.&lt;/p&gt;

&lt;p&gt;As for the CPU soft lockup problem:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [lwp_notify_fs1-:13630]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache libcfs(OE) snd_intel8x0 snd_ac97_codec ppdev ac97_bus snd_seq snd_seq_device sg pcspkr virtio_balloon snd_pcm parport_pc parport snd_timer snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi virtio_blk virtio_net virtio_scsi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw floppy drm virtio_pci virtio_ring
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: virtio ata_piix libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CPU: 1 PID: 13630 Comm: lwp_notify_fs1- Tainted: P           OEL ------------   3.10.0-514.21.1.el7_lustre.x86_64 #1
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: task: ffff880078071f60 ti: ffff8800388b8000 task.ti: ffff8800388b8000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RIP: 0010:[&amp;lt;ffffffff81327649&amp;gt;]  [&amp;lt;ffffffff81327649&amp;gt;] __write_lock_failed+0x9/0x20
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RSP: 0018:ffff8800388bbe40  EFLAGS: 00000297
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RAX: ffff880038e1f800 RBX: ffff8800388bbe18 RCX: 0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RDX: 000000000000002e RSI: ffff88003c4c3ec4 RDI: ffff880046ca1384
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RBP: ffff8800388bbe40 R08: 0000000000019b20 R09: ffffffffa065a9a1
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: R10: ffff88007fd19b20 R11: ffffea0000e21500 R12: ffff8800388bbdd0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: R13: ffff880044384d80 R14: ffff8800388bbe18 R15: 0000000000000028
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CR2: 00007ff777278f30 CR3: 00000000019be000 CR4: 00000000000006e0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Stack:
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff8800388bbe50 ffffffff8168e827 ffff8800388bbe70 ffffffffa0f4795a
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff88003c4c3e80 ffff880038e1fc00 ffff8800388bbe98 ffffffffa0674dda
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff880038e1fc00 ffff880035b1f900 ffff880035b1f9b0 ffff8800388bbec0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Call Trace:
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffff8168e827&amp;gt;] _raw_write_lock+0x17/0x20
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffffa0f4795a&amp;gt;] qsd_conn_callback+0x5a/0x160 [lquota]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffffa0674dda&amp;gt;] lustre_notify_lwp_list+0xba/0x100 [obdclass]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffffa1385af6&amp;gt;] lwp_notify_main+0x56/0xc0 [osp]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffffa1385aa0&amp;gt;] ? lwp_import_event+0xb0/0xb0 [osp]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffff810b0a4f&amp;gt;] kthread+0xcf/0xe0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffff81697798&amp;gt;] ret_from_fork+0x58/0x90
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It looks like the thread is blocked on a spinlock in qsd_conn_callback()-&amp;gt;write_lock(&amp;amp;qsd-&amp;gt;qsd_lock); I&apos;ll investigate further.&lt;/p&gt;</comment>
                            <comment id="201629" author="gerrit" created="Tue, 11 Jul 2017 02:28:59 +0000"  >&lt;p&gt;Niu Yawei (yawei.niu@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/27987&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27987&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; lwp: wait on deregister&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a21f2bbb8b7a7c76528d461e06072b5b9759be43&lt;/p&gt;</comment>
                            <comment id="202013" author="brian" created="Thu, 13 Jul 2017 14:05:33 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=niu&quot; class=&quot;user-hover&quot; rel=&quot;niu&quot;&gt;niu&lt;/a&gt; Could you add a brief summary to this ticket about what conditions cause this bug to happen? &#160;The description seems to suggest that just a simple-ish configuration of 1 MGS, 3 MDTs and 1 OST will cause this to happen when the OST is started. &#160;Is there something more subtle about this configuration that is causing this bug that we can try to avoid, so as to not hit it?&lt;/p&gt;</comment>
                            <comment id="202017" author="niu" created="Thu, 13 Jul 2017 14:24:13 +0000"  >&lt;p&gt;There is a race that can result in a CPU hang in qsd_conn_callback() when starting or shutting down servers; it is not configuration related.&lt;br/&gt;
I presume this is a rare issue that can&apos;t be reproduced consistently, am I right?&lt;/p&gt;</comment>
                            <comment id="202065" author="brian" created="Thu, 13 Jul 2017 18:56:58 +0000"  >&lt;p&gt;We can reproduce it fairly&#160;frequently.&lt;/p&gt;

&lt;p&gt;Frequently enough that we had to disable a few tests that were fairly reliably reproducing it.&lt;/p&gt;</comment>
                            <comment id="202092" author="niu" created="Fri, 14 Jul 2017 01:23:28 +0000"  >&lt;p&gt;Could you share with me which tests reliably reproduce it? Does the problem always happen when starting or stopping a server? Could you help verify whether the patch really solves the problem? Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="202585" author="gerrit" created="Wed, 19 Jul 2017 03:29:17 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/27987/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27987/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; lwp: wait on deregister&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 5d5702a3ec24cd1bc7effbadb13d272fa51dff05&lt;/p&gt;</comment>
                            <comment id="202641" author="pjones" created="Wed, 19 Jul 2017 03:47:10 +0000"  >&lt;p&gt;So, Niu&apos;s patch has landed but have we confirmed that this fix meets the needs of the reporter?&lt;/p&gt;</comment>
                            <comment id="202702" author="brian" created="Wed, 19 Jul 2017 11:10:37 +0000"  >&lt;blockquote&gt;
&lt;p&gt;have we confirmed that this fix meets the needs of the reporter&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Not yet. It&apos;s on our TODO list but there are some things that have to happen first before we are able to test IML with a review build. &#160;I&apos;m working on those right now.&lt;/p&gt;</comment>
                            <comment id="202709" author="pjones" created="Wed, 19 Jul 2017 12:06:35 +0000"  >&lt;p&gt;If we expedite landing it to b2_10 would that help?&lt;/p&gt;</comment>
                            <comment id="202713" author="brian" created="Wed, 19 Jul 2017 12:29:47 +0000"  >&lt;p&gt;Might not be necessary. &#160;I might have enough pieces in place today to be able to install a review build of Lustre in IML.&lt;/p&gt;</comment>
                            <comment id="202844" author="brian" created="Thu, 20 Jul 2017 00:03:40 +0000"  >&lt;p&gt;I&apos;m afraid I have to renege. &#160;We need the client support that is in the stack of reviews that I have outstanding in order to get a successful IML installation.&lt;/p&gt;

&lt;p&gt;So verifying this issue is going to be blocked by getting that stack landed.&lt;/p&gt;</comment>
                            <comment id="203008" author="gerrit" created="Thu, 20 Jul 2017 23:49:21 +0000"  >&lt;p&gt;Jian Yu (jian.yu@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/28161&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28161&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; lwp: wait on deregister&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 183c78264bb067d49df9b76901a67ab631c2d751&lt;/p&gt;</comment>
                            <comment id="203099" author="gerrit" created="Fri, 21 Jul 2017 17:06:13 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/28161/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28161/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; lwp: wait on deregister&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ab0d0c35b3de894f38f8941380d137848751c6eb&lt;/p&gt;</comment>
                            <comment id="203101" author="pjones" created="Fri, 21 Jul 2017 17:16:02 +0000"  >&lt;p&gt;Brian&lt;/p&gt;

&lt;p&gt;Are you now able to verify this fix?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="203119" author="brian" created="Fri, 21 Jul 2017 19:23:12 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=pjones&quot; class=&quot;user-hover&quot; rel=&quot;pjones&quot;&gt;pjones&lt;/a&gt;: Not yet I&apos;m afraid. &#160;We need the stack of 3 packaging patches in combination with this fix to get a functional IML (with Lustre 2.10) system up with which to test the fix for this ticket. &#160;Those three patches look (at least maybe only partially contentiously) in progress so I am hopeful.&lt;/p&gt;</comment>
                            <comment id="203182" author="mdiep" created="Sat, 22 Jul 2017 04:05:46 +0000"  >&lt;p&gt;landed in 2.11&lt;/p&gt;</comment>
                            <comment id="203519" author="brian" created="Tue, 25 Jul 2017 19:10:40 +0000"  >&lt;p&gt;I&apos;m afraid I don&apos;t think the patch fixed the problem.&lt;/p&gt;

&lt;p&gt;Using:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: Build Version: 2.10.0_5_gbb3c407
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;which I believe should have the fix in it I&apos;m still seeing:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 1604.696255] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lwp_notify_test:25841]
[ 1604.703478] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev snd_pcm sg snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper virtio_scsi syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata i2c_core serio_raw floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[ 1604.785347] CPU: 0 PID: 25841 Comm: lwp_notify_test Tainted: P OE ------------ 3.10.0-514.21.1.el7_lustre.x86_64 #1
[ 1604.798164] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1604.804864] task: ffff88001ac3edd0 ti: ffff88001a204000 task.ti: ffff88001a204000
[ 1604.812330] RIP: 0010:[&amp;lt;ffffffff81327649&amp;gt;] [&amp;lt;ffffffff81327649&amp;gt;] __write_lock_failed+0x9/0x20
[ 1604.820634] RSP: 0018:ffff88001a207e40 EFLAGS: 00000206
[ 1604.828340] RAX: ffff880020eb5400 RBX: ffff88001a207e18 RCX: 0000000000000000
[ 1604.834839] RDX: 0000000000000016 RSI: ffff88001ce8ecc7 RDI: ffff88007a9c1984
[ 1604.842076] RBP: ffff88001a207e40 R08: 0000000000019b20 R09: ffffffffa066f9c1
[ 1604.848577] R10: ffff88007fc19b20 R11: ffffea00006d0400 R12: ffff88001a207dd0
[ 1604.854735] R13: ffff88001ad083c0 R14: ffff88001a207e18 R15: 0000000000000028
[ 1604.861290] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1604.867586] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1604.873392] CR2: 00007fe96aa680e0 CR3: 000000001acfd000 CR4: 00000000000006f0
[ 1604.879558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 1604.885757] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[ 1604.891532] Stack: 
[ 1604.896912] ffff88001a207e50 ffffffff8168e827 ffff88001a207e70 ffffffffa0ff295a 
[ 1604.903591] ffff88001ce8ec80 ffff880020eb6400 ffff88001a207e98 ffffffffa0689eca 
[ 1604.910593] ffff880020eb6400 ffff880020a03000 ffff880020a030b0 ffff88001a207ec0 
[ 1604.916651] Call Trace: 
[ 1604.922930] [&amp;lt;ffffffff8168e827&amp;gt;] _raw_write_lock+0x17/0x20 
[ 1604.929711] [&amp;lt;ffffffffa0ff295a&amp;gt;] qsd_conn_callback+0x5a/0x160 [lquota] 
[ 1604.935637] [&amp;lt;ffffffffa0689eca&amp;gt;] lustre_notify_lwp_list+0xba/0x100 [obdclass]
[ 1604.941153] [&amp;lt;ffffffffa14d8af6&amp;gt;] lwp_notify_main+0x56/0xc0 [osp]
[ 1604.946248] [&amp;lt;ffffffffa14d8aa0&amp;gt;] ? lwp_import_event+0xb0/0xb0 [osp]
[ 1604.951715] [&amp;lt;ffffffff810b0a4f&amp;gt;] kthread+0xcf/0xe0
[ 1604.956861] [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140
[ 1604.962093] [&amp;lt;ffffffff81697798&amp;gt;] ret_from_fork+0x58/0x90
[ 1604.967130] [&amp;lt;ffffffff810b0980&amp;gt;] ? kthread_create_on_node+0x140/0x140
[ 1604.972471] Code: 66 90 48 89 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 90 &amp;lt;83&amp;gt; 3f 01 75 f9 f0 ff 0f 75 f1 5d c3 66 66 2e 0f 1f 84 00 00 00 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="203577" author="pjones" created="Wed, 26 Jul 2017 10:22:24 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="204004" author="laisiyao" created="Tue, 1 Aug 2017 03:01:11 +0000"  >&lt;p&gt;Brian, is it true 2.10.0 include this fix? even tag 2.10.50 doesn&apos;t include this, can you test with master build?&lt;/p&gt;</comment>
                            <comment id="204047" author="brian" created="Tue, 1 Aug 2017 11:26:14 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Brian, is it true 2.10.0 include this fix?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;No, 2.10.0 does &lt;b&gt;not&lt;/b&gt; include this fix. b2_10 does &lt;a href=&quot;https://review.whamcloud.com/#/c/28161/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;contain this fix&lt;/a&gt; though and that&apos;s what we are testing with. You can see from &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-9725?focusedCommentId=203519&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-203519&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;the comment&lt;/a&gt; above though that we tested with 2.10.0_5_gbb3c407 which is a build with &lt;a href=&quot;https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=commit;h=bb3c407a42c39bcd2bef9abd1a19ce502cf4c70d&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this commit&lt;/a&gt; in it which is 5 commits newer than the &lt;a href=&quot;https://review.whamcloud.com/#/c/28161/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;landed patch&lt;/a&gt; for this ticket. &#160;So we definitely did test the patch from this ticket.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;even tag 2.10.50 doesn&apos;t include this, can you test with master build?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;We cannot test with master due to issues that Jenkins has trying to be an HTTP server. But given the above, we really shouldn&apos;t need to test master given that we have tested the patch on b2_10.&lt;/p&gt;</comment>
                            <comment id="204073" author="laisiyao" created="Tue, 1 Aug 2017 15:13:37 +0000"  >&lt;p&gt;Thanks Brian, I&apos;ll continue checking the code.&lt;/p&gt;</comment>
                            <comment id="204076" author="brian" created="Tue, 1 Aug 2017 15:29:44 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=laisiyao&quot; class=&quot;user-hover&quot; rel=&quot;laisiyao&quot;&gt;laisiyao&lt;/a&gt;: No problem.  Let me know if there is anything else I can do to help.&lt;/p&gt;</comment>
                            <comment id="204441" author="gerrit" created="Fri, 4 Aug 2017 15:19:54 +0000"  >&lt;p&gt;Lai Siyao (lai.siyao@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/28356&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28356&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; quota: always deregister lwp&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1a5df6bc53e8356d8ae83d4031ea4397ebc03af3&lt;/p&gt;</comment>
                            <comment id="204445" author="laisiyao" created="Fri, 4 Aug 2017 15:25:02 +0000"  >&lt;p&gt;Brian, I uploaded a patch, could you help verify it?&lt;/p&gt;</comment>
                            <comment id="204449" author="gerrit" created="Fri, 4 Aug 2017 15:33:07 +0000"  >&lt;p&gt;Brian J. Murrell (brian.murrell@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/28357&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28357&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; quota: always deregister lwp&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 728454ee8f85d3074124933d0d83e42f10515500&lt;/p&gt;</comment>
                            <comment id="204451" author="brian" created="Fri, 4 Aug 2017 15:38:29 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=laisiyao&quot; class=&quot;user-hover&quot; rel=&quot;laisiyao&quot;&gt;laisiyao&lt;/a&gt;: Sure. I will do a cherry-pick to b2_10 and test from there.&lt;/p&gt;</comment>
                            <comment id="204514" author="simmonsja" created="Fri, 4 Aug 2017 20:29:45 +0000"  >&lt;p&gt;I just tested this patch, and this is the bug that was preventing my debugfs port. Because the lwp was not being fully unregistered, the debugfs kobjects were not being freed, so mounting the MDT a second time would fail because the debugfs files already existed. You can&apos;t register the same debugfs file twice.&lt;/p&gt;</comment>
                            <comment id="205261" author="gerrit" created="Sun, 13 Aug 2017 17:17:47 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/28356/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28356/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; quota: always deregister lwp&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: ce8ca7d3564439285a56982430f380354b697f68&lt;/p&gt;</comment>
                            <comment id="205295" author="pjones" created="Mon, 14 Aug 2017 05:03:07 +0000"  >&lt;p&gt;Landed for 2.11&lt;/p&gt;</comment>
                            <comment id="205783" author="gerrit" created="Fri, 18 Aug 2017 20:55:44 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/28357/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28357/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; quota: always deregister lwp&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e53c0fbeefc1c29d7b5256c6a4cc6ead96ae41e8&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="36381">LU-8066</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="45624">LU-9376</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="27302" name="chroma-agent-console.log.txt" size="1281738" author="tanabarr" created="Fri, 30 Jun 2017 11:10:48 +0000"/>
                            <attachment id="27301" name="chroma-agent.log.txt" size="1648907" author="tanabarr" created="Fri, 30 Jun 2017 11:10:49 +0000"/>
                            <attachment id="27303" name="job_scheduler.log.txt" size="8412356" author="tanabarr" created="Fri, 30 Jun 2017 11:11:03 +0000"/>
                            <attachment id="27304" name="messages.txt" size="2230136" author="tanabarr" created="Fri, 30 Jun 2017 11:10:50 +0000"/>
                            <attachment id="27555" name="sysrq-t" size="383814" author="brian" created="Mon, 10 Jul 2017 13:12:39 +0000"/>
                            <attachment id="27305" name="yum.log.txt" size="150920" author="tanabarr" created="Fri, 30 Jun 2017 11:10:46 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzfwv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>