<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:42:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
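
As a sketch, the restricted-field URL described above can be assembled like this. The
/si/jira.issueviews:issue-xml/<KEY>/<KEY>.xml path is the conventional JIRA XML issue-view
endpoint and is an assumption here, not taken from this export:

```shell
# Sketch: build a field-restricted issue-export URL.
# The issue-xml view path below is assumed (standard JIRA convention),
# not confirmed by this file.
BASE="https://jira.whamcloud.com/si/jira.issueviews:issue-xml"
KEY="LU-11301"
URL="${BASE}/${KEY}/${KEY}.xml?field=key&field=summary"
echo "${URL}"
```

Fetching that URL (for example with curl) should return this same document with only the
requested fields populated.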
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11301] hung threads on MDT and MDT won&apos;t umount</title>
                <link>https://jira.whamcloud.com/browse/LU-11301</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;we&apos;ve had 2 more MDT hangs that have similar symptoms to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11082&quot; title=&quot;stuck threads on MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11082&quot;&gt;&lt;del&gt;LU-11082&lt;/del&gt;&lt;/a&gt;. file operations on clients hang, usually spreading gradually to more and more clients. in every case MDS failover restores functionality, one of the 3 MDTs won&apos;t umount, and the MDS has to be stonith&apos;d. only our large DNE filesystem (dagg) is affected and not home, images, or apps, so it&apos;s likely a DNE issue.&lt;/p&gt;

&lt;p&gt;the first hang was yesterday. the 3 MDTs were distributed across the 2 MDS&apos;s. dagg-mdt2 would not umount in the MDS failovers that were needed to restore operation. warble1,2 logs attached.&lt;br/&gt;
in the hang today all MDTs were on warble1 and dagg-mdt0 wouldn&apos;t umount. logs attached.&lt;/p&gt;

&lt;p&gt;prior to these 2 hangs, we hadn&apos;t had similar MDS issues for 3 weeks. we can&apos;t see an obvious change in user behaviour that triggers this.&lt;/p&gt;

&lt;p&gt;the hangs yesterday started with&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Aug 29 14:19:37 warble2 kernel: LNet: Service thread pid 51461 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Aug 29 14:19:37 warble2 kernel: Pid: 51461, comm: mdt_rdpg01_013 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
Aug 29 14:19:37 warble2 kernel: Call Trace:
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc0fb7037&amp;gt;] top_trans_wait_result+0xa6/0x155 [ptlrpc]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc0f9890b&amp;gt;] top_trans_stop+0x42b/0x930 [ptlrpc]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc18655f9&amp;gt;] lod_trans_stop+0x259/0x340 [lod]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc1ac623a&amp;gt;] mdd_trans_stop+0x2a/0x46 [mdd]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc1abbbcb&amp;gt;] mdd_attr_set+0x5eb/0xce0 [mdd]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc1a1b206&amp;gt;] mdt_mfd_close+0x1a6/0x610 [mdt]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc1a20981&amp;gt;] mdt_close_internal+0x121/0x220 [mdt]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc1a20ca0&amp;gt;] mdt_close+0x220/0x780 [mdt]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc0f8538a&amp;gt;] tgt_request_handle+0x92a/0x1370 [ptlrpc]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc0f2de4b&amp;gt;] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffc0f31592&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffff9f8bb621&amp;gt;] kthread+0xd1/0xe0
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffff9ff205dd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
Aug 29 14:19:37 warble2 kernel: [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
Aug 29 14:19:37 warble2 kernel: LustreError: dumping log to /tmp/lustre-log.1535516377.51461
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and the hangs today started with&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Aug 30 12:25:32 warble1 kernel: Lustre: dagg-MDT0000-osp-MDT0002: Connection to dagg-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Aug 30 12:25:32 warble1 kernel: LustreError: 212573:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1535595632, 300s ago), entering recovery for dagg-MDT0000_UUID@192.168.44.21@o2ib44 ns: dagg-MDT0000-osp-MDT0002 lock: ffff93869b37ea00/0xddb792bb3962e7e2 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 0xddb792bb3962e7e9 expref: -99 pid: 212573 timeout: 0 lvb_type: 0
Aug 30 12:25:32 warble1 kernel: Lustre: dagg-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID
Aug 30 12:25:32 warble1 kernel: Lustre: Skipped 1 previous similar message
Aug 30 12:25:32 warble1 kernel: Lustre: dagg-MDT0000: Connection restored to 192.168.44.21@o2ib44 (at 0@lo)
Aug 30 12:25:32 warble1 kernel: Lustre: Skipped 5 previous similar messages
Aug 30 12:25:32 warble1 kernel: LustreError: 212573:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) Skipped 4 previous similar messages
Aug 30 12:25:34 warble1 kernel: LustreError: 132075:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1535595633, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-dagg-MDT0000_UUID lock: ffff931f1afa8a00/0xddb792bb39655687 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2 rrc: 20 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 132075 timeout: 0 lvb_type: 0
Aug 30 12:25:34 warble1 kernel: LustreError: dumping log to /tmp/lustre-log.1535595934.132075
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ve attached the first lustre-logs and syslog for both days.&lt;br/&gt;
there&apos;s nothing extra in console logs that I can see.&lt;br/&gt;
MDS&apos;s are warble, OSS&apos;s are arkle/umlaut, clients are john,farnarkle,bryan.&lt;/p&gt;

&lt;p&gt;on Lustre servers we have a bunch of patches over standard 2.10.4 applied. it&apos;s pretty much 2.10.5. to be precise, it&apos;s Lustre 2.10.4 plus these&lt;br/&gt;
lu10683-lu11093-checksum-overquota-gerrit32788-1fb85e7e.patch&lt;br/&gt;
lu10988-lfsck2-gerrit32522-21d33c11.patch&lt;br/&gt;
lu11074-mdc-xattr-gerrit32739-dea1cde9.patch&lt;br/&gt;
lu11107-xattr-gerrit32753-c96a8f08.patch&lt;br/&gt;
lu11111-lfsck-gerrit32796-693fe452.patch&lt;br/&gt;
lu11082-lu11103-stuckMdtThreads-gerrit32853-3dc08caa.diff&lt;br/&gt;
lu11062-stacktrace-gerrit32972-7232c445.patch&lt;/p&gt;

&lt;p&gt;on clients we have 2.10.4 plus&lt;br/&gt;
lu11074-gerrit32739_a1ae6014.diff&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>x86_64, zfs, 3 MDTs, all on 1 MDS, or across 2 MDS&apos;s, 2.10.4 + many patches.</environment>
        <key id="53157">LU-11301</key>
            <summary>hung threads on MDT and MDT won&apos;t umount</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Thu, 30 Aug 2018 06:53:17 +0000</created>
                <updated>Mon, 7 Jan 2019 19:43:01 +0000</updated>
                            <resolved>Fri, 21 Sep 2018 15:19:22 +0000</resolved>
                                    <version>Lustre 2.10.4</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="232817" author="pjones" created="Thu, 30 Aug 2018 17:12:29 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Can you please assist with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="232925" author="scadmin" created="Mon, 3 Sep 2018 11:25:42 +0000"  >&lt;p&gt;some more MDT hangs today.&lt;/p&gt;

&lt;p&gt;this time I managed to catch some hung processes on the client that was probably triggering the problem.&lt;/p&gt;

&lt;p&gt;the below lines are truncated output from &quot;ps auxw&quot; (I should have done &quot;ps auxwwww&quot;).&lt;br/&gt;
each chmod below probably has up to 5 directories as arguments. it was a sweep to chgrp and chmod on files and directories in a tree that was many levels deep, with 2 files in the leaves. all the below are ops on directories.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;root     410582  0.0  0.0 107980   476 pts/2    D    19:48   0:00 chmod g+s oz077/BulkSpec/19/04/00/11/41/AU1904001141B0 oz077/BulkSpec/19/04/00/11/79 oz077/BulkSpec/19
root     410584  0.0  0.0 107980   476 pts/2    D    19:48   0:00 chmod g+s oz077/BulkSpec/19/08/01/25/10/AU1908012510B0 oz077/BulkSpec/19/08/01/25/28 oz077/BulkSpec/19
root     410585  0.0  0.0 107980   108 pts/2    S    19:48   0:00 chmod 750 oz077/BulkSpec/19/05/00/35/34 oz077/BulkSpec/19/05/00/35/34/AU1905003534B0 oz077/BulkSpec/19
root     410586  0.0  0.0 107980   104 pts/2    S    19:48   0:00 chmod 750 oz077/BulkSpec/19/07/00/77/92/AU1907007792B0 oz077/BulkSpec/19/07/00/77/19 oz077/BulkSpec/19
root     411165  0.0  0.0 107980   108 pts/2    D    19:49   0:00 chmod g+s oz077/BulkSpec/19/07 oz077/BulkSpec/19/07/01 oz077/BulkSpec/19/07/01/01 oz077/BulkSpec/19/07
root     411166  0.0  0.0 107980   108 pts/2    S    19:49   0:00 chmod g+s oz077/BulkSpec/19/05 oz077/BulkSpec/19/05/00 oz077/BulkSpec/19/05/00/45 oz077/BulkSpec/19/05
root     411167  0.0  0.0 107980   108 pts/2    S    19:49   0:00 chmod g+s oz077/BulkSpec/19/06 oz077/BulkSpec/19/06/00 oz077/BulkSpec/19/06/00/68 oz077/BulkSpec/19/06
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ll upload lustre-logs from most of the hangs, and syslog in a minute.&lt;/p&gt;

&lt;p&gt;there was also an LBUG on one of the failover&apos;s but I don&apos;t really care about that right now.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="232926" author="scadmin" created="Mon, 3 Sep 2018 12:19:15 +0000"  >&lt;p&gt;and another MDS hang. this time with only a single chmod running at once (but still 5 dirs as args) -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; # ps auxwwwww
...
root      22196  0.0  0.0 108252   112 ?        S    21:55   0:00 xargs -0 -n5 chmod 750
root      22753  0.0  0.0      0     0 ?        S    21:56   0:00 [kworker/24:2]
root      22828  0.0  0.0 107980   100 ?        S    21:56   0:00 chmod 750 oz077/BulkSpec/19/10/01/85/35 oz077/BulkSpec/19/10/01/85/35/AU1910018535B0 oz077/BulkSpec/19/10/01/85/72 oz077/BulkSpec/19/10/01/85/72/AU1910018572B0 oz077/BulkSpec/19/10/01/85/96
root      22955  0.0  0.0      0     0 ?        S    21:59   0:00 [kworker/3:1]
root      23895  0.0  0.0      0     0 ?        S    22:01   0:00 [kworker/2:2]
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I should point out that I&apos;m not 100% sure it&apos;s this script and client crashing it, but the timing seems to make it quite likely.&lt;/p&gt;

&lt;p&gt;as a reminder, we have 3 MDTs and inherited dir striping across all 3.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="232967" author="laisiyao" created="Tue, 4 Sep 2018 08:01:07 +0000"  >&lt;p&gt;The logs show that many operations fail to cancel update logs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00080000:4.0:1535977826.431899:0:36891:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce38f2a900 refcount 3 committed 0 result -116 batchid 446684036709
00000020:00080000:5.0:1535977826.456464:0:34564:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff962cd5916480 refcount 3 committed 0 result -116 batchid 446684036710
00000040:00020000:5.0:1535977836.962837:0:19030:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
00000020:00080000:5.0:1535977836.977815:0:19030:0:(update_trans.c:1299:distribute_txn_cancel_records()) dagg-MDT0001-osp-MDT0000: batchid 446684036700 cancel update log [0x3:0x8003ad82:0x2].3749: rc = -116
00000020:00080000:14.0:1535977840.676310:0:38230:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce284aaf00 refcount 3 committed 0 result -116 batchid 446684036711
00000020:00080000:6.0:1535977860.144482:0:27095:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce284ada80 refcount 3 committed 0 result -116 batchid 446684036712
00000020:00080000:2.0:1535977887.558019:0:37736:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce33879e80 refcount 3 committed 0 result -116 batchid 446684036713
00000020:00080000:4.0:1535977888.348589:0:18564:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95cdb657b680 refcount 3 committed 0 result -116 batchid 446684036714
00000020:00080000:6.0:1535977908.552388:0:34801:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce6abf8500 refcount 3 committed 0 result -116 batchid 446684036715
00000020:00080000:9.0:1535977950.428018:0:34789:0:(update_trans.c:80:top_multiple_thandle_dump()) dagg-MDT0000-osd tmt ffff95ce5027a180 refcount 3 committed 0 result -116 batchid 446684036716
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;-116 is -ESTALE, but I&apos;m not clear how it&apos;s returned. can you turn on full debug with &apos;lctl set_param debug=-1&apos; on all MDSes, and reproduce this issue?&lt;/p&gt;</comment>
                            <comment id="232971" author="scadmin" created="Tue, 4 Sep 2018 08:43:13 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;these events are quite traumatic. we have at least 8 new corrupted and unremovable dirs from yesterday and we can&apos;t lfsck because of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11111&quot; title=&quot;crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj-&amp;gt;mod_flags &amp;amp; ORPHAN_OBJ)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11111&quot;&gt;&lt;del&gt;LU-11111&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I would very much prefer not to damage the production machine any further.&lt;br/&gt;
also the last time we turned on -1 debugging, IIRC a bunch of clients got evicted.&lt;/p&gt;

&lt;p&gt;I&apos;ll see if I can reproduce it in a VM instead...&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="232996" author="scadmin" created="Tue, 4 Sep 2018 17:06:15 +0000"  >&lt;p&gt;I spent all day trying to reproduce it in a VM or on a 1-node lustre setup on a compute node, but (of course) couldn&apos;t.&lt;/p&gt;

&lt;p&gt;so I ran it on the real machine as you suggested. all 3 MDTs are on one MDS (warble2).&lt;/p&gt;

&lt;p&gt;the largest possible -1 debug logs still overflow very quickly, so I kept cycling them every 10s.&lt;br/&gt;
I think I caught your -116 cleanly -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;...
Sep  5 02:26:17 warble2 kernel: Lustre: debug daemon will attempt to start writing to /mnt/root/root/lu-11301.debug_daemon.warble2.1536078377 (20480000kB max)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:21 warble2 kernel: debug daemon buffer overflowed; discarding 10% of pages (103 of 1024)
Sep  5 02:26:27 warble2 kernel: Lustre: shutting down debug daemon thread...
Sep  5 02:26:27 warble2 kernel: Lustre: debug daemon will attempt to start writing to /mnt/root/root/lu-11301.debug_daemon.warble2.1536078387 (20480000kB max)
Sep  5 02:26:36 warble2 kernel: LustreError: 20423:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
Sep  5 02:26:37 warble2 kernel: Lustre: shutting down debug daemon thread...
Sep  5 02:26:37 warble2 kernel: Lustre: debug daemon will attempt to start writing to /mnt/root/root/lu-11301.debug_daemon.warble2.1536078397 (20480000kB max)
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so I&apos;ll upload&lt;/p&gt;

&lt;p&gt;lu-11301.debug_daemon.warble2.1536078377&lt;br/&gt;
lu-11301.debug_daemon.warble2.1536078387&lt;br/&gt;
lu-11301.debug_daemon.warble2.1536078397&lt;/p&gt;

&lt;p&gt;but I&apos;ll need a ftp site to upload them to. they are huge. please let me know the write-only location.&lt;/p&gt;

&lt;p&gt;BTW, almost all the clients were evicted on one of the power cycles to get back from this, so it&apos;s complete cluster reboot time again. very sad. I hope these are enough logs.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="233021" author="laisiyao" created="Wed, 5 Sep 2018 02:07:05 +0000"  >&lt;p&gt;you can upload to ftp.whamcloud.com.&lt;/p&gt;</comment>
                            <comment id="233027" author="scadmin" created="Wed, 5 Sep 2018 05:04:59 +0000"  >&lt;p&gt;it won&apos;t let me upload. do I have to do it to a specific dir?&lt;/p&gt;</comment>
                            <comment id="233028" author="scadmin" created="Wed, 5 Sep 2018 05:39:34 +0000"  >&lt;p&gt;also Simon found that &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9157&quot; title=&quot;replay-single test_80c: rmdir failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9157&quot;&gt;&lt;del&gt;LU-9157&lt;/del&gt;&lt;/a&gt; has the same -116 error. but I guess you know that already.&lt;/p&gt;</comment>
                            <comment id="233029" author="scadmin" created="Wed, 5 Sep 2018 05:44:23 +0000"  >&lt;p&gt;ah, I figured out the ftp from an old intel ticket we had. cool. uploading now.&lt;/p&gt;</comment>
                            <comment id="233030" author="pjones" created="Wed, 5 Sep 2018 05:47:12 +0000"  >&lt;p&gt;Ah good - I was just rummaging for the instructions &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="233033" author="laisiyao" created="Wed, 5 Sep 2018 06:08:52 +0000"  >&lt;p&gt;ahh, I just noticed that in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9157&quot; title=&quot;replay-single test_80c: rmdir failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9157&quot;&gt;&lt;del&gt;LU-9157&lt;/del&gt;&lt;/a&gt; as well, and I&apos;ll check that.&lt;/p&gt;</comment>
                            <comment id="233035" author="scadmin" created="Wed, 5 Sep 2018 06:23:26 +0000"  >&lt;p&gt;I&apos;ve attached syslog that matches the -1 logs that are uploading.&lt;/p&gt;

&lt;p&gt;the 3 most likely -1 logs (listed above) are on the ftp site now in the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11301&quot; title=&quot;hung threads on MDT and MDT won&amp;#39;t umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11301&quot;&gt;&lt;del&gt;LU-11301&lt;/del&gt;&lt;/a&gt; dir.&lt;br/&gt;
I&apos;ve also uploaded one extra before and one extra after (but the earliest is still uploading, ETA 5 mins).&lt;br/&gt;
so there should be 5 total&lt;/p&gt;

&lt;p&gt;let me know if you&apos;d like more from the below list&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lu-11301.debug_daemon.warble2.1536078317.gz
lu-11301.debug_daemon.warble2.1536078327.gz
lu-11301.debug_daemon.warble2.1536078337.gz
lu-11301.debug_daemon.warble2.1536078347.gz
lu-11301.debug_daemon.warble2.1536078357.gz
lu-11301.debug_daemon.warble2.1536078367.gz
lu-11301.debug_daemon.warble2.1536078377.gz
lu-11301.debug_daemon.warble2.1536078387.gz
lu-11301.debug_daemon.warble2.1536078397.gz
lu-11301.debug_daemon.warble2.1536078407.gz
lu-11301.debug_daemon.warble2.1536078417.gz
lu-11301.debug_daemon.warble2.1536078427.gz
lu-11301.debug_daemon.warble2.1536078437.gz
lu-11301.debug_daemon.warble2.1536078447.gz
lu-11301.debug_daemon.warble2.1536078457.gz
lu-11301.debug_daemon.warble2.1536078467.gz
lu-11301.debug_daemon.warble2.1536078477.gz
lu-11301.debug_daemon.warble2.1536078487.gz
lu-11301.debug_daemon.warble2.1536078497.gz
lu-11301.debug_daemon.warble2.1536078507.gz
lu-11301.debug_daemon.warble2.1536078517.gz
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="233043" author="laisiyao" created="Wed, 5 Sep 2018 08:20:01 +0000"  >&lt;p&gt;yes, I saw the 5 log files, and I&apos;m looking into them.&lt;/p&gt;</comment>
                            <comment id="233044" author="laisiyao" created="Wed, 5 Sep 2018 08:40:44 +0000"  >&lt;p&gt;I saw -116 error in one of them, thanks.&lt;/p&gt;</comment>
                            <comment id="233456" author="scadmin" created="Thu, 13 Sep 2018 12:06:26 +0000"  >&lt;p&gt;thanks for working on this.&lt;br/&gt;
please let us know if there is anything we can do to help.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="233516" author="gerrit" created="Fri, 14 Sep 2018 12:19:17 +0000"  >&lt;p&gt;Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33169&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33169&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11301&quot; title=&quot;hung threads on MDT and MDT won&amp;#39;t umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11301&quot;&gt;&lt;del&gt;LU-11301&lt;/del&gt;&lt;/a&gt; target: add lock in sub_trans_stop_cb()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c43baa1cb34e2ac98a6da36da1969e23bfdc42ca&lt;/p&gt;</comment>
                            <comment id="233517" author="laisiyao" created="Fri, 14 Sep 2018 12:27:36 +0000"  >&lt;p&gt;Robin, I just uploaded a patch, which should apply on 2.10 without conflict. you can wait for the review to finish and then apply it.&lt;/p&gt;</comment>
                            <comment id="233520" author="scadmin" created="Fri, 14 Sep 2018 12:48:07 +0000"  >&lt;p&gt;will do. thanks!&lt;/p&gt;</comment>
                            <comment id="233687" author="scadmin" created="Tue, 18 Sep 2018 15:26:51 +0000"  >&lt;p&gt;FYI we&apos;re running this patch on our MDS&apos;s now.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="233853" author="gerrit" created="Fri, 21 Sep 2018 03:31:06 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33169/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33169/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11301&quot; title=&quot;hung threads on MDT and MDT won&amp;#39;t umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11301&quot;&gt;&lt;del&gt;LU-11301&lt;/del&gt;&lt;/a&gt; target: add lock in sub_trans_stop_cb()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 9e313b46d1916f69d2e96b14540c3db88d105265&lt;/p&gt;</comment>
                            <comment id="233875" author="pjones" created="Fri, 21 Sep 2018 15:19:22 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="30966" name="messages-20180829-grep-warble.txt.gz" size="63912" author="scadmin" created="Thu, 30 Aug 2018 06:35:28 +0000"/>
                            <attachment id="30965" name="messages-20180830-grep-v-slurm.etc.txt.gz" size="110333" author="scadmin" created="Thu, 30 Aug 2018 06:35:39 +0000"/>
                            <attachment id="30995" name="messages-20180905.grep-v_slurm.etc.txt.gz" size="12501" author="scadmin" created="Wed, 5 Sep 2018 06:14:38 +0000"/>
                            <attachment id="30981" name="messages-grep-v-slurm.etc.20180903.txt.gz" size="451362" author="scadmin" created="Mon, 3 Sep 2018 14:57:11 +0000"/>
                            <attachment id="30969" name="warble1-20180829-lustre-log.1535516379.149622.gz" size="6493253" author="scadmin" created="Thu, 30 Aug 2018 04:56:20 +0000"/>
                            <attachment id="30967" name="warble1-20180830-lustre-log.1535595934.132075.gz" size="6346876" author="scadmin" created="Thu, 30 Aug 2018 04:58:07 +0000"/>
                            <attachment id="30980" name="warble1-lustre-log.1535964912.429651.gz" size="6188838" author="scadmin" created="Mon, 3 Sep 2018 14:56:52 +0000"/>
                            <attachment id="30968" name="warble2-20180829-lustre-log.1535516377.51461.gz" size="6413266" author="scadmin" created="Thu, 30 Aug 2018 04:58:03 +0000"/>
                            <attachment id="30982" name="warble2-lustre-log.1535968315.274573.gz" size="5196475" author="scadmin" created="Mon, 3 Sep 2018 14:57:51 +0000"/>
                            <attachment id="30983" name="warble2-lustre-log.1535978027.36891.gz" size="4457376" author="scadmin" created="Mon, 3 Sep 2018 14:58:21 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i001h3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>