<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:44:33 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
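For instance, the filtered XML for this issue could be fetched with a URL of the
following form (the standard JIRA issue-XML view path; the exact path may vary
by JIRA version):
    https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-11516/LU-11516.xml?field=key&field=summary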
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11516] ASSERTION( ((o)-&gt;lo_header-&gt;loh_attr &amp; LOHA_EXISTS) != 0 ) failed: LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-11516</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;After a namespace lfsck today, and with the production system unusable due to the &quot;stale&quot; directory problems documented in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11418&quot; title=&quot;hung threads on MDT and MDT won&amp;#39;t umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11418&quot;&gt;&lt;del&gt;LU-11418&lt;/del&gt;&lt;/a&gt;, I had to reboot the MDS.&lt;/p&gt;

&lt;p&gt;Unfortunately, I can now no longer mount the filesystem: the MDS LBUGs before the MDTs are fully up.&lt;/p&gt;

&lt;p&gt;I&apos;ve tried mounting with -o skip_lfsck, but it doesn&apos;t help.&lt;br/&gt;
With the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11418&quot; title=&quot;hung threads on MDT and MDT won&amp;#39;t umount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11418&quot;&gt;&lt;del&gt;LU-11418&lt;/del&gt;&lt;/a&gt; I get nothing at all on the serial console, just a complete freeze of some sort.&lt;br/&gt;
Without that patch, I get the LBUG below.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;...
2018-10-14 04:14:50 [  144.564183] LNet: Using FMR for registration
2018-10-14 04:14:50 [  144.565919] LNetError: 104:0:(o2iblnd_cb.c:2299:kiblnd_passive_connect()) Can&apos;t accept conn from 192.168.44.102@o2ib44 on NA (ib0:0:192.168.44.22): bad dst nid 192.168.44.22@o2ib44
2018-10-14 04:14:50 [  144.639679] LNet: Added LNI 192.168.44.22@o2ib44 [128/2048/0/180]
2018-10-14 04:14:50 [  144.693364] LustreError: 137-5: dagg-MDT0000_UUID: not available for connect from 192.168.44.140@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
2018-10-14 04:14:50 [  144.748380] Lustre: dagg-MDT0000: Not available for connect from 192.168.44.155@o2ib44 (not set up)
2018-10-14 04:14:50 [  144.850890] Lustre: dagg-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
2018-10-14 04:14:50 [  144.893093] Lustre: dagg-MDT0000: Will be in recovery for at least 2:30, or until 130 clients reconnect
2018-10-14 04:14:50 [  144.903035] Lustre: dagg-MDT0000: Connection restored to  (at 192.168.44.160@o2ib44)
2018-10-14 04:14:50 [  144.904095] LNet: 18002:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Abort reconnection of 192.168.44.34@o2ib44: connected
2018-10-14 04:14:50 [  144.922881] Lustre: Skipped 2 previous similar messages
2018-10-14 04:14:50 [  145.197706] LustreError: 137-5: dagg-MDT0001_UUID: not available for connect from 192.168.44.135@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
2018-10-14 04:14:50 [  145.216609] LustreError: Skipped 100 previous similar messages
2018-10-14 04:14:51 [  145.405422] Lustre: dagg-MDT0000: Connection restored to e995292c-7ea9-b4b9-296b-7fc43479891e (at 192.168.44.204@o2ib44)
2018-10-14 04:14:51 [  145.416862] Lustre: Skipped 45 previous similar messages
2018-10-14 04:14:51 [  145.749608] LustreError: 11-0: dagg-MDT0000-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114
2018-10-14 04:14:51 [  145.797392] Lustre: dagg-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
2018-10-14 04:14:52 [  146.517192] Lustre: dagg-MDT0001: Will be in recovery for at least 2:30, or until 130 clients reconnect
2018-10-14 04:14:52 [  146.517895] Lustre: dagg-MDT0000: Connection restored to  (at 192.168.44.145@o2ib44)
2018-10-14 04:14:52 [  146.517896] Lustre: Skipped 17 previous similar messages
2018-10-14 04:14:52 [  146.648101] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.155@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
2018-10-14 04:14:52 [  146.667038] LustreError: Skipped 27 previous similar messages
2018-10-14 04:14:53 [  147.742695] LustreError: 11-0: dagg-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114
2018-10-14 04:14:53 [  147.809966] Lustre: dagg-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
2018-10-14 04:14:53 [  147.906112] Lustre: dagg-MDT0002: Will be in recovery for at least 2:30, or until 130 clients reconnect
2018-10-14 04:14:54 [  148.619484] Lustre: dagg-MDT0002: Connection restored to 4043286f-fc2d-0b38-4c8b-0a509fa70b9d (at 192.168.44.150@o2ib44)
2018-10-14 04:14:54 [  148.631061] Lustre: Skipped 13 previous similar messages
2018-10-14 04:14:58 [  152.622612] Lustre: dagg-MDT0001: Connection restored to 4313929d-144f-a365-76bf-0741877d11de (at 192.168.44.182@o2ib44)
2018-10-14 04:14:58 [  152.634166] Lustre: Skipped 167 previous similar messages
2018-10-14 04:14:58 [  153.193285] Lustre: dagg-MDT0000: Recovery already passed deadline 2:21. It is due to DNE recovery failed/stuck on the 2 MDT(s): 0001 0002. Please wait until all MDTs recovered or abort the recovery by force.
2018-10-14 04:14:59 [  153.938830] Lustre: dagg-MDT0000: Recovery already passed deadline 2:21. It is due to DNE recovery failed/stuck on the 2 MDT(s): 0001 0002. Please wait until all MDTs recovered or abort the recovery by force.
2018-10-14 04:15:09 [  163.702354] Lustre: dagg-MDT0002: Connection restored to 66a65625-28da-d50f-c83b-b2f538a83827 (at 192.168.44.195@o2ib44)
2018-10-14 04:15:09 [  163.713974] Lustre: Skipped 135 previous similar messages
2018-10-14 04:15:25 [  179.961770] Lustre: dagg-MDT0000: Connection restored to d2eebaed-e98f-5c92-c1f5-5b272ce31e82 (at 10.8.49.221@tcp201)
2018-10-14 04:15:25 [  179.973196] Lustre: Skipped 38 previous similar messages
2018-10-14 04:15:31 [  185.747958] LustreError: 42034:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff8b825e9f2d00 x1613591234569248/t0(0) o601-&amp;gt;dagg-MDT0000-lwp-OST0001_UUID@192.168.44.31@o2ib44:652/0 lens 336/0 e 0 to 0 dl 1539450937 ref 1 fl Interpret:/0/ffffffff rc 0/-1
2018-10-14 04:15:31 [  185.776641] LustreError: 42034:0:(tgt_handler.c:509:tgt_filter_recovery_request()) Skipped 2 previous similar messages
2018-10-14 04:15:36 [  190.915287] LustreError: 42034:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff8b825ec6e300 x1613592493156656/t0(0) o601-&amp;gt;dagg-MDT0000-lwp-OST000b_UUID@192.168.44.36@o2ib44:657/0 lens 336/0 e 0 to 0 dl 1539450942 ref 1 fl Interpret:/0/ffffffff rc 0/-1
2018-10-14 04:15:36 [  190.944086] LustreError: 42034:0:(tgt_handler.c:509:tgt_filter_recovery_request()) Skipped 4 previous similar messages
2018-10-14 04:15:38 [  192.736539] LustreError: 42042:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff8b2334027200 x1613591230816544/t0(0) o601-&amp;gt;dagg-MDT0000-lwp-OST0004_UUID@192.168.44.33@o2ib44:659/0 lens 336/0 e 0 to 0 dl 1539450944 ref 1 fl Interpret:/0/ffffffff rc 0/-1
2018-10-14 04:15:38 [  192.765302] LustreError: 42042:0:(tgt_handler.c:509:tgt_filter_recovery_request()) Skipped 3 previous similar messages
2018-10-14 04:15:40 [  194.809319] LustreError: 42037:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff8b825e9cc200 x1613592493190864/t0(0) o601-&amp;gt;dagg-MDT0000-lwp-OST0007_UUID@192.168.44.34@o2ib44:661/0 lens 336/0 e 0 to 0 dl 1539450946 ref 1 fl Interpret:/0/ffffffff rc 0/-1
2018-10-14 04:15:40 [  194.838140] LustreError: 42037:0:(tgt_handler.c:509:tgt_filter_recovery_request()) Skipped 8 previous similar messages
2018-10-14 04:15:43 [  197.681575] Lustre: dagg-MDT0000: Recovery over after 0:53, of 130 clients 130 recovered and 0 were evicted.
2018-10-14 04:15:43 [  197.691192] Lustre: 20075:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
2018-10-14 04:15:43 [  197.691198] LustreError: 20075:0:(lod_dev.c:420:lod_sub_recovery_thread()) dagg-MDT0000-osp-MDT0001 getting update log failed: rc = -22
2018-10-14 04:15:43 [  197.748062] LustreError: 43000:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)-&amp;gt;lo_header-&amp;gt;loh_attr &amp;amp; LOHA_EXISTS) != 0 ) failed: 
2018-10-14 04:15:43 [  197.762055] LustreError: 43000:0:(lu_object.h:862:lu_object_attr()) LBUG
2018-10-14 04:15:43 [  197.769608] Pid: 43000, comm: orph_cleanup_da 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
2018-10-14 04:15:43 [  197.780185] Call Trace:
2018-10-14 04:15:43 [  197.783478]  [&amp;lt;ffffffffc06f07cc&amp;gt;] libcfs_call_trace+0x8c/0xc0 [libcfs]
2018-10-14 04:15:43 [  197.790871]  [&amp;lt;ffffffffc06f087c&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
2018-10-14 04:15:43 [  197.797889]  [&amp;lt;ffffffffc0b722dd&amp;gt;] orph_declare_index_delete+0x40d/0x460 [mdd]
2018-10-14 04:15:43 [  197.805872]  [&amp;lt;ffffffffc0b72761&amp;gt;] orph_key_test_and_del+0x431/0xd30 [mdd]
2018-10-14 04:15:43 [  197.813494]  [&amp;lt;ffffffffc0b73617&amp;gt;] __mdd_orphan_cleanup+0x5b7/0x840 [mdd]
2018-10-14 04:15:43 [  197.821009]  [&amp;lt;ffffffff8d2bb621&amp;gt;] kthread+0xd1/0xe0
2018-10-14 04:15:43 [  197.826702]  [&amp;lt;ffffffff8d9205dd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
2018-10-14 04:15:43 [  197.833942]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
2018-10-14 04:15:43 [  197.839723] Kernel panic - not syncing: LBUG
2018-10-14 04:15:43 [  197.844759] CPU: 9 PID: 43000 Comm: orph_cleanup_da Tainted: P           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
2018-10-14 04:15:43 [  197.856369] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.4.8 05/21/2018
2018-10-14 04:15:43 [  197.864605] Call Trace:
2018-10-14 04:15:43 [  197.867810]  [&amp;lt;ffffffff8d90e84e&amp;gt;] dump_stack+0x19/0x1b
2018-10-14 04:15:43 [  197.873696]  [&amp;lt;ffffffff8d908b50&amp;gt;] panic+0xe8/0x21f
2018-10-14 04:15:43 [  197.879223]  [&amp;lt;ffffffffc06f08cb&amp;gt;] lbug_with_loc+0x9b/0xa0 [libcfs]
2018-10-14 04:15:43 [  197.886124]  [&amp;lt;ffffffffc0b722dd&amp;gt;] orph_declare_index_delete+0x40d/0x460 [mdd]
2018-10-14 04:15:43 [  197.893965]  [&amp;lt;ffffffffc11c4439&amp;gt;] ? lod_trans_create+0x39/0x50 [lod]
2018-10-14 04:15:43 [  197.901017]  [&amp;lt;ffffffffc0b72761&amp;gt;] orph_key_test_and_del+0x431/0xd30 [mdd]
2018-10-14 04:15:43 [  197.908491]  [&amp;lt;ffffffffc0b73617&amp;gt;] __mdd_orphan_cleanup+0x5b7/0x840 [mdd]
2018-10-14 04:15:43 [  197.915874]  [&amp;lt;ffffffffc0b73060&amp;gt;] ? orph_key_test_and_del+0xd30/0xd30 [mdd]
2018-10-14 04:15:43 [  197.923506]  [&amp;lt;ffffffff8d2bb621&amp;gt;] kthread+0xd1/0xe0
2018-10-14 04:15:43 [  197.929044]  [&amp;lt;ffffffff8d2bb550&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-10-14 04:15:43 [  197.935789]  [&amp;lt;ffffffff8d9205dd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
2018-10-14 04:15:43 [  197.942869]  [&amp;lt;ffffffff8d2bb550&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-10-14 04:15:43 [  197.949604] Kernel Offset: 0xc200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Our standard server Lustre is 2.10.4 plus these patches:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;  lu10683-lu11093-checksum-overquota-gerrit32788-1fb85e7e.patch
  lu10988-lfsck2-gerrit32522-21d33c11.patch
  lu11074-mdc-xattr-gerrit32739-dea1cde9.patch
  lu11107-xattr-gerrit32753-c96a8f08.patch
  lu11111-lfsck-gerrit32796-693fe452.patch
  lu11082-lu11103-stuckMdtThreads-gerrit32853-3dc08caa.diff
  lu11062-stacktrace-gerrit32972-7232c445.patch
  lu11301-stuckMdtThreads2-c43baa1c.patch
  lu11419-lfsckDoesntFinish-gerrit33252-22503a1d.diff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It also looks like the same LBUG when I boot into a testing server image with 2.10.5 plus these patches:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;  lu11082-lu11103-stuckMdtThreads-gerrit32853-3dc08caa.diff
  lu11111-lfsck-gerrit32796-693fe452.ported.patch    # fixed to apply cleanly
  lu11201-lfsckDoesntFinish-gerrit33078-4829fb05.patch
  lu11301-stuckMdtThreads2-c43baa1c.patch
  lu11418-hungMdtZfs-gerrit33248-eaa3c60d.diff     # doesn&apos;t apply cleanly. needs lustre/include/obd_support.h edit
  lu11419-lfsckDoesntFinish-gerrit33252-22503a1d.diff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2018-10-14 04:45:14 [  256.907785] LustreError: 46668:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)-&amp;gt;lo_header-&amp;gt;loh_attr &amp;amp; LOHA_EXISTS) != 0 ) failed:
2018-10-14 04:45:14 [  256.921784] LustreError: 46668:0:(lu_object.h:862:lu_object_attr()) LBUG
2018-10-14 04:45:14 [  256.929337] Pid: 46668, comm: orph_cleanup_da 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
2018-10-14 04:45:14 [  256.939901] Call Trace:
2018-10-14 04:45:14 [  256.943185]  [&amp;lt;ffffffffc08ae7cc&amp;gt;] libcfs_call_trace+0x8c/0xc0 [libcfs]
2018-10-14 04:45:14 [  256.950584]  [&amp;lt;ffffffffc08ae87c&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
2018-10-14 04:45:14 [  256.957618]  [&amp;lt;ffffffffc1abf2dd&amp;gt;] orph_declare_index_delete+0x40d/0x460 [mdd]
2018-10-14 04:45:14 [  256.965586]  [&amp;lt;ffffffffc1abf761&amp;gt;] orph_key_test_and_del+0x431/0xd30 [mdd]
2018-10-14 04:45:14 [  256.973192]  [&amp;lt;ffffffffc1ac0617&amp;gt;] __mdd_orphan_cleanup+0x5b7/0x840 [mdd]
2018-10-14 04:45:14 [  256.980697]  [&amp;lt;ffffffffb4cbb621&amp;gt;] kthread+0xd1/0xe0
2018-10-14 04:45:14 [  256.986380]  [&amp;lt;ffffffffb53205dd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
2018-10-14 04:45:14 [  256.993618]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
2018-10-14 04:45:14 [  256.999378] Kernel panic - not syncing: LBUG
2018-10-14 04:45:14 [  257.004389] CPU: 5 PID: 46668 Comm: orph_cleanup_da Tainted: P           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
2018-10-14 04:45:14 [  257.015980] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.4.8 05/21/2018
2018-10-14 04:45:14 [  257.024200] Call Trace:
2018-10-14 04:45:14 [  257.027392]  [&amp;lt;ffffffffb530e84e&amp;gt;] dump_stack+0x19/0x1b
2018-10-14 04:45:14 [  257.033268]  [&amp;lt;ffffffffb5308b50&amp;gt;] panic+0xe8/0x21f
2018-10-14 04:45:14 [  257.038783]  [&amp;lt;ffffffffc08ae8cb&amp;gt;] lbug_with_loc+0x9b/0xa0 [libcfs]
2018-10-14 04:45:14 [  257.045676]  [&amp;lt;ffffffffc1abf2dd&amp;gt;] orph_declare_index_delete+0x40d/0x460 [mdd]
2018-10-14 04:45:14 [  257.053528]  [&amp;lt;ffffffffc1a24439&amp;gt;] ? lod_trans_create+0x39/0x50 [lod]
2018-10-14 04:45:14 [  257.060588]  [&amp;lt;ffffffffc1abf761&amp;gt;] orph_key_test_and_del+0x431/0xd30 [mdd]
2018-10-14 04:45:14 [  257.068071]  [&amp;lt;ffffffffc1ac0617&amp;gt;] __mdd_orphan_cleanup+0x5b7/0x840 [mdd]
2018-10-14 04:45:14 [  257.075460]  [&amp;lt;ffffffffc1ac0060&amp;gt;] ? orph_key_test_and_del+0xd30/0xd30 [mdd]
2018-10-14 04:45:14 [  257.083100]  [&amp;lt;ffffffffb4cbb621&amp;gt;] kthread+0xd1/0xe0
2018-10-14 04:45:14 [  257.088650]  [&amp;lt;ffffffffb4cbb550&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-10-14 04:45:14 [  257.095401]  [&amp;lt;ffffffffb53205dd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
2018-10-14 04:45:14 [  257.102490]  [&amp;lt;ffffffffb4cbb550&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-10-14 04:45:14 [  257.109241] Kernel Offset: 0x33c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
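<!--
The assertion in the title comes from lu_object_attr(), the helper that reads
an object's cached attribute bits and insists the object exists on disk first.
A minimal paraphrase of the inline from lustre/include/lu_object.h (line 862
in this 2.10.x build; the exact body may differ between releases):

    static inline __u32 lu_object_attr(const struct lu_object *o)
    {
            /* attributes are only valid once the object is known to exist */
            LASSERT((o->lo_header->loh_attr & LOHA_EXISTS) != 0);
            return o->lo_header->loh_attr;
    }

In the traces above, the orphan cleanup thread reaches this through
orph_declare_index_delete() on an orphan entry whose backing object is already
gone, so LOHA_EXISTS is clear and the LASSERT fires an LBUG, panicking the MDS
during mount.
-->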
                <environment>x86_64, ZFS, DNE, CentOS 7.5</environment>
        <key id="53595">LU-11516</key>
            <summary>ASSERTION( ((o)-&gt;lo_header-&gt;loh_attr &amp; LOHA_EXISTS) != 0 ) failed: LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Sat, 13 Oct 2018 17:50:24 +0000</created>
                <updated>Thu, 11 Jul 2019 17:09:45 +0000</updated>
                            <resolved>Fri, 2 Nov 2018 10:45:45 +0000</resolved>
                                    <version>Lustre 2.10.5</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="234892" author="pjones" created="Sat, 13 Oct 2018 19:24:42 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Can you please advise here?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="234894" author="adilger" created="Sat, 13 Oct 2018 23:12:43 +0000"  >&lt;p&gt;It isn&apos;t quite clear what the root of the problem is, but I pushed a patch that will hopefully get you past the LASSERT() failing during orphan cleanup. That is just for files that were open-unlinked and are now being deleted, so it isn&apos;t a reason to stop the mount of the MDS. &lt;/p&gt;</comment>
                            <comment id="234895" author="scadmin" created="Sun, 14 Oct 2018 03:40:40 +0000"  >&lt;p&gt;thanks Andreas. that&apos;s done the trick. filesystem is back up.&lt;/p&gt;</comment>
                            <comment id="234898" author="gerrit" created="Sun, 14 Oct 2018 04:45:00 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33366&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33366&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11516&quot; title=&quot;ASSERTION( ((o)-&amp;gt;lo_header-&amp;gt;loh_attr &amp;amp; LOHA_EXISTS) != 0 ) failed: LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11516&quot;&gt;&lt;del&gt;LU-11516&lt;/del&gt;&lt;/a&gt; mdd: do not assert on missing orphan&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: aa82c8ede4fb0661671542e084ac4a0654efb7d5&lt;/p&gt;</comment>
                            <comment id="234900" author="gerrit" created="Sun, 14 Oct 2018 19:56:26 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33368&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33368&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11516&quot; title=&quot;ASSERTION( ((o)-&amp;gt;lo_header-&amp;gt;loh_attr &amp;amp; LOHA_EXISTS) != 0 ) failed: LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11516&quot;&gt;&lt;del&gt;LU-11516&lt;/del&gt;&lt;/a&gt; mdd: do not assert on missing orphan&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 41bb8c3049360ed53c7f8974251911d8717abbdf&lt;/p&gt;</comment>
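<!--
The patch above ("mdd: do not assert on missing orphan") makes orphan cleanup
tolerate this case instead of asserting. A hypothetical sketch of the idea,
not the literal diff: in orph_declare_index_delete(), return an error for an
orphan whose backing object no longer exists, so the cleanup thread can skip
the stale entry and continue:

    if (!mdd_object_exists(obj))
            /* orphan object already gone: skip the entry, do not LBUG */
            return -ENOENT;

With that, a stale PENDING entry is logged and skipped and the MDS mount
proceeds.
-->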
                            <comment id="234905" author="adilger" created="Mon, 15 Oct 2018 02:06:43 +0000"  >&lt;p&gt;Sorry for the patch spam on this ticket, I pushed the b2_10 version of the patch to the master branch and had some trouble to fix it. &lt;/p&gt;

&lt;p&gt;Glad that this got your system back working&lt;/p&gt;

&lt;p&gt; It would be useful to see the error messages reported on the console when the MDS starts up. Is there just a single message about a bad object in the orphan list, or are there many of them?&lt;/p&gt;</comment>
                            <comment id="234908" author="scadmin" created="Mon, 15 Oct 2018 05:04:59 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;The MDS conman log for warble2 is attached.&lt;br/&gt;
warble2 is the MDS that has all 3 DNE dagg MDTs on it; the other MDS (warble1) has the MGS and non-DNE MDTs from 3 smaller filesystems.&lt;/p&gt;

&lt;p&gt;I can&apos;t see a specific orphan message at mount, though we do have some ongoing orphan-related messages.&lt;/p&gt;

&lt;p&gt;There&apos;s a bunch of known corruption in the filesystem, mostly from all the DNE MDT hangs. We have been trying to run lfsck since July to fix some of this. One lfsck succeeded this weekend, but then we ran into this problem, and we still have others.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="236196" author="gerrit" created="Fri, 2 Nov 2018 07:14:23 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33368/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33368/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11516&quot; title=&quot;ASSERTION( ((o)-&amp;gt;lo_header-&amp;gt;loh_attr &amp;amp; LOHA_EXISTS) != 0 ) failed: LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11516&quot;&gt;&lt;del&gt;LU-11516&lt;/del&gt;&lt;/a&gt; mdd: do not assert on missing orphan&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 5d89450b462f76fe2a377e7253595a162dae309b&lt;/p&gt;</comment>
                            <comment id="236222" author="pjones" created="Fri, 2 Nov 2018 10:45:45 +0000"  >&lt;p&gt;The removal of the assert patch has landed for 2.12. The other issues are being tracked under different tickets.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="47620">LU-9818</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="52628">LU-11111</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="36091">LU-8013</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53392">LU-11418</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="31197" name="conman-warble2.log" size="25502" author="scadmin" created="Mon, 15 Oct 2018 02:48:34 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00467:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10020"><![CDATA[1]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>