<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:39:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10988] LBUG in lfsck</title>
                <link>https://jira.whamcloud.com/browse/LU-10988</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;we tested warble1 hardware all we could for about a week and found no hardware issues. we also replaced sas cards and cables just to be safe.&lt;/p&gt;

&lt;p&gt;warble1 now is 3.10.0-693.21.1.el7.x86_64 and zfs 0.7.8 and has these patches applied&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;usr/src/lustre-2.10.3/lu10212-estale.patch
usr/src/lustre-2.10.3/lu10707-ksocklnd-revert-jiffies.patch
usr/src/lustre-2.10.3/lu10707-lnet-route-jiffies.patch
usr/src/lustre-2.10.3/lu10887-lfsck.patch
usr/src/lustre-2.10.3/lu8990-put-root.patch
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;when the dagg MDT&apos;s were mounted on warble1 they COMPLETED ok and then about 5 seconds later it hit an LBUG in lfsck.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;...
2018-05-02 22:06:06 [ 2919.828067] Lustre: dagg-MDT0000: Client 22c84389-af1f-9970-0e9b-70c3a4861afd (at 10.8.49.155@tcp201) reconnecting
2018-05-02 22:06:06 [ 2919.828113] Lustre: dagg-MDT0002: Recovery already passed deadline 0:31. If you do not want to wait more, please abort the recovery by force.
2018-05-02 22:06:38 [ 2951.686211] Lustre: dagg-MDT0002: recovery is timed out, evict stale exports
2018-05-02 22:06:38 [ 2951.694197] Lustre: dagg-MDT0002: disconnecting 1 stale clients
2018-05-02 22:06:38 [ 2951.736799] Lustre: 24680:0:(ldlm_lib.c:2544:target_recovery_thread()) too long recovery - read logs
2018-05-02 22:06:38 [ 2951.746774] Lustre: dagg-MDT0002: Recovery over after 6:24, of 125 clients 124 recovered and 1 was evicted.
2018-05-02 22:06:38 [ 2951.746775] LustreError: dumping log to /tmp/lustre-log.1525262798.24680
2018-05-02 22:06:44 [ 2957.910031] LustreError: 33236:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
2018-05-02 22:06:44 [ 2957.917615] Pid: 33236, comm: lfsck_namespace
2018-05-02 22:06:44 [ 2957.922760]
2018-05-02 22:06:44 [ 2957.922760] Call Trace:
2018-05-02 22:06:44 [ 2957.928142]  [&amp;lt;ffffffffc06457ae&amp;gt;] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-05-02 22:06:44 [ 2957.935374]  [&amp;lt;ffffffffc064583c&amp;gt;] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2957.942270]  [&amp;lt;ffffffffc0d82573&amp;gt;] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2957.949398]  [&amp;lt;ffffffffc115ac81&amp;gt;] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:44 [ 2957.957911]  [&amp;lt;ffffffffc0d7ea22&amp;gt;] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:44 [ 2957.965289]  [&amp;lt;ffffffffc1186f4a&amp;gt;] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:44 [ 2957.974129]  [&amp;lt;ffffffffc115ce71&amp;gt;] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:44 [ 2957.983217]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2957.989375]  [&amp;lt;ffffffffc114098e&amp;gt;] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2957.997154]  [&amp;lt;ffffffff810cb0b5&amp;gt;] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:44 [ 2958.003538]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-02 22:06:44 [ 2958.009648]  [&amp;lt;ffffffff810c7c70&amp;gt;] ? default_wake_function+0x0/0x20
2018-05-02 22:06:44 [ 2958.016449]  [&amp;lt;ffffffffc11405c0&amp;gt;] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
2018-05-02 22:06:44 [ 2958.024186]  [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
2018-05-02 22:06:44 [ 2958.029662]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.035220]  [&amp;lt;ffffffff816c055d&amp;gt;] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:44 [ 2958.041197]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? kthread+0x0/0xe0
2018-05-02 22:06:44 [ 2958.046723]
2018-05-02 22:06:44 [ 2958.048771] Kernel panic - not syncing: LBUG
2018-05-02 22:06:44 [ 2958.053576] CPU: 2 PID: 33236 Comm: lfsck_namespace Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
2018-05-02 22:06:44 [ 2958.065051] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
2018-05-02 22:06:44 [ 2958.073066] Call Trace:
2018-05-02 22:06:44 [ 2958.076060]  [&amp;lt;ffffffff816ae7c8&amp;gt;] dump_stack+0x19/0x1b
2018-05-02 22:06:44 [ 2958.081738]  [&amp;lt;ffffffff816a8634&amp;gt;] panic+0xe8/0x21f
2018-05-02 22:06:44 [ 2958.087058]  [&amp;lt;ffffffffc0645854&amp;gt;] lbug_with_loc+0x64/0xb0 [libcfs]
2018-05-02 22:06:44 [ 2958.093781]  [&amp;lt;ffffffffc0d82573&amp;gt;] dt_mode_to_dft+0x73/0x80 [obdclass]
2018-05-02 22:06:44 [ 2958.100741]  [&amp;lt;ffffffffc115ac81&amp;gt;] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-02 22:06:45 [ 2958.109091]  [&amp;lt;ffffffffc0d7ea22&amp;gt;] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-02 22:06:45 [ 2958.116294]  [&amp;lt;ffffffffc1186f4a&amp;gt;] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-02 22:06:45 [ 2958.124963]  [&amp;lt;ffffffffc115ce71&amp;gt;] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-02 22:06:45 [ 2958.133889]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.139866]  [&amp;lt;ffffffffc114098e&amp;gt;] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-02 22:06:45 [ 2958.147492]  [&amp;lt;ffffffff810cb0b5&amp;gt;] ? sched_clock_cpu+0x85/0xc0
2018-05-02 22:06:45 [ 2958.153724]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-02 22:06:45 [ 2958.159689]  [&amp;lt;ffffffff810c7c70&amp;gt;] ? wake_up_state+0x20/0x20
2018-05-02 22:06:45 [ 2958.165742]  [&amp;lt;ffffffffc11405c0&amp;gt;] ? lfsck_master_engine+0x1310/0x1310 [lfsck]
2018-05-02 22:06:45 [ 2958.173343]  [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
2018-05-02 22:06:45 [ 2958.178685]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.185227]  [&amp;lt;ffffffff816c055d&amp;gt;] ret_from_fork+0x5d/0xb0
2018-05-02 22:06:45 [ 2958.191065]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
2018-05-02 22:06:45 [ 2958.197613] Kernel Offset: disabled
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I&apos;ve failed the MDT&apos;s back to warble2 and mounted them by hand with -o skip_lfsck&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment></environment>
        <key id="52079">LU-10988</key>
            <summary>LBUG in lfsck</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 2 May 2018 13:59:04 +0000</created>
                <updated>Mon, 20 Aug 2018 15:42:59 +0000</updated>
                            <resolved>Sat, 12 May 2018 05:49:29 +0000</resolved>
                                    <version>Lustre 2.10.3</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="227093" author="pjones" created="Wed, 2 May 2018 14:01:07 +0000"  >&lt;blockquote&gt;&lt;p&gt;usr/src/lustre-2.10.3/lu10887-lfsck.patch&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;So you have applied the patch &lt;a href=&quot;https://review.whamcloud.com/31915/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31915/&lt;/a&gt; on the MDT, right? If so, there must be other corner cases in lfsck_namespace_repair_dangling() that fail to set the &#8220;mode&#8221;. I will investigate further. In any case, the existing patches are still valid.&lt;/p&gt;</comment>
                            <comment id="227095" author="scadmin" created="Wed, 2 May 2018 14:03:10 +0000"  >&lt;p&gt;yes, that patch is what I&apos;ve called lu10887-lfsck.patch and is applied.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227102" author="gerrit" created="Wed, 2 May 2018 14:29:00 +0000"  >&lt;p&gt;Fan Yong (fan.yong@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32245&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32245&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt; obdclass: show invalid mode for dt_mode_to_dft&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b4b453e5a8d4ff2cb51b4b261b54457d3258b8c6&lt;/p&gt;</comment>
                            <comment id="227103" author="yong.fan" created="Wed, 2 May 2018 14:30:39 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=scadmin&quot; class=&quot;user-hover&quot; rel=&quot;scadmin&quot;&gt;scadmin&lt;/a&gt;, would you please apply the patch &lt;a href=&quot;https://review.whamcloud.com/32245&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32245&lt;/a&gt;? It will help us understand what the bad file &quot;mode&quot; is when the problem occurs. Thanks!&lt;/p&gt;</comment>
                            <comment id="227112" author="scadmin" created="Wed, 2 May 2018 15:10:23 +0000"  >&lt;p&gt;no probs.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2018-05-03 01:02:53 [10209.454437] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) ASSERTION( 0 ) failed: invalid mode 0
2018-05-03 01:02:53 [10209.464798] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
2018-05-03 01:02:53 [10209.472260] Pid: 35645, comm: lfsck_namespace
2018-05-03 01:02:53 [10209.477336] 
2018-05-03 01:02:53 [10209.477336] Call Trace:
2018-05-03 01:02:53 [10209.482611]  [&amp;lt;ffffffffc05f67ae&amp;gt;] libcfs_call_trace+0x4e/0x60 [libcfs]
2018-05-03 01:02:53 [10209.489784]  [&amp;lt;ffffffffc05f683c&amp;gt;] lbug_with_loc+0x4c/0xb0 [libcfs]
2018-05-03 01:02:53 [10209.496628]  [&amp;lt;ffffffffc0b275a1&amp;gt;] dt_mode_to_dft+0xa1/0xb0 [obdclass]
2018-05-03 01:02:53 [10209.503721]  [&amp;lt;ffffffffc14b5c81&amp;gt;] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
2018-05-03 01:02:53 [10209.512193]  [&amp;lt;ffffffffc0b23a22&amp;gt;] ? htable_lookup+0x102/0x180 [obdclass]
2018-05-03 01:02:53 [10209.519533]  [&amp;lt;ffffffffc14e1f4a&amp;gt;] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
2018-05-03 01:02:53 [10209.528336]  [&amp;lt;ffffffffc14b7e71&amp;gt;] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
2018-05-03 01:02:53 [10209.537370]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-03 01:02:53 [10209.543458]  [&amp;lt;ffffffffc149b98e&amp;gt;] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
2018-05-03 01:02:53 [10209.551182]  [&amp;lt;ffffffff810cb0b5&amp;gt;] ? sched_clock_cpu+0x85/0xc0
2018-05-03 01:02:53 [10209.557512]  [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
2018-05-03 01:02:53 [10209.563565]  [&amp;lt;ffffffff810c7c70&amp;gt;] ? default_wake_function+0x0/0x20
2018-05-03 01:02:53 [10209.570305]  [&amp;lt;ffffffffc149b5c0&amp;gt;] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
2018-05-03 01:02:53 [10209.578004]  [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
2018-05-03 01:02:53 [10209.583438]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? kthread+0x0/0xe0
2018-05-03 01:02:53 [10209.588968]  [&amp;lt;ffffffff816c055d&amp;gt;] ret_from_fork+0x5d/0xb0
2018-05-03 01:02:53 [10209.594914]  [&amp;lt;ffffffff810b3f60&amp;gt;] ? kthread+0x0/0xe0
2018-05-03 01:02:53 [10209.600428] 
2018-05-03 01:02:53 [10209.602473] Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227119" author="yong.fan" created="Wed, 2 May 2018 16:40:24 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=scadmin&quot; class=&quot;user-hover&quot; rel=&quot;scadmin&quot;&gt;scadmin&lt;/a&gt;, thanks for the new test results. I have updated the patch https://review.whamcloud.com/32245. That should fix your problem. Would you please try again (patch set 2 or newer)? Thanks!&lt;/p&gt;</comment>
                            <comment id="227176" author="scadmin" created="Thu, 3 May 2018 10:18:51 +0000"  >&lt;p&gt;yup, that worked. thanks!&lt;/p&gt;

&lt;p&gt;lfsck on the MDT1,MDT2 where it was &apos;crashed&apos; has now completed:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[warble1]root:   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: stopped
flags:
param: all_targets,create_mdtobj
last_completed_time: 1523258154
time_since_last_completed: 2084076 seconds
latest_start_time: 1523283745
time_since_latest_start: 2058485 seconds
last_checkpoint_time: 1523284186
time_since_last_checkpoint: 2058044 seconds
latest_start_position: 3643308, [0x20001072c:0x16502:0x0], 0x75092e6bed520000
last_checkpoint_position: 3643308, [0x20001072c:0x16502:0x0], 0x750934ffbf07ffff
first_failure_position: 3643307, N/A, N/A
checked_phase1: 2426567
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 1
failed_phase2: 0
directories: 202176
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 14706
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 1
striped_shards_scanned: 142443
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 5
run_time_phase1: 180 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 13480 items/sec
average_speed_phase2: 0 objs/sec
average_speed_total: 13480 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525341840
time_since_last_completed: 390 seconds
latest_start_time: 1525339672
time_since_latest_start: 2558 seconds
last_checkpoint_time: 1525341840
time_since_last_checkpoint: 390 seconds
latest_start_position: 2934356, [0x28001c941:0x9c24:0x0], 0x0
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7946529
checked_phase2: 3312630
updated_phase1: 1
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 1414222
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52655
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 3
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1060672
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 1488 seconds
run_time_phase2: 931 seconds
average_speed_phase1: 5340 items/sec
average_speed_phase2: 3558 objs/sec
average_speed_total: 4654 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525341827
time_since_last_completed: 403 seconds
latest_start_time: 1525339070
time_since_latest_start: 3160 seconds
last_checkpoint_time: 1525341827
time_since_last_checkpoint: 403 seconds
latest_start_position: 2965156, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7917065
checked_phase2: 3318839
updated_phase1: 0
updated_phase2: 1
failed_phase1: 0
failed_phase2: 0
directories: 1425028
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52856
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1068469
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 2016 seconds
run_time_phase2: 919 seconds
average_speed_phase1: 3927 items/sec
average_speed_phase2: 3611 objs/sec
average_speed_total: 3828 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;MDT0 still needs to be re-scanned I think though &apos;cos it was &apos;stopped&apos; as part of the crashing stuff in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10887&quot; title=&quot;2 MDTs stuck in WAITING&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10887&quot;&gt;&lt;del&gt;LU-10887&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ll start a whole new scan again now...&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227177" author="yong.fan" created="Thu, 3 May 2018 10:40:00 +0000"  >&lt;p&gt;&quot;stopped&quot; is not the expected status. Please re-run LFSCK with LFSCK debug enabled (lctl set_param debug+=lfsck); then, if it fails again, we can analyze the debug logs.&lt;/p&gt;</comment>
                            <comment id="227178" author="scadmin" created="Thu, 3 May 2018 11:08:18 +0000"  >&lt;p&gt;yes, I know &apos;stopped&apos; is not correct. these 3 MDTs have been &apos;stopped&apos;, and &apos;crashed&apos;, &apos;crashed&apos; respectively for a week now, and mounted with -o skip_lfsck (for obvious reasons).&lt;/p&gt;

&lt;p&gt;now that lfsck is fixed, the &apos;crashed&apos; ones continued automatically as soon as they were mounted, and then finished. the &apos;stopped&apos; one (MDT0) remained stopped, as that has been its state for a week or more.&lt;/p&gt;

&lt;p&gt;I have now re-run the whole namespace lfsck again, and output is now this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[warble1]root:   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344435
time_since_last_completed: 963 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344435
time_since_last_checkpoint: 963 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 8060227
checked_phase2: 3437356
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1393898
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 53149
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036368
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 6
run_time_phase1: 1039 seconds
run_time_phase2: 650 seconds
average_speed_phase1: 7757 items/sec
average_speed_phase2: 5288 objs/sec
average_speed_total: 6807 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344401
time_since_last_completed: 997 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344401
time_since_last_checkpoint: 997 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7743558
checked_phase2: 3250035
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1381748
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52656
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036341
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 0
linkea_overflow_cleared: 0
success_count: 7
run_time_phase1: 638 seconds
run_time_phase2: 616 seconds
average_speed_phase1: 12137 items/sec
average_speed_phase2: 5276 objs/sec
average_speed_total: 8766 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags:
param: all_targets,create_mdtobj
last_completed_time: 1525344398
time_since_last_completed: 1000 seconds
latest_start_time: 1525342746
time_since_latest_start: 2652 seconds
last_checkpoint_time: 1525344398
time_since_last_checkpoint: 1000 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 7746926
checked_phase2: 3246941
updated_phase1: 2
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 1382139
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 52862
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 1036341
striped_shards_repaired: 1
striped_shards_failed: 0
striped_shards_skipped: 2
name_hash_repaired: 2
linkea_overflow_cleared: 0
success_count: 7
run_time_phase1: 641 seconds
run_time_phase2: 613 seconds
average_speed_phase1: 12085 items/sec
average_speed_phase2: 5296 objs/sec
average_speed_total: 8767 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;look ok to you?&lt;/p&gt;

&lt;p&gt;anything else we should be doing to repair the damage from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10877&quot; title=&quot;dt_locate_at reference leak&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10877&quot;&gt;&lt;del&gt;LU-10877&lt;/del&gt;&lt;/a&gt; or &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10677&quot; title=&quot;can&amp;#39;t delete directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10677&quot;&gt;LU-10677&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227180" author="yong.fan" created="Thu, 3 May 2018 11:26:47 +0000"  >&lt;blockquote&gt;
&lt;p&gt;striped_shards_repaired: 1&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;That means a corrupted striped directory has been fixed, and the status became &quot;completed&quot;, which is expected.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;striped_shards_skipped: 2&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;That means the namespace LFSCK could not confirm whether the related shards are inconsistent, because there was not enough evidence.&lt;/p&gt;

&lt;p&gt;Generally, the system is in a healthy state. As for those uncertain items, if they are corrupted, we can consider fixing them if we actually hit trouble in the future.&lt;/p&gt;</comment>
                            <comment id="227762" author="gerrit" created="Sat, 12 May 2018 03:53:19 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32245/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32245/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt; lfsck: load object attr when prepare LFSCK request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f7c354096a810df0a9333dace2f538d6dfbe486f&lt;/p&gt;</comment>
                            <comment id="227774" author="pjones" created="Sat, 12 May 2018 05:49:29 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="228446" author="gerrit" created="Wed, 23 May 2018 16:17:38 +0000"  >&lt;p&gt;Minh Diep (minh.diep@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32522&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32522&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt; lfsck: load object attr when prepare LFSCK request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6b3b1a51fe759f769f86f8ff968290754a467ff1&lt;/p&gt;</comment>
                            <comment id="229440" author="gerrit" created="Mon, 11 Jun 2018 22:16:07 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32522/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32522/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt; lfsck: load object attr when prepare LFSCK request&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 21d33c11abdb2908fa5fa43bb942a97615c6f7a2&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="51707">LU-10887</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzwov:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>