<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:43:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11419] lfsck does not complete phase2</title>
                <link>https://jira.whamcloud.com/browse/LU-11419</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I presume this is related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11111&quot; title=&quot;crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj-&amp;gt;mod_flags &amp;amp; ORPHAN_OBJ)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11111&quot;&gt;&lt;del&gt;LU-11111&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10888&quot; title=&quot;&amp;#39;lctl abort_recovery&amp;#39; allow aborting recovery between MDTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10888&quot;&gt;&lt;del&gt;LU-10888&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;lctl lfsck_start -M dagg-MDT0000 -t namespace -A -n&lt;br/&gt;
completed ok&lt;/p&gt;

&lt;p&gt;lctl lfsck_start -M dagg-MDT0000 -t namespace -A&lt;br/&gt;
completed on mdt1 and mdt2 but stuck on mdt0.&lt;/p&gt;

&lt;p&gt;this is the summary of repairs, and mdt0 did not progress from here:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace | egrep &apos;status:|repaired|checked_&apos;  | grep -v &apos; 0$&apos;
status: scanning-phase2
checked_phase1: 33226737
checked_phase2: 10901477
dangling_repaired: 28
striped_shards_repaired: 102
name_hash_repaired: 51
status: completed
checked_phase1: 32652269
checked_phase2: 12379442
dangling_repaired: 28
striped_shards_repaired: 125
status: completed
checked_phase1: 32662678
checked_phase2: 12378342
unmatched_pairs_repaired: 1
dangling_repaired: 11
striped_shards_repaired: 96
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;lfsck_namespace was using 100% of a cpu but the checked_phase2 counter wasn&apos;t going up.&lt;br/&gt;
kill -9 on lfsck_namespace didn&apos;t work&lt;br/&gt;
I didn&apos;t try lctl lfsck_stop this time.&lt;br/&gt;
mdt0 wouldn&apos;t umount. had to reset the MDS.&lt;/p&gt;

&lt;p&gt;I did a sysrq &apos;t&apos; and &apos;w&apos; before resetting the MDS and those start at&lt;br/&gt;
Sep 23 00:18:42&lt;br/&gt;
in the attached messages file.&lt;/p&gt;

&lt;p&gt;hopefully that might help.&lt;br/&gt;
please let us know if there&apos;s something else we can help with.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>x86_64, zfs, 3 MDTs, all on 1 MDS, 2.10.4 + many patches.</environment>
        <key id="53393">LU-11419</key>
            <summary>lfsck does not complete phase2</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Sat, 22 Sep 2018 16:48:26 +0000</created>
                <updated>Mon, 7 Jan 2019 19:47:54 +0000</updated>
                            <resolved>Mon, 29 Oct 2018 16:21:50 +0000</resolved>
                                    <version>Lustre 2.10.4</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="233906" author="pjones" created="Sun, 23 Sep 2018 07:19:19 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please assist here?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="233951" author="laisiyao" created="Tue, 25 Sep 2018 08:36:13 +0000"  >&lt;p&gt;This looks to be the same as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11201&quot; title=&quot;NMI watchdog: BUG: soft lockup in lfsck_namespace&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11201&quot;&gt;&lt;del&gt;LU-11201&lt;/del&gt;&lt;/a&gt;, can you apply patch &lt;a href=&quot;https://review.whamcloud.com/#/c/32958/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/32958/&lt;/a&gt; and try lfsck again?&lt;/p&gt;</comment>
                            <comment id="234019" author="scadmin" created="Wed, 26 Sep 2018 15:16:12 +0000"  >&lt;p&gt;Hi Lai,&lt;/p&gt;

&lt;p&gt;thanks. applied &lt;a href=&quot;https://review.whamcloud.com/#/c/33078/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/33078/&lt;/a&gt; from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11201&quot; title=&quot;NMI watchdog: BUG: soft lockup in lfsck_namespace&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11201&quot;&gt;&lt;del&gt;LU-11201&lt;/del&gt;&lt;/a&gt; but no change.&lt;/p&gt;

&lt;p&gt;I left it for about 10 extra hours after phase2 counters on mdt0 stopped incrementing, but nothing changed. as per previous report of this in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11111&quot; title=&quot;crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj-&amp;gt;mod_flags &amp;amp; ORPHAN_OBJ)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11111&quot;&gt;&lt;del&gt;LU-11111&lt;/del&gt;&lt;/a&gt; I couldn&apos;t stop the lfsck and had to reset the MDS.&lt;/p&gt;

&lt;p&gt;this is as far as it got -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace | egrep &apos;status:|repaired|checked_|speed&apos;  | grep -v &apos; 0$&apos;
status: scanning-phase2
checked_phase1: 33091005
checked_phase2: 10550536
dangling_repaired: 31
striped_shards_repaired: 28
name_hash_repaired: 18
average_speed_phase1: 874 items/sec
average_speed_phase2: 807 objs/sec
average_speed_total: 857 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: 7 objs/sec
status: completed
checked_phase1: 32500602
checked_phase2: 12505620
dangling_repaired: 29
striped_shards_repaired: 28
name_hash_repaired: 56
average_speed_phase1: 890 items/sec
average_speed_phase2: 1923 objs/sec
average_speed_total: 1046 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
status: completed
checked_phase1: 32512235
checked_phase2: 12504486
linkea_repaired: 1
dangling_repaired: 14
striped_shards_repaired: 28
average_speed_phase1: 896 items/sec
average_speed_phase2: 1923 objs/sec
average_speed_total: 1052 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ll attach syslog for today. It includes a couple of &apos;echo t &amp;gt; /proc/sysrq-trigger&apos; dumps in case that helps you work out where lfsck_namespace is stuck.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="234047" author="laisiyao" created="Thu, 27 Sep 2018 02:26:35 +0000"  >&lt;p&gt;Can you enable more debug with &apos;lctl set_param debug=&quot;+trace lfsck&quot;&apos; on dagg-MDT0000 while lfsck is running, and then collect the debug logs? That should help locate the dead loop it may be falling into.&lt;/p&gt;</comment>
                            <comment id="234123" author="gerrit" created="Sat, 29 Sep 2018 04:02:24 +0000"  >&lt;p&gt;Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33252&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33252&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11419&quot; title=&quot;lfsck does not complete phase2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11419&quot;&gt;&lt;del&gt;LU-11419&lt;/del&gt;&lt;/a&gt; lfsck: lfsck_namespace_shrink_linkea() dead loop&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 22503a1db2c3b6a7f3a12829ee2484ea95a25913&lt;/p&gt;</comment>
                            <comment id="234124" author="laisiyao" created="Sat, 29 Sep 2018 04:04:27 +0000"  >&lt;p&gt;Hi Robin, I just uploaded a patch. Once it passes autotest, you can apply it on your system and test again.&lt;/p&gt;</comment>
                            <comment id="235786" author="gerrit" created="Mon, 29 Oct 2018 16:02:19 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33252/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33252/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11419&quot; title=&quot;lfsck does not complete phase2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11419&quot;&gt;&lt;del&gt;LU-11419&lt;/del&gt;&lt;/a&gt; lfsck: lfsck_namespace_shrink_linkea() dead loop&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 20a603c42ecc1a5c6f1b3d5a0e31b2b323777abb&lt;/p&gt;</comment>
                            <comment id="235804" author="pjones" created="Mon, 29 Oct 2018 16:21:50 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="52888">LU-11201</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="52628">LU-11111</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="51708">LU-10888</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="31087" name="messages-grep-vslurm.txt.gz" size="787000" author="scadmin" created="Sat, 22 Sep 2018 16:43:50 +0000"/>
                            <attachment id="31100" name="messages-warble2.txt.gz" size="124619" author="scadmin" created="Wed, 26 Sep 2018 15:16:25 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i002xb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>