<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:44:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11534] OST group desc corruption following forced panic, OST fails to star</title>
                <link>https://jira.whamcloud.com/browse/LU-11534</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During failover/failback testing, one OST became corrupted following a forced crash.&lt;/p&gt;

&lt;p&gt;A panic was forced on n005 at ~01:54 on 22 Sep (via sysrq-trigger). On failover to n004, the OST failed to mount due to group descriptor corruption. From the kernel log:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
...
Sep 22 01:58:53 snx11205n004 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
Sep 22 01:58:56 snx11205n004 kernel: LustreError: 137-5: snx11205-OST0001_UUID: not available &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; connect from 257@gni (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 22 01:58:56 snx11205n004 kernel: LustreError: Skipped 7 previous similar messages
Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { &lt;span class=&quot;code-quote&quot;&gt;&quot;device&quot;&lt;/span&gt;: &lt;span class=&quot;code-quote&quot;&gt;&quot;dm-1&quot;&lt;/span&gt;, &lt;span class=&quot;code-quote&quot;&gt;&quot;data&quot;&lt;/span&gt;: &lt;span class=&quot;code-quote&quot;&gt;&quot;(2017!=28757)&quot;&lt;/span&gt; }
Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): ldiskfs_check_descriptors: Checksum &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; group 153506 failed (2017!=28757)
Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): group descriptors corrupted!
Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { &lt;span class=&quot;code-quote&quot;&gt;&quot;device&quot;&lt;/span&gt;: &lt;span class=&quot;code-quote&quot;&gt;&quot;dm-1&quot;&lt;/span&gt;, &lt;span class=&quot;code-quote&quot;&gt;&quot;data&quot;&lt;/span&gt;: &quot;&quot; }
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(osd_handler.c:7295:osd_mount()) snx11205-OST0001-osd: can&apos;t mount /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_config.c:559:class_setup()) setup snx11205-OST0001-osd failed (-22)
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:202:lustre_start_simple()) snx11205-OST0001-osd setup error -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount_server.c:1902:server_fill_super()) Unable to start osd on /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:1583:lustre_fill_super()) Unable to mount  (-22)
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
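
&lt;p&gt;For reference, the -22 returned by osd_mount() and the later setup calls in the log above is a negated errno value. A minimal Python sketch for decoding it, assuming a Linux host where errno 22 is EINVAL; the helper name decode_kernel_rc is hypothetical and not part of Lustre:&lt;/p&gt;

```python
import errno
import os


def decode_kernel_rc(rc):
    """Map a negative kernel-style return code to (symbolic name, message).

    Hypothetical helper for reading the log above; not a Lustre API.
    """
    code = abs(rc)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = decode_kernel_rc(-22)
print(name, "-", message)
```

&lt;p&gt;The message text comes from the local C library, so its wording may differ between platforms.&lt;/p&gt;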

&lt;p&gt;From dumpe2fs:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Group 153506: (Blocks 5030084608-5030117375) [ITABLE_ZEROED]
  Checksum 0x7055 (EXPECTED 0x07e1), unused inodes 0
  Block bitmap at 5024776354 (bg #153344 + 162), Inode bitmap at 5024776610 (bg #153344 + 418)
  Inode table at 5024778000-5024778007 (bg #153344 + 1808)
  28875 free blocks, 46 free inodes, 0 directories
  Free blocks: 5030084608-5030109183, 5030109207-5030109236, 5030109302-5030109314, 5030109322-5030109344, 5030109353-5030109362, 5030109413-5030109472, 5030109496-5030109509, 5030109540-5030109579, 5030109587-5030109599, 5030109616-5030109623, 5030109652-5030109675, 5030109686-5030109733, 5030109784-5030109803, 5030109832-5030109858, 5030109884-5030109917, 5030109921, 5030109944-5030109983, 5030109988-5030110000, 5030110102-5030110153, 5030110211-5030110224, 5030110268-5030110281, 5030110300-5030110304, 5030110350-5030110357, 5030110458-5030110482, 5030110575-5030110590, 5030110632-5030110686, 5030110716-5030110764, 5030110812-5030110832, 5030110855-5030110871, 5030110907-5030110914, 5030110930-5030110958, 5030110984-5030111008, 5030111045-5030111085, 5030111133-5030111178, 5030111197-5030111223, 5030111273-5030111326, 5030111344-5030111367, 5030111397-5030111445, 5030111457-5030111493, 5030111517-5030111589, 5030111707-5030111789, 5030111794-5030111807, 5030111818-5030111822, 5030111848-5030111854, 5030111869-5030111886, 5030111915-5030111918, 5030111935-5030111970, 5030112107-5030112124, 5030112137-5030112197, 5030112208-5030112241, 5030112281-5030112381, 5030112397-5030112411, 5030112416-5030112438, 5030112471-5030112514, 5030112564-5030112579, 5030112584-5030112646, 5030112691-5030112749, 5030112778-5030112808, 5030112828-5030112849, 5030112866-5030112868, 5030112951-5030112968, 5030113001-5030113015, 5030113059-5030113132, 5030113148-5030113174, 5030113177-5030113355, 5030113370-5030113396, 5030113481-5030113496, 5030113525-5030113534, 5030113564-5030113577, 5030113655-5030113667, 5030113674-5030113706, 5030113766-5030113803, 5030113840-5030113881, 5030113928-5030113954, 5030114046-5030114052, 5030114057-5030114071, 5030114088-5030114100, 5030114133-5030114176, 5030114221-5030114233, 5030114265-5030114309, 5030114335-5030114347, 5030114442-5030114465, 5030114513-5030114527, 5030114558-5030114569, 5030114618-5030114626, 5030114653-5030114657, 
5030114670-5030114736, 5030114742-5030114752, 5030114769-5030114793, 5030114809-5030114863, 5030114876-5030114898, 5030114910-5030114936, 5030114953-5030114971, 5030115002-5030115032, 5030115040-5030115070, 5030115098-5030115128, 5030115144-5030115195, 5030115210-5030115272, 5030115285-5030115332, 5030115355-5030115412, 5030115434-5030115550, 5030115553-5030115586, 5030115615, 5030115627-5030115677, 5030115680-5030115689, 5030115701-5030115720, 5030115734-5030115783, 5030115812-5030115816, 5030115836-5030115855, 5030115867-5030115896, 5030115916-5030115949, 5030115982-5030116007, 5030116012-5030116038, 5030116072-5030116084, 5030116136-5030116163, 5030116193-5030116238, 5030116268-5030116289, 5030116315-5030116345, 5030116355-5030116396, 5030116427-5030116479, 5030116483-5030116520, 5030116532-5030116550, 5030116580-5030116692, 5030116719-5030116809, 5030116813-5030116830, 5030116840-5030116848, 5030116876-5030116890, 5030116900-5030116921, 5030116947-5030116954, 5030116988-5030116996, 5030117010-5030117015, 5030117044-5030117058, 5030117063-5030117066, 5030117086-5030117091, 5030117135-5030117248, 5030117261-5030117312, 5030117329-5030117375
  Free inodes: 19648770, 19648772, 19648774, 19648778-19648779, 19648783, 196487
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
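
&lt;p&gt;The kernel log prints the two checksums in decimal while dumpe2fs prints them in hex, so it can help to confirm that both refer to the same pair and that the block numbers are consistent with the group number. A small Python sketch of that arithmetic, using only values quoted above; the group size is inferred from the block range shown for this group, the first data block is assumed to be 0, and group_first_block is a hypothetical helper:&lt;/p&gt;

```python
# Values copied from the kernel log and the dumpe2fs output above.
GROUP = 153506
LOG_PAIR = (2017, 28757)            # from "(2017!=28757)" in the kernel log
DUMPE2FS_EXPECTED = 0x07E1          # "EXPECTED 0x07e1"
DUMPE2FS_ON_DISK = 0x7055           # "Checksum 0x7055"
FIRST_BLOCK, LAST_BLOCK = 5030084608, 5030117375
BLOCK_BITMAP = 5024776354           # "bg #153344 + 162"

# Assumption: the group size is inferred from the block range shown for
# this group, not read from the superblock.
BLOCKS_PER_GROUP = LAST_BLOCK - FIRST_BLOCK + 1


def group_first_block(group, blocks_per_group=BLOCKS_PER_GROUP):
    """First block of a block group, assuming the first data block is 0."""
    return group * blocks_per_group


# The decimal pair in the log is the same pair dumpe2fs prints in hex.
print([format(value, "#06x") for value in LOG_PAIR])
print(LOG_PAIR == (DUMPE2FS_EXPECTED, DUMPE2FS_ON_DISK))

# The group number and the "bg #N + offset" notation agree with the block numbers.
print(BLOCKS_PER_GROUP)
print(group_first_block(GROUP) == FIRST_BLOCK)
print(group_first_block(153344) + 162 == BLOCK_BITMAP)
```

&lt;p&gt;Under these assumptions the locations recorded in the descriptor are self-consistent, and only the stored checksum disagrees with the computed one.&lt;/p&gt;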

&lt;p&gt;A read-only e2fsck run reports the group descriptor checksum problem, plus a few other discrepancies. Running e2fsck against the backup superblock does not report the group descriptor problem, but does report numerous pass 5 problems with free blocks/inodes.&lt;/p&gt;</description>
                <environment></environment>
        <key id="53627">LU-11534</key>
            <summary>OST group desc corruption following forced panic, OST fails to star</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="artem_blagodarenko">Artem Blagodarenko</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Oct 2018 10:57:23 +0000</created>
                <updated>Fri, 9 Nov 2018 08:31:08 +0000</updated>
                            <resolved>Fri, 9 Nov 2018 08:31:08 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="235144" author="adilger" created="Fri, 19 Oct 2018 08:36:42 +0000"  >&lt;p&gt;The pass 5 errors when using the backup superblock and descriptors are totally expected, and not considered harmful.  That is just because the backup group descriptors are not kept up-to-date by the kernel, but e2fsck makes an updated copy of the bitmaps and per-group counters as it scans the whole filesystem.&lt;/p&gt;</comment>
                            <comment id="236723" author="artem_blagodarenko" created="Fri, 9 Nov 2018 08:30:47 +0000"  >&lt;p&gt;We now know that this patch caused the issue:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
commit de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca
Author: Jan Kara &amp;lt;jack@suse.cz&amp;gt;
Date:   Mon Jun 8 12:46:37 2015 -0400

    jbd2: speedup jbd2_journal_get_[write|undo]_access()
    
    jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
    frequently called &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; buffers that are already part of the running
    transaction - most frequently it is the &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; bitmaps, inode table
    blocks, and superblock. Since in such cases we have nothing to &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;, it is
    unfortunate we still grab reference to journal head, lock the bh, lock
    bh_state only to find out there&apos;s nothing to &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;.
    
    Improving &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; is a bit subtle though since until we find out journal
    head is attached to the running transaction, it can disappear from under
    us because checkpointing / commit decided it&apos;s no longer needed. We deal
    with &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; by protecting journal_head slab with RCU. We still have to be
    careful about journal head being freed &amp;amp; reallocated within slab and
    about exposing journal head in consistent state (in particular
    b_modified and b_frozen_data must be in correct state before we allow
    user to touch the buffer).
    
    Signed-off-by: Jan Kara &amp;lt;jack@suse.cz&amp;gt;
    Signed-off-by: Theodore Ts&apos;o &amp;lt;tytso@mit.edu&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;jbd2_write_access_granted() accesses journal structures without taking a lock, which allows a race in jbd2_journal_get_write_access() while data is being copied to the frozen buffer.&lt;/p&gt;

&lt;p&gt;This is already fixed in master by the following commit:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
commit 2083ffd1bc6c772972834b50e5aef2118c88658d
Author: Andreas Dilger &amp;lt;andreas.dilger@intel.com&amp;gt;
Date:   Mon Mar 19 01:20:24 2018 +0000

    Revert &lt;span class=&quot;code-quote&quot;&gt;&quot;LU-9796 kernel: improve metadata performaces &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; RHEL7&quot;&lt;/span&gt;
    
    This reverts commit 17fe3c192e101ac due to suspected
    problems hit in some deployments.
    
    Change-Id: I8cb28b4c69f67583356a7e07cf94ba897ffeb6ee
    Signed-off-by: Andreas Dilger &amp;lt;andreas.dilger@intel.com&amp;gt;
    Reviewed-on: https://review.whamcloud.com/31683
    Reviewed-by: Wang Shilong &amp;lt;wshilong@ddn.com&amp;gt;
    Tested-by: Jenkins
    Reviewed-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;
    Tested-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="31217" name="e2fsck.ost1.fvntt.201809220723" size="2741" author="artem_blagodarenko" created="Wed, 17 Oct 2018 10:57:05 +0000"/>
                            <attachment id="31216" name="e2fsck.ost1.sb32768.fvn.tt.201809220810.gz" size="1928443" author="artem_blagodarenko" created="Wed, 17 Oct 2018 10:57:09 +0000"/>
                            <attachment id="31215" name="kern.11205.20180921.zip" size="591315" author="artem_blagodarenko" created="Wed, 17 Oct 2018 10:57:08 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i004db:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>