[LU-11534] OST group desc corruption following forced panic, OST fails to start Created: 17/Oct/18  Updated: 09/Nov/18  Resolved: 09/Nov/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Artem Blagodarenko (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Attachments: File e2fsck.ost1.fvntt.201809220723     File e2fsck.ost1.sb32768.fvn.tt.201809220810.gz     Zip Archive kern.11205.20180921.zip    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During failover/failback testing, one OST became corrupted following a forced crash.

A panic was forced on n005 at ~01:54 on 22 Sep (via sysrq-trigger). On failover to n004, the OST failed to mount due to group descriptor corruption. From the kern log:

...
Sep 22 01:58:53 snx11205n004 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
Sep 22 01:58:56 snx11205n004 kernel: LustreError: 137-5: snx11205-OST0001_UUID: not available for connect from 257@gni (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 22 01:58:56 snx11205n004 kernel: LustreError: Skipped 7 previous similar messages
Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { "device": "dm-1", "data": "(2017!=28757)" }
Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): ldiskfs_check_descriptors: Checksum for group 153506 failed (2017!=28757)
Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): group descriptors corrupted!
Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { "device": "dm-1", "data": "" }
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(osd_handler.c:7295:osd_mount()) snx11205-OST0001-osd: can't mount /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_config.c:559:class_setup()) setup snx11205-OST0001-osd failed (-22)
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:202:lustre_start_simple()) snx11205-OST0001-osd setup error -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount_server.c:1902:server_fill_super()) Unable to start osd on /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:1583:lustre_fill_super()) Unable to mount  (-22)
...
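For readers cross-checking the log against the dumpe2fs output in this report, the following is a minimal Python sketch (not part of the original report). It pulls the group number and the two checksum values out of the ldiskfs_check_descriptors message, prints them in the 0x%04x form that dumpe2fs uses, and decodes the -22 mount error. The variable names are illustrative only.

```python
import errno
import os
import re

# Log line copied from the kern log excerpt in this report.
LOG_LINE = (
    "Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): "
    "ldiskfs_check_descriptors: Checksum for group 153506 failed (2017!=28757)"
)

match = re.search(r"Checksum for group (\d+) failed \((\d+)!=(\d+)\)", LOG_LINE)
group, csum_a, csum_b = (int(value) for value in match.groups())

# The kernel prints the checksums in decimal; dumpe2fs prints them in hex.
csum_a_hex = f"0x{csum_a:04x}"
csum_b_hex = f"0x{csum_b:04x}"
print(group, csum_a_hex, csum_b_hex)

# The mount path fails with -22; look up the symbolic errno name.
mount_errno = 22
errno_name = errno.errorcode[mount_errno]
print(errno_name, os.strerror(mount_errno))
```

The two hex values correspond to the "Checksum 0x7055 (EXPECTED 0x07e1)" line in the dumpe2fs output, so the kernel and dumpe2fs agree on both the stored and the expected checksum for group 153506. Error 22 is EINVAL.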

From dumpe2fs:

Group 153506: (Blocks 5030084608-5030117375) [ITABLE_ZEROED]
  Checksum 0x7055 (EXPECTED 0x07e1), unused inodes 0
  Block bitmap at 5024776354 (bg #153344 + 162), Inode bitmap at 5024776610 (bg #153344 + 418)
  Inode table at 5024778000-5024778007 (bg #153344 + 1808)
  28875 free blocks, 46 free inodes, 0 directories
  Free blocks: 5030084608-5030109183, 5030109207-5030109236, 5030109302-5030109314, 5030109322-5030109344, 5030109353-5030109362, 5030109413-5030109472, 5030109496-5030109509, 5030109540-5030109579, 5030109587-5030109599, 5030109616-5030109623, 5030109652-5030109675, 5030109686-5030109733, 5030109784-5030109803, 5030109832-5030109858, 5030109884-5030109917, 5030109921, 5030109944-5030109983, 5030109988-5030110000, 5030110102-5030110153, 5030110211-5030110224, 5030110268-5030110281, 5030110300-5030110304, 5030110350-5030110357, 5030110458-5030110482, 5030110575-5030110590, 5030110632-5030110686, 5030110716-5030110764, 5030110812-5030110832, 5030110855-5030110871, 5030110907-5030110914, 5030110930-5030110958, 5030110984-5030111008, 5030111045-5030111085, 5030111133-5030111178, 5030111197-5030111223, 5030111273-5030111326, 5030111344-5030111367, 5030111397-5030111445, 5030111457-5030111493, 5030111517-5030111589, 5030111707-5030111789, 5030111794-5030111807, 5030111818-5030111822, 5030111848-5030111854, 5030111869-5030111886, 5030111915-5030111918, 5030111935-5030111970, 5030112107-5030112124, 5030112137-5030112197, 5030112208-5030112241, 5030112281-5030112381, 5030112397-5030112411, 5030112416-5030112438, 5030112471-5030112514, 5030112564-5030112579, 5030112584-5030112646, 5030112691-5030112749, 5030112778-5030112808, 5030112828-5030112849, 5030112866-5030112868, 5030112951-5030112968, 5030113001-5030113015, 5030113059-5030113132, 5030113148-5030113174, 5030113177-5030113355, 5030113370-5030113396, 5030113481-5030113496, 5030113525-5030113534, 5030113564-5030113577, 5030113655-5030113667, 5030113674-5030113706, 5030113766-5030113803, 5030113840-5030113881, 5030113928-5030113954, 5030114046-5030114052, 5030114057-5030114071, 5030114088-5030114100, 5030114133-5030114176, 5030114221-5030114233, 5030114265-5030114309, 5030114335-5030114347, 5030114442-5030114465, 5030114513-5030114527, 5030114558-5030114569, 5030114618-5030114626, 5030114653-5030114657, 
5030114670-5030114736, 5030114742-5030114752, 5030114769-5030114793, 5030114809-5030114863, 5030114876-5030114898, 5030114910-5030114936, 5030114953-5030114971, 5030115002-5030115032, 5030115040-5030115070, 5030115098-5030115128, 5030115144-5030115195, 5030115210-5030115272, 5030115285-5030115332, 5030115355-5030115412, 5030115434-5030115550, 5030115553-5030115586, 5030115615, 5030115627-5030115677, 5030115680-5030115689, 5030115701-5030115720, 5030115734-5030115783, 5030115812-5030115816, 5030115836-5030115855, 5030115867-5030115896, 5030115916-5030115949, 5030115982-5030116007, 5030116012-5030116038, 5030116072-5030116084, 5030116136-5030116163, 5030116193-5030116238, 5030116268-5030116289, 5030116315-5030116345, 5030116355-5030116396, 5030116427-5030116479, 5030116483-5030116520, 5030116532-5030116550, 5030116580-5030116692, 5030116719-5030116809, 5030116813-5030116830, 5030116840-5030116848, 5030116876-5030116890, 5030116900-5030116921, 5030116947-5030116954, 5030116988-5030116996, 5030117010-5030117015, 5030117044-5030117058, 5030117063-5030117066, 5030117086-5030117091, 5030117135-5030117248, 5030117261-5030117312, 5030117329-5030117375
  Free inodes: 19648770, 19648772, 19648774, 19648778-19648779, 19648783, 196487
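The block numbers in the dumpe2fs output above can be sanity-checked with a little arithmetic. The sketch below is not from the original report and rests on assumptions inferred from the output itself: 32768 blocks per group (the size of the printed block range), a first data block of 0, a flex_bg group size of 256, and a packed flex_bg layout in which all block bitmaps come first, then all inode bitmaps, then the inode tables. The 8-block inode table size is taken from the printed range 5024778000-5024778007.

```python
# Assumptions inferred from the dumpe2fs output, not stated in the report.
BLOCKS_PER_GROUP = 32768   # size of the range 5030084608-5030117375
FLEX_BG_SIZE = 256         # inferred from the "+ 162 / + 418 / + 1808" offsets
ITABLE_BLOCKS = 8          # inode table occupies 5024778000-5024778007

group = 153506

# Block range covered by the group itself.
first_block = group * BLOCKS_PER_GROUP
last_block = first_block + BLOCKS_PER_GROUP - 1

# First group of the flex group that holds this group's metadata.
flex_first_group = group - group % FLEX_BG_SIZE
index_in_flex = group - flex_first_group
flex_base = flex_first_group * BLOCKS_PER_GROUP

# Assumed packed layout: block bitmaps, then inode bitmaps, then inode tables.
block_bitmap = flex_base + index_in_flex
inode_bitmap = flex_base + FLEX_BG_SIZE + index_in_flex
inode_table = flex_base + 2 * FLEX_BG_SIZE + index_in_flex * ITABLE_BLOCKS

print(first_block, last_block)
print(flex_first_group, block_bitmap, inode_bitmap, inode_table)
```

Under these assumptions the computed values reproduce the printed ones: the group range, base group #153344, and the bitmap and inode table locations. This suggests the location fields of the descriptor are plausible and that the damage is limited to the checksum or the fields it covers.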

A read-only e2fsck run reports the group descriptor checksum problem, plus a few other discrepancies. Running e2fsck against the backup superblock does not report the group descriptor problem, but does report numerous pass 5 problems with free blocks/inodes.



 Comments   
Comment by Andreas Dilger [ 19/Oct/18 ]

The pass 5 errors when using the backup superblock and descriptors are totally expected, and not considered harmful. That is just because the backup group descriptors are not kept up-to-date by the kernel, but e2fsck makes an updated copy of the bitmaps and per-group counters as it scans the whole filesystem.
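As a toy illustration of why stale backup descriptors produce pass 5 complaints, the sketch below compares free counts recomputed from bitmaps against stored per-group counts. It is a simplified model, not e2fsck's actual code; the function names and data are hypothetical.

```python
def count_free(bitmap):
    """Count free entries in a bitmap given as a list of booleans (True = in use)."""
    return sum(1 for in_use in bitmap if not in_use)


def find_count_mismatches(computed_bitmaps, stored_free_counts):
    """Return (group, stored, actual) for each group whose stored count is stale."""
    mismatches = []
    for group, bitmap in enumerate(computed_bitmaps):
        actual = count_free(bitmap)
        stored = stored_free_counts[group]
        if actual != stored:
            mismatches.append((group, stored, actual))
    return mismatches


# Bitmaps as rebuilt by a full scan of the filesystem (hypothetical data).
scanned_bitmaps = [
    [True, True, False, False],   # group 0: 2 free
    [True, False, False, False],  # group 1: 3 free
]
# Counts as found in a backup descriptor that was never updated after group 0 filled up.
backup_free_counts = [4, 3]

mismatches = find_count_mismatches(scanned_bitmaps, backup_free_counts)
print(mismatches)
```

Here only group 0 is flagged, because its stored count predates later allocations. The mismatch reflects an outdated counter rather than damaged data, which matches the comment above.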

Comment by Artem Blagodarenko (Inactive) [ 09/Nov/18 ]

We now know that this patch caused the issue:

commit de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jun 8 12:46:37 2015 -0400

    jbd2: speedup jbd2_journal_get_[write|undo]_access()
    
    jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
    frequently called for buffers that are already part of the running
    transaction - most frequently it is the case for bitmaps, inode table
    blocks, and superblock. Since in such cases we have nothing to do, it is
    unfortunate we still grab reference to journal head, lock the bh, lock
    bh_state only to find out there's nothing to do.
    
    Improving this is a bit subtle though since until we find out journal
    head is attached to the running transaction, it can disappear from under
    us because checkpointing / commit decided it's no longer needed. We deal
    with this by protecting journal_head slab with RCU. We still have to be
    careful about journal head being freed & reallocated within slab and
    about exposing journal head in consistent state (in particular
    b_modified and b_frozen_data must be in correct state before we allow
    user to touch the buffer).
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2_write_access_granted() accesses journal structures without taking a lock, which allows a race in jbd2_journal_get_write_access() while the buffer is being copied to the frozen buffer.

This is already fixed in master by the following commit:

commit 2083ffd1bc6c772972834b50e5aef2118c88658d
Author: Andreas Dilger <andreas.dilger@intel.com>
Date:   Mon Mar 19 01:20:24 2018 +0000

    Revert "LU-9796 kernel: improve metadata performaces for RHEL7"
    
    This reverts commit 17fe3c192e101ac due to suspected
    problems hit in some deployments.
    
    Change-Id: I8cb28b4c69f67583356a7e07cf94ba897ffeb6ee
    Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
    Reviewed-on: https://review.whamcloud.com/31683
    Reviewed-by: Wang Shilong <wshilong@ddn.com>
    Tested-by: Jenkins
    Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
    Tested-by: Oleg Drokin <oleg.drokin@intel.com>