Lustre / LU-11534

OST group descriptor corruption following forced panic, OST fails to start


Details

    • Bug
    • Resolution: Fixed
    • Critical

    Description

      During failover/failback testing, one OST became corrupted following a forced crash.

      A panic was forced on n005 at ~01:54 on 22 Sep (via sysrq-trigger). On failover to n004, the OST failed to mount due to group descriptor corruption. From the kernel log:

      ...
      Sep 22 01:58:53 snx11205n004 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
      Sep 22 01:58:56 snx11205n004 kernel: LustreError: 137-5: snx11205-OST0001_UUID: not available for connect from 257@gni (no target). If you are running an HA pair check that the target is mounted on the other server.
      Sep 22 01:58:56 snx11205n004 kernel: LustreError: Skipped 7 previous similar messages
      Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { "device": "dm-1", "data": "(2017!=28757)" }
      Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): ldiskfs_check_descriptors: Checksum for group 153506 failed (2017!=28757)
      Sep 22 01:58:59 snx11205n004 kernel: LDISKFS-fs (dm-1): group descriptors corrupted!
      Sep 22 01:58:59 snx11205n004 kernel: IEC: 026001001: GROUP DISCRIPTORS CORRUPTED: { "device": "dm-1", "data": "" }
      Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(osd_handler.c:7295:osd_mount()) snx11205-OST0001-osd: can't mount /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
      Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_config.c:559:class_setup()) setup snx11205-OST0001-osd failed (-22)
      Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:202:lustre_start_simple()) snx11205-OST0001-osd setup error -22
      Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount_server.c:1902:server_fill_super()) Unable to start osd on /dev/mapper/nytroxd-md-uuid-ce5ae0d0:e14179f2:9771b3a4:c58bbc65: -22
      Sep 22 01:58:59 snx11205n004 kernel: LustreError: 2435:0:(obd_mount.c:1583:lustre_fill_super()) Unable to mount  (-22)
      ...
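
      For triage, the failing group and the two checksums can be pulled out of the ldiskfs_check_descriptors message with a small script. This is only a sketch: the regular expression and the helper name parse_checksum_failure are hypothetical and cover just the one message format seen in this log. Converting the two decimal values to hex makes them easy to compare against dumpe2fs output, and errno 22 (the -22 returned by osd_mount) decodes to EINVAL.

```python
import errno
import re

# Message text copied from the kernel log in this ticket.
LOG_LINE = (
    "LDISKFS-fs (dm-1): ldiskfs_check_descriptors: "
    "Checksum for group 153506 failed (2017!=28757)"
)

# Hypothetical pattern; it only matches this one message format.
PATTERN = re.compile(
    r"Checksum for group (?P<group>\d+) failed "
    r"\((?P<first>\d+)!=(?P<second>\d+)\)"
)


def parse_checksum_failure(line):
    """Return (group, first_value, second_value) or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return (
        int(match.group("group")),
        int(match.group("first")),
        int(match.group("second")),
    )


group, first, second = parse_checksum_failure(LOG_LINE)

# Print the checksums as 16-bit hex so they line up with dumpe2fs.
print(f"group {group}: 0x{first:04x} != 0x{second:04x}")

# The mount failure code -22 corresponds to this errno name.
print(errno.errorcode[22])
```

      The hex forms are 0x07e1 and 0x7055, the same pair dumpe2fs prints for this group as the expected and the on-disk checksum respectively.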

      From dumpe2fs:

      Group 153506: (Blocks 5030084608-5030117375) [ITABLE_ZEROED]
        Checksum 0x7055 (EXPECTED 0x07e1), unused inodes 0
        Block bitmap at 5024776354 (bg #153344 + 162), Inode bitmap at 5024776610 (bg #153344 + 418)
        Inode table at 5024778000-5024778007 (bg #153344 + 1808)
        28875 free blocks, 46 free inodes, 0 directories
        Free blocks: 5030084608-5030109183, 5030109207-5030109236, 5030109302-5030109314, 5030109322-5030109344, 5030109353-5030109362, 5030109413-5030109472, 5030109496-5030109509, 5030109540-5030109579, 5030109587-5030109599, 5030109616-5030109623, 5030109652-5030109675, 5030109686-5030109733, 5030109784-5030109803, 5030109832-5030109858, 5030109884-5030109917, 5030109921, 5030109944-5030109983, 5030109988-5030110000, 5030110102-5030110153, 5030110211-5030110224, 5030110268-5030110281, 5030110300-5030110304, 5030110350-5030110357, 5030110458-5030110482, 5030110575-5030110590, 5030110632-5030110686, 5030110716-5030110764, 5030110812-5030110832, 5030110855-5030110871, 5030110907-5030110914, 5030110930-5030110958, 5030110984-5030111008, 5030111045-5030111085, 5030111133-5030111178, 5030111197-5030111223, 5030111273-5030111326, 5030111344-5030111367, 5030111397-5030111445, 5030111457-5030111493, 5030111517-5030111589, 5030111707-5030111789, 5030111794-5030111807, 5030111818-5030111822, 5030111848-5030111854, 5030111869-5030111886, 5030111915-5030111918, 5030111935-5030111970, 5030112107-5030112124, 5030112137-5030112197, 5030112208-5030112241, 5030112281-5030112381, 5030112397-5030112411, 5030112416-5030112438, 5030112471-5030112514, 5030112564-5030112579, 5030112584-5030112646, 5030112691-5030112749, 5030112778-5030112808, 5030112828-5030112849, 5030112866-5030112868, 5030112951-5030112968, 5030113001-5030113015, 5030113059-5030113132, 5030113148-5030113174, 5030113177-5030113355, 5030113370-5030113396, 5030113481-5030113496, 5030113525-5030113534, 5030113564-5030113577, 5030113655-5030113667, 5030113674-5030113706, 5030113766-5030113803, 5030113840-5030113881, 5030113928-5030113954, 5030114046-5030114052, 5030114057-5030114071, 5030114088-5030114100, 5030114133-5030114176, 5030114221-5030114233, 5030114265-5030114309, 5030114335-5030114347, 5030114442-5030114465, 5030114513-5030114527, 5030114558-5030114569, 5030114618-5030114626, 5030114653-5030114657, 
5030114670-5030114736, 5030114742-5030114752, 5030114769-5030114793, 5030114809-5030114863, 5030114876-5030114898, 5030114910-5030114936, 5030114953-5030114971, 5030115002-5030115032, 5030115040-5030115070, 5030115098-5030115128, 5030115144-5030115195, 5030115210-5030115272, 5030115285-5030115332, 5030115355-5030115412, 5030115434-5030115550, 5030115553-5030115586, 5030115615, 5030115627-5030115677, 5030115680-5030115689, 5030115701-5030115720, 5030115734-5030115783, 5030115812-5030115816, 5030115836-5030115855, 5030115867-5030115896, 5030115916-5030115949, 5030115982-5030116007, 5030116012-5030116038, 5030116072-5030116084, 5030116136-5030116163, 5030116193-5030116238, 5030116268-5030116289, 5030116315-5030116345, 5030116355-5030116396, 5030116427-5030116479, 5030116483-5030116520, 5030116532-5030116550, 5030116580-5030116692, 5030116719-5030116809, 5030116813-5030116830, 5030116840-5030116848, 5030116876-5030116890, 5030116900-5030116921, 5030116947-5030116954, 5030116988-5030116996, 5030117010-5030117015, 5030117044-5030117058, 5030117063-5030117066, 5030117086-5030117091, 5030117135-5030117248, 5030117261-5030117312, 5030117329-5030117375
        Free inodes: 19648770, 19648772, 19648774, 19648778-19648779, 19648783, 196487
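
      As a sanity check on the descriptor contents themselves, the location fields reported by dumpe2fs can be verified arithmetically. The sketch below uses only numbers from the output above; the helper names are hypothetical, and it assumes the first data block is 0 so that a group's first block is simply group * blocks_per_group. If the locations are self-consistent, that suggests (but does not prove) that the damage is confined to the checksum and counter fields rather than the block pointers.

```python
# Values copied from the dumpe2fs output for group 153506.
GROUP = 153506
FIRST_BLOCK = 5030084608
LAST_BLOCK = 5030117375
FLEX_BASE_GROUP = 153344          # the "bg #153344" in the output
BLOCK_BITMAP = 5024776354
INODE_BITMAP = 5024776610
INODE_TABLE_START = 5024778000

# Blocks per group, derived from the reported block range.
blocks_per_group = LAST_BLOCK - FIRST_BLOCK + 1


def group_first_block(group, bpg):
    """First block of a group, assuming the first data block is 0."""
    return group * bpg


def offset_from_group(block, base_group, bpg):
    """Offset of a block from the start of base_group."""
    return block - group_first_block(base_group, bpg)


print("blocks per group:", blocks_per_group)
print("range start matches:",
      group_first_block(GROUP, blocks_per_group) == FIRST_BLOCK)

# These should reproduce the "+ 162", "+ 418" and "+ 1808" offsets.
for name, block in (
    ("block bitmap", BLOCK_BITMAP),
    ("inode bitmap", INODE_BITMAP),
    ("inode table", INODE_TABLE_START),
):
    print(name, "offset:",
          offset_from_group(block, FLEX_BASE_GROUP, blocks_per_group))
```

      The block bitmap offset (162) also equals GROUP - FLEX_BASE_GROUP, i.e. the position of this group relative to bg #153344.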
      

      A read-only e2fsck run reports the group descriptor checksum problem, plus a few other discrepancies. Running e2fsck against a backup superblock does not report the group descriptor problem, but does report numerous pass 5 problems with the free block and free inode counts.

          People

            wc-triage WC Triage
            artem_blagodarenko Artem Blagodarenko (Inactive)