Lustre / LU-3668

ldiskfs_check_descriptors: Block bitmap for group not in group


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Severity: 3
    • 9453

    Description

      Our $SCRATCH file system is down and we are unable to mount one of its OSTs because ldiskfs reports corrupted group descriptors.

      Symptoms:

      (1) cannot mount as a normal Lustre filesystem
      (2) cannot mount as ldiskfs either
      (3) e2fsck reports an alarming number of issues
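      For reference, the failing attempts in (1) and (2) were along these lines (the device name is taken from the kernel log below; the mount points are illustrative):

      # (1) normal Lustre mount of the OST on the OSS
      mount -t lustre /dev/md14 /mnt/lustre/scratch-OST0124

      # (2) direct read-only mount of the backing ldiskfs filesystem
      mount -t ldiskfs -o ro /dev/md14 /mnt/ldiskfs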

      Scenario:

      The OST is a RAID6 (8+2) config with external journals. At 18:06 yesterday, MD raid detected a disk error, evicted the failed disk, and started rebuilding the device with a hot spare. Before the rebuild finished, ldiskfs reported the error below and the device went read-only.

      Jul 29 22:16:40 oss28 kernel: [547129.288298] LDISKFS-fs error (device md14): ldiskfs_lookup: deleted inode referenced: 2463495
      Jul 29 22:16:40 oss28 kernel: [547129.298723] Aborting journal on device md24.
      Jul 29 22:16:40 oss28 kernel: [547129.304211] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013176 commit error: 2
      Jul 29 22:16:40 oss28 kernel: [547129.316134] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013175 commit error: 2
      Jul 29 22:16:40 oss28 kernel: [547129.316136] LDISKFS-fs error (device md14): ldiskfs_journal_start_sb: Detected aborted journal
      Jul 29 22:16:40 oss28 kernel: [547129.316139] LDISKFS-fs (md14): Remounting filesystem read-only
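      For completeness, the rebuild state and member-disk status can be captured with the standard MD queries (a sketch; md14 is the OST array and md24 its external journal device, per the messages above):

      cat /proc/mdstat              # overall array state and rebuild progress
      mdadm --detail /dev/md14      # OST data array (RAID6 8+2): failed and spare members
      mdadm --detail /dev/md24      # external journal device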

      The host was rebooted at 6am and we have been unable to mount the OST since. We would appreciate suggestions on the best approach (e2fsck, journal rebuilding, etc.) to recover this OST.
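      For reference, the kind of sequence we are considering is roughly the following. It is only a sketch: the -b/-B values are examples that would be read from dumpe2fs, and nothing would be run read-write until the journal question is settled.

      # Inspect the superblock and locate the backups (read-only)
      dumpe2fs -h /dev/md14
      dumpe2fs /dev/md14 | grep -i superblock

      # Dry-run checks: -n answers "no" to every question; the backup
      # superblock (-b) and block size (-B) values come from dumpe2fs
      e2fsck -fn /dev/md14
      e2fsck -fn -b 32768 -B 4096 /dev/md14

      # If the external journal itself is the problem: detach it, re-create it
      # on md24, and re-attach it before any read-write e2fsck
      # (tune2fs may need -f if the old journal cannot be replayed)
      tune2fs -f -O ^has_journal /dev/md14
      mke2fs -O journal_dev -b 4096 /dev/md24
      tune2fs -j -J device=/dev/md24 /dev/md14

      The intent is to stay read-only until it is clear whether the damage is confined to the journal or extends to the group descriptors themselves.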

      I will follow up with output from e2fsck -f -n, which is running now (attempting to use a backup superblock). Typical entries look like the following:

      e2fsck 1.42.7.wc1 (12-Apr-2013)
      Inode table for group 3536 is not in group. (block 103079215118)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3538 is not in group. (block 107524506255360)
      Relocate? no

      Inode bitmap for group 3538 is not in group. (block 18446612162378989568)
      Relocate? no

      Inode table for group 3539 is not in group. (block 3439182177370112)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3541 is not in group. (block 138784755704397824)
      Relocate? no

      Inode table for group 3542 is not in group. (block 7138029487521792000)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3544 is not in group. (block 180388626432)
      Relocate? no

      Inode table for group 3545 is not in group. (block 25769803776)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3547 is not in group. (block 346054104973312)
      Relocate? no

      Inode 503 has compression flag set on filesystem without compression support. \
      Clear? no

      Inode 503 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      HTREE directory inode 503 has an invalid root node.
      Clear HTree index? no

      HTREE directory inode 503 has an unsupported hash version (40)
      Clear HTree index? no

      HTREE directory inode 503 uses an incompatible htree root node flag.
      Clear HTree index? no

      HTREE directory inode 503 has a tree depth (16) which is too big
      Clear HTree index? no

      Inode 503, i_blocks is 842359139, should be 0. Fix? no

      Inode 504 is in use, but has dtime set. Fix? no

      Inode 504 has imagic flag set. Clear? no

      Inode 504 has a extra size (25649) which is invalid
      Fix? no

      Inode 504 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      Inode 562 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      HTREE directory inode 562 has an invalid root node.
      Clear HTree index? no

      HTREE directory inode 562 has an unsupported hash version (51)
      Clear HTree index? no

      HTREE directory inode 562 has a tree depth (59) which is too big
      Clear HTree index? no

      Inode 562, i_blocks is 828596838, should be 0. Fix? no

      Inode 563 is in use, but has dtime set. Fix? no

      Inode 563 has imagic flag set. Clear? no

      Inode 563 has a extra size (12387) which is invalid
      Fix? no

      Block #623050609 (3039575950) causes file to be too big. IGNORED.
      Block #623050610 (3038656474) causes file to be too big. IGNORED.
      Block #623050611 (3037435566) causes file to be too big. IGNORED.
      Block #623050612 (3035215768) causes file to be too big. IGNORED.
      Block #623050613 (3031785159) causes file to be too big. IGNORED.
      Block #623050614 (3027736066) causes file to be too big. IGNORED.
      Block #623050615 (3019627313) causes file to be too big. IGNORED.
      Block #623050616 (2970766533) causes file to be too big. IGNORED.
      Block #623050617 (871157932) causes file to be too big. IGNORED.
      Block #623050618 (879167937) causes file to be too big. IGNORED.
      Block #623050619 (883249763) causes file to be too big. IGNORED.
      Block #623050620 (885943218) causes file to be too big. IGNORED.
      Too many illegal blocks in inode 1618.
      Clear inode? no

      Suppress messages? no
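      One observation on the output above: the "block" numbers e2fsck prints for the misplaced bitmaps and inode tables are not plausible block numbers for this device. Converting two of them to hex for illustration:

      printf '%x\n' 103079215118            # 180000000e
      printf '%x\n' 18446612162378989568    # ffff880700000000

      The second value falls in the x86_64 kernel direct-mapping range (0xffff8800...), so the group descriptor table appears to contain in-memory garbage rather than on-disk block numbers, consistent with the ldiskfs_check_descriptors failure in the summary.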


            People

              Assignee: Andreas Dilger (adilger)
              Reporter: Karl W Schulz (koomie) (Inactive)
              Votes: 0
              Watchers: 8
