
LU-3668: ldiskfs_check_descriptors: Block bitmap for group not in group

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Severity: 3
    • 9453

    Description

Our $SCRATCH file system is down and we are unable to mount an OST because corrupted group descriptors are being reported.

      Symptoms:

      (1) cannot mount as normal lustre fs
      (2) also cannot mount as ldiskfs
      (3) e2fsck reports alarming number of issues

      Scenario:

      The OST is a RAID6 (8+2) config with external journals. At 18:06 yesterday, MD raid detected a disk error, evicted the failed disk, and started rebuilding the device with a hot spare. Before the rebuild finished, ldiskfs reported the error below and the device went read-only.

      Jul 29 22:16:40 oss28 kernel: [547129.288298] LDISKFS-fs error (device md14): ldiskfs_lookup: deleted inode referenced: 2463495
      Jul 29 22:16:40 oss28 kernel: [547129.298723] Aborting journal on device md24.
      Jul 29 22:16:40 oss28 kernel: [547129.304211] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013176 commit error: 2
      Jul 29 22:16:40 oss28 kernel: [547129.316134] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013175 commit error: 2
      Jul 29 22:16:40 oss28 kernel: [547129.316136] LDISKFS-fs error (device md14): ldiskfs_journal_start_sb: Detected aborted journal
      Jul 29 22:16:40 oss28 kernel: [547129.316139] LDISKFS-fs (md14): Remounting filesystem read-only
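
      A quick way to confirm the array state and rebuild progress at this point (array names taken from the log above) would be:

      # overall MD status, including any rebuild/recovery progress
      cat /proc/mdstat

      # per-array detail: state, failed members, and the spare being rebuilt
      mdadm --detail /dev/md14
      mdadm --detail /dev/md24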

The host was rebooted at 6am and we have been unable to mount the OST since. We would appreciate suggestions on the best approach to recovering this OST with e2fsck, journal rebuilding, etc.
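
      For the journal-rebuilding part, one common ldiskfs/ext4 approach is to drop and recreate the external journal with tune2fs/mke2fs. A rough sketch only (device names from the log above, the 4096 block size is an assumption, and this should only be attempted once e2fsck can otherwise clean the device):

      # drop the reference to the aborted external journal
      # (tune2fs normally insists that the needs_recovery flag be cleared first,
      #  e.g. by a prior e2fsck run)
      tune2fs -O ^has_journal /dev/md14

      # recreate the external journal device and attach it again
      mke2fs -O journal_dev -b 4096 /dev/md24
      tune2fs -J device=/dev/md24 /dev/md14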

I will follow up with output from e2fsck -f -n, which is running now (attempting to use a backup superblock; see the command sketch after this output). Typical entries look as follows:

      e2fsck 1.42.7.wc1 (12-Apr-2013)
      Inode table for group 3536 is not in group. (block 103079215118)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3538 is not in group. (block 107524506255360)
      Relocate? no

      Inode bitmap for group 3538 is not in group. (block 18446612162378989568)
      Relocate? no

      Inode table for group 3539 is not in group. (block 3439182177370112)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3541 is not in group. (block 138784755704397824)
      Relocate? no

      Inode table for group 3542 is not in group. (block 7138029487521792000)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3544 is not in group. (block 180388626432)
      Relocate? no

      Inode table for group 3545 is not in group. (block 25769803776)
      WARNING: SEVERE DATA LOSS POSSIBLE.
      Relocate? no

      Block bitmap for group 3547 is not in group. (block 346054104973312)
      Relocate? no

Inode 503 has compression flag set on filesystem without compression support.
      Clear? no

      Inode 503 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      HTREE directory inode 503 has an invalid root node.
      Clear HTree index? no

      HTREE directory inode 503 has an unsupported hash version (40)
      Clear HTree index? no

      HTREE directory inode 503 uses an incompatible htree root node flag.
      Clear HTree index? no

      HTREE directory inode 503 has a tree depth (16) which is too big
      Clear HTree index? no

      Inode 503, i_blocks is 842359139, should be 0. Fix? no

      Inode 504 is in use, but has dtime set. Fix? no

      Inode 504 has imagic flag set. Clear? no

      Inode 504 has a extra size (25649) which is invalid
      Fix? no

      Inode 504 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      Inode 562 has INDEX_FL flag set but is not a directory.
      Clear HTree index? no

      HTREE directory inode 562 has an invalid root node.
      Clear HTree index? no

      HTREE directory inode 562 has an unsupported hash version (51)
      Clear HTree index? no

      HTREE directory inode 562 has a tree depth (59) which is too big
      Clear HTree index? no

      Inode 562, i_blocks is 828596838, should be 0. Fix? no

      Inode 563 is in use, but has dtime set. Fix? no

      Inode 563 has imagic flag set. Clear? no

      Inode 563 has a extra size (12387) which is invalid
      Fix? no

Block #623050609 (3039575950) causes file to be too big. IGNORED.
      Block #623050610 (3038656474) causes file to be too big. IGNORED.
      Block #623050611 (3037435566) causes file to be too big. IGNORED.
      Block #623050612 (3035215768) causes file to be too big. IGNORED.
      Block #623050613 (3031785159) causes file to be too big. IGNORED.
      Block #623050614 (3027736066) causes file to be too big. IGNORED.
      Block #623050615 (3019627313) causes file to be too big. IGNORED.
      Block #623050616 (2970766533) causes file to be too big. IGNORED.
      Block #623050617 (871157932) causes file to be too big. IGNORED.
      Block #623050618 (879167937) causes file to be too big. IGNORED.
      Block #623050619 (883249763) causes file to be too big. IGNORED.
      Block #623050620 (885943218) causes file to be too big. IGNORED.
      Too many illegal blocks in inode 1618.
      Clear inode? no

      Suppress messages? no
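
      For the backup-superblock attempt mentioned above, the backup locations can be listed and handed to e2fsck along these lines (block numbers shown are the common defaults for a 4K-block filesystem and are an assumption here):

      # list backup superblock locations without touching the device
      mke2fs -n /dev/md14

      # or pull them from an existing superblock copy
      dumpe2fs /dev/md14 | grep -i 'superblock at'

      # read-only check against the first backup superblock of a 4K-block filesystem
      e2fsck -fn -b 32768 -B 4096 /dev/md14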


          Activity

            jfc John Fuchs-Chesney (Inactive) made changes -
            Resolution: Won't Fix
            Status: Reopened -> Resolved

            jfc John Fuchs-Chesney (Inactive) added a comment -
            Customer has apparently moved on from this issue.
            ~ jfc.


            jfc2 John Fuchs-Chesney (text) (Inactive) added a comment -
            Karl and Tommy,
            Reading through this I think you may have moved on from this problem.
            Do you want us to keep this issue open, or may we mark it as resolved?
            Many thanks,
            ~ jfc.

            pjones Peter Jones made changes -
            Labels: patch (added)

            adilger Andreas Dilger added a comment -
            As Murphy would have it, you hit this problem again the day after I went on vacation for a week.

            I looked through the Lustre 2.1.5->2.1.6 changes, and while there are some changes in the ldiskfs patches, these are mostly in patch context and not functional changes. I couldn't see any other significant changes to the OST code either.

            On the kernel side, there were some changes to ext4 that appear related to freezing the filesystem for suspend/resume, and better block accounting for delayed allocation (which Lustre doesn't use). There is another change to optimize extent handling for fallocate() (which Lustre also doesn't use), but I can't see how it would relate to MD device failure/rebuilding. I'm not sure what changes went into the MD RAID code.

            Do you still have logs for the e2fsck runs from the second OST failure? I can't imagine why it would be consuming so much memory, unless there was some strange corruption that e2fsck isn't expecting. It shouldn't be using more than about 5-6GB of RAM in the worst case. If you have logs it might be possible to reverse-engineer what type of corruption was seen. Presumably, you weren't able to recover anything from this OST in the end? Nothing in lost+found?


            koomie Karl W Schulz (Inactive) added a comment -
            We did take a quick stab at building the v2.1.6 release against our older kernel (and even the 2.6.32-279.19.1.el6_lustre version that was supported with v2.1.5), but it looks like the newer ldiskfs patches for the rhel6 series are hitting conflicts which prevent a build out of the box.

            Consequently, we've decided to roll the servers back to v2.1.5 and the previous production OSS kernel (2.6.32-279.5.2.el6).

            pjones Peter Jones added a comment -

            Tommy

            Andreas is out this week but I did manage to connect with him to see whether he had any suggestions and he suggested trying to rebuild 2.1.6 against the older kernel to see whether that has any effect on this behaviour. I'm continuing to talk to other engineers with expertise in this area to see if there are any other thoughts.

            Regards

            Peter


            minyard Tommy Minyard added a comment -
            So we have not had any success getting fsck to run to completion on the corrupted OST. We let the e2fsck run on the OSS until it ran out of memory, consuming 55GB, but it did not appear to be making much progress on the repairs. We are currently out of ideas for repair; if you have any further suggestions, please let us know ASAP. We can mount the OST as ldiskfs, but it looks like there is NO data actually on the filesystem under this mount point.

            At this point, we think there will not be any way to recover the data, so we are working on the procedure to recreate the OST from scratch. Karl took some notes on how to replace the OST back where it was previously and we'll follow the instructions in LU-14.
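
            For reference, when e2fsck runs out of RAM like this it can be told to spill its large in-memory tables to disk via an [scratch_files] stanza in /etc/e2fsck.conf; it runs much more slowly but within a bounded memory footprint. A minimal sketch (the directory path is an assumption and must already exist on a filesystem with plenty of free space):

            # /etc/e2fsck.conf
            [scratch_files]
                directory = /var/cache/e2fsck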


            minyard Tommy Minyard added a comment -
            Thanks Peter, the restarted fsck is still running and consuming more than 20GB of memory right now. It has made a few more repairs since my earlier comment, so it is still progressing. One thing to add: we know that rebuilds can be successful, as we did one manually with the last OST that suffered the RAID-6 corruption. In the two cases where a rebuild has caused problems so far, the two primary differences are that the rebuilds were kicked off automatically and the OST was still active on the MDS, allowing new files to be written to it.

            I've been digging around looking for reported Linux RAID-6 issues, and one page I found does note a rebuild issue, but it indicated the problem had been fixed in 2.6.32 and later kernels.
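
            As an aside on the "OST was still active on the MDS" point: one way to stop the MDS from allocating new objects on an OST during a rebuild is to deactivate the corresponding OSC device on the MDS (OST name taken from the logs above; <devno> is a placeholder for the device number reported by lctl dl):

            # on the MDS: find the OSC device for the affected OST, then deactivate it
            lctl dl | grep scratch-OST0124
            lctl --device <devno> deactivate

            # reactivate once the array rebuild has completed
            lctl --device <devno> activate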

            pjones Peter Jones made changes -
            Resolution: Not a Bug (cleared)
            Status: Resolved -> Reopened

            People

              Assignee: adilger Andreas Dilger
              Reporter: koomie Karl W Schulz (Inactive)