[LU-3668] ldiskfs_check_descriptors: Block bitmap for group not in group - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Won't Fix
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.1.6
Labels:
- patch
Environment:

Stampede: CentOS6

OSS's running whamcloud 2.1.6 distribution:

* kernel-2.6.32-358.11.1.el6_lustre.x86_64
* lustre-2.1.6-2.6.32_358.11.1.el6_lustre.x86_64.x86_64
* lustre-ldiskfs-3.3.0-2.6.32_358.11.1.el6_lustre.x86_64.x86_64
* e2fsprogs-1.42.7.wc1-7.el6.x86_64


Severity:
3
Rank (Obsolete):
9453

Description

Our $SCRATCH file system is down, and we are unable to mount one OST because e2fsck and ldiskfs report corrupted group descriptors.

Symptoms:

(1) cannot mount as normal lustre fs
(2) also cannot mount as ldiskfs
(3) e2fsck reports alarming number of issues

Scenario:

The OST is a RAID6 (8+2) config with external journals. At 18:06 yesterday, MD raid detected a disk error, evicted the failed disk, and started rebuilding the device with a hot spare. Before the rebuild finished, ldiskfs reported the error below and the device went read-only.

Jul 29 22:16:40 oss28 kernel: [547129.288298] LDISKFS-fs error (device md14): ldiskfs_lookup: deleted inode referenced: 2463495
Jul 29 22:16:40 oss28 kernel: [547129.298723] Aborting journal on device md24.
Jul 29 22:16:40 oss28 kernel: [547129.304211] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013176 commit error: 2
Jul 29 22:16:40 oss28 kernel: [547129.316134] LustreError: 17212:0:(obd.h:1615:obd_transno_commit_cb()) scratch-OST0124: transno 176013175 commit error: 2
Jul 29 22:16:40 oss28 kernel: [547129.316136] LDISKFS-fs error (device md14): ldiskfs_journal_start_sb: Detected aborted journal
Jul 29 22:16:40 oss28 kernel: [547129.316139] LDISKFS-fs (md14): Remounting filesystem read-only

The host was rebooted at 6 am and we have been unable to mount the OST since. We would appreciate suggestions on the best approach to recovering this OST (e2fsck, journal rebuilding, etc.).

I will follow up with output from e2fsck -f -n which is running now (attempting to use backup superblock). Typical entries look as follows:

e2fsck 1.42.7.wc1 (12-Apr-2013)
Inode table for group 3536 is not in group. (block 103079215118)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 3538 is not in group. (block 107524506255360)
Relocate? no

Inode bitmap for group 3538 is not in group. (block 18446612162378989568)
Relocate? no

Inode table for group 3539 is not in group. (block 3439182177370112)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 3541 is not in group. (block 138784755704397824)
Relocate? no

Inode table for group 3542 is not in group. (block 7138029487521792000)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 3544 is not in group. (block 180388626432)
Relocate? no

Inode table for group 3545 is not in group. (block 25769803776)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 3547 is not in group. (block 346054104973312)
Relocate? no

Inode 503 has compression flag set on filesystem without compression support. \
Clear? no

Inode 503 has INDEX_FL flag set but is not a directory.
Clear HTree index? no

HTREE directory inode 503 has an invalid root node.
Clear HTree index? no

HTREE directory inode 503 has an unsupported hash version (40)
Clear HTree index? no

HTREE directory inode 503 uses an incompatible htree root node flag.
Clear HTree index? no

HTREE directory inode 503 has a tree depth (16) which is too big
Clear HTree index? no

Inode 503, i_blocks is 842359139, should be 0. Fix? no

Inode 504 is in use, but has dtime set. Fix? no

Inode 504 has imagic flag set. Clear? no

Inode 504 has a extra size (25649) which is invalid
Fix? no

Inode 504 has INDEX_FL flag set but is not a directory.
Clear HTree index? no

Inode 562 has INDEX_FL flag set but is not a directory.
Clear HTree index? no

HTREE directory inode 562 has an invalid root node.
Clear HTree index? no

HTREE directory inode 562 has an unsupported hash version (51)
Clear HTree index? no

HTREE directory inode 562 has a tree depth (59) which is too big
Clear HTree index? no

Inode 562, i_blocks is 828596838, should be 0. Fix? no

Inode 563 is in use, but has dtime set. Fix? no

Inode 563 has imagic flag set. Clear? no

Inode 563 has a extra size (12387) which is invalid
Fix? no

Block #623050609 (3039575950) causes file to be too big. IGNORED.
Block #623050610 (3038656474) causes file to be too big. IGNORED.
Block #623050611 (3037435566) causes file to be too big. IGNORED.
Block #623050612 (3035215768) causes file to be too big. IGNORED.
Block #623050613 (3031785159) causes file to be too big. IGNORED.
Block #623050614 (3027736066) causes file to be too big. IGNORED.
Block #623050615 (3019627313) causes file to be too big. IGNORED.
Block #623050616 (2970766533) causes file to be too big. IGNORED.
Block #623050617 (871157932) causes file to be too big. IGNORED.
Block #623050618 (879167937) causes file to be too big. IGNORED.
Block #623050619 (883249763) causes file to be too big. IGNORED.
Block #623050620 (885943218) causes file to be too big. IGNORED.
Too many illegal blocks in inode 1618.
Clear inode? no

Suppress messages? no

Attachments


md14_dumpe2fs.tar.gz
2 kB
30/Jul/13 3:23 PM

Issue Links

is related to

LU-14 live replacement of OST

Resolved

Activity


Tommy Minyard (Inactive) added a comment - 30/Jul/13 10:05 PM

One quick update: we stopped the array and restarted it without the spare drive that was added last night (currently running with 9 of the 10 drives). The e2fsck output now looks much better than before (see below). One question from our side: should we just let e2fsck use the default superblock, or should we specify one with the -b option? Also, should we be concerned about any of the errors e2fsck has reported so far? Most look minor, except maybe the first one about the resize inode not being valid. The current e2fsck run is not making any changes. Our plan is to let it finish, see how many errors it finds, and, if it is not too bad, rerun it with the -p option to make some repairs. We will still need to add the 10th drive back and let the array rebuild at some point, but right now we just want to make sure we have a valid MD array that will mount without error.

[root@oss28.stampede]# e2fsck -fn -B 4096 /dev/md14
e2fsck 1.42.7.wc1 (12-Apr-2013)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
e2fsck: Group descriptors look bad... trying backup blocks...
Resize inode not valid. Recreate? no

Pass 1: Checking inodes, blocks, and sizes
Inode 11468804 has an invalid extent node (blk 2936017803, lblk 393)
Clear? no

Inode 11468804, i_blocks is 8264, should be 5024. Fix? no

Inode 11534337 has an invalid extent node (blk 2952816317, lblk 764)
Clear? no

Inode 11534337, i_size is 4292608, should be 3129344. Fix? no

Inode 11534337, i_blocks is 8408, should be 6128. Fix? no

Inode 13092415 has an invalid extent node (blk 3523217944, lblk 0)
Clear? no

Inode 13092415, i_blocks is 544, should be 0. Fix? no

Inode 14291200 has an invalid extent node (blk 3526886078, lblk 0)
Clear? no

Inode 14291200, i_blocks is 2056, should be 0. Fix? no


Andreas Dilger added a comment - 30/Jul/13 9:58 PM

This is the process to modify the /CONFIGS/mountdata file copied from OST0001 for OST0002, on my MythTV Lustre filesystem named "myth". I verified at the end that the generated "md2.bin" file was binary identical to the one that exists on OST0002 already.

# mount -t ldiskfs /dev/vgmyth/lvmythost1 /mnt/tmp # mount other OST as ldiskfs
# xxd /mnt/tmp/CONFIGS/mountdata > /tmp/md1.asc    # save mountdata for reference
# xxd /mnt/tmp/CONFIGS/mountdata > /tmp/md2.asc    # save another one for editing
# vi /tmp/md2.asc                                  # edit 0001 to 0002 in 3 places
# xxd -r /tmp/md2.asc > /tmp/md2.bin               # convert modified one to binary
# xxd /tmp/md2.bin > /tmp/md2.asc2                 # convert back to ASCII to verify
# diff -u /tmp/md1.asc /tmp/md2.asc2               # compare original and modified
--- /tmp/md1.asc  2013-07-30 15:40:12.201994814 -0600
+++ /tmp/md2.asc  2013-07-30 15:40:48.775245386 -0600
@@ -1,14 +1,14 @@
 0000000: 0100 d01d 0000 0000 0000 0000 0000 0000  ................
-0000010: 0300 0000 0200 0000 0100 0000 0100 0000  ................
+0000010: 0300 0000 0200 0000 0200 0000 0100 0000  ................
 0000020: 6d79 7468 0065 0000 0000 0000 0000 0000  myth.e..........
 0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-0000060: 6d79 7468 2d4f 5354 3030 3031 0000 0000  myth-OST0001....
+0000060: 6d79 7468 2d4f 5354 3030 3032 0000 0000  myth-OST0002....
 0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-00000a0: 6d79 7468 2d4f 5354 3030 3031 5f55 5549  myth-OST0001_UUI
+00000a0: 6d79 7468 2d4f 5354 3030 3032 5f55 5549  myth-OST0002_UUI
 00000b0: 4400 0000 0000 0000 0000 0000 0000 0000  D...............
 00000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Peter, it might make sense to allow a mkfs.lustre formatting option to clear the LDD_F_VIRGIN flag so that this binary editing dance isn't needed, and the "new" OST will not try to register with the MGS.
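For reference, the three edits made in vi above can also be expressed programmatically. The following is a minimal Python sketch, not a Lustre tool. The function name `retarget_mountdata` is hypothetical, the offsets 0x18, 0x60, and 0xa0 are read only from the diff shown above, and the sketch assumes the index is a little-endian 32-bit value and that the target name carries the index as four hex digits. It works on an in-memory copy of the file contents; the result should still be compared against a known-good mountdata, as was done above.

```python
import struct

# Offsets read off the xxd diff above; treating them as fixed is an assumption.
INDEX_OFFSET = 0x18           # 32-bit little-endian target index
NAME_OFFSETS = (0x60, 0xA0)   # "<fsname>-OSTxxxx" and "<fsname>-OSTxxxx_UUID"


def retarget_mountdata(data: bytes, fsname: str, old_index: int, new_index: int) -> bytes:
    """Return a copy of a mountdata image with the OST index changed in three places."""
    buf = bytearray(data)
    old_name = f"{fsname}-OST{old_index:04x}".encode("ascii")
    new_name = f"{fsname}-OST{new_index:04x}".encode("ascii")

    # Refuse to touch anything that does not look like the expected source image.
    (current,) = struct.unpack_from("<I", buf, INDEX_OFFSET)
    if current != old_index:
        raise ValueError(f"index at 0x18 is {current}, expected {old_index}")
    for offset in NAME_OFFSETS:
        if bytes(buf[offset:offset + len(old_name)]) != old_name:
            raise ValueError(f"target name not found at offset {offset:#x}")

    # Apply the same three changes as the manual edit: one index, two names.
    struct.pack_into("<I", buf, INDEX_OFFSET, new_index)
    for offset in NAME_OFFSETS:
        buf[offset:offset + len(new_name)] = new_name
    return bytes(buf)
```

Because the old and new names have the same length, the file size and every other byte stay unchanged, which matches the diff above.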


Tommy Minyard (Inactive) added a comment - 30/Jul/13 7:31 PM

The OST is currently deactivated in the MDS, one of the first things we did this morning after finding the problem. I have also deactivated it on all client nodes for the cluster to prevent user tasks from hanging when trying to access a file that resides on that OST. I will talk with Karl and we will start testing with read-only assembly of the array to see if we can get it recovered.


Andreas Dilger added a comment - 30/Jul/13 7:27 PM

I would also recommend deactivating this OST on the MDS, so that the MDS does not try to modify it if (hopefully) it becomes accessible and is mounted with Lustre again. That would avoid allocating new objects on the OST, and give us some time to figure out what to do next.


Andreas Dilger added a comment - 30/Jul/13 7:23 PM

It might be possible to pull the new disk and run in degraded mode, to see if this allows the filesystem data to be read correctly. It may also be that the MD RAID rebuild has written bad data to the parity blocks by this point, I'm not sure. At this point that is the only thing I can think of that is likely to be able to recover this OST.


James Nunez (Inactive) added a comment - 30/Jul/13 7:15 PM

Tommy

We're looking into the problem and formulating next steps.


Tommy Minyard (Inactive) added a comment - 30/Jul/13 7:03 PM

Thanks for the additional information, Andreas. If possible, could we set up a con-call this afternoon and discuss some options (I think Peter may have been trying to get this organized even though he is on vacation)? At this point, would it be better to go back to the RAID-6 device and try to start from there? We know which disk was the last one added. We can stop the array, start it in read-only mode without the last disk added and see what the array says at that time with e2fsck.


Andreas Dilger added a comment - 30/Jul/13 6:50 PM

In theory there should still be backup superblocks + group descriptors at 3855122432 and 5804752896, which are within the 5860530816-block filesystem.

That said, at this point I'm concerned that the whole OST is corrupted somehow by improper RAID parity reconstruction or similar. Corruption in all of the group descriptor copies, spread across the whole filesystem, implies that even if we were able to manually rebuild the descriptor table from the good blocks in various different groups, the data is likely to be equally corrupted.

In your most recent e2fsck output (9:16 am) it appears that, for the primary group descriptor table, descriptor block #39 (filesystem block 40) is corrupt. The affected groups run from 2508 to 2559, and with 64-byte descriptors in 4096-byte blocks both ends land in the same descriptor block (2508 * 64 / 4096 and 2559 * 64 / 4096 both round down to 39), plus 1 for the offset of the first GDT block in the filesystem. It would be possible to restore this one block from a backup descriptor block (e.g. 39 + 32769 = 32808), something like:

dd if=/dev/md14 of=/dev/md14 bs=4096 count=1 skip=32808 seek=40 conv=notrunc

This is only really practical to do if there are only one or two corrupt group descriptor blocks. It isn't clear to me if the above error messages are just a snippet of huge swaths of corruption in each group, or if there is only a single bad block in the ~2800 or so group descriptor blocks. In the latter case, there is some hope that the filesystem could at least be partially recovered. If there are many bad group descriptors in every backup it is likely there is an equal amount of corruption of the file data.
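The block arithmetic behind the dd command can be sketched in a few lines of Python. This is only an illustration of the calculation described in this comment, assuming 64-byte group descriptors, 4096-byte blocks, and a descriptor table that starts in the block immediately after its superblock; the function names are hypothetical.

```python
DESC_SIZE = 64      # bytes per group descriptor, as used in the comment above
BLOCK_SIZE = 4096   # filesystem block size in bytes


def gdt_block_index(group: int) -> int:
    """Index of the descriptor block (0-based) that holds this group's descriptor."""
    return group * DESC_SIZE // BLOCK_SIZE


def gdt_fs_block(group: int, superblock: int = 0) -> int:
    """Filesystem block holding the descriptor, for the table following `superblock`."""
    return superblock + 1 + gdt_block_index(group)


if __name__ == "__main__":
    for group in (2508, 2559):
        print(group, gdt_block_index(group), gdt_fs_block(group), gdt_fs_block(group, 32768))
```

For groups 2508 and 2559 this gives descriptor block 39, primary block 40, and block 32808 in the first backup, which are the seek and skip values in the dd command above.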


Karl W Schulz (Inactive) added a comment - 30/Jul/13 4:33 PM

Just documenting that there does not appear to be any appreciable improvement using alternative superblocks; it always shows "Block bitmap for group <x> is not in group"

I tried the following superblock values:

Primary superblock at 0, Group descriptors at 1-2795
Backup superblock at 32768, Group descriptors at 32769-35563
Backup superblock at 98304, Group descriptors at 98305-101099
Backup superblock at 163840, Group descriptors at 163841-166635
Backup superblock at 229376, Group descriptors at 229377-232171
Backup superblock at 294912, Group descriptors at 294913-297707
Backup superblock at 819200, Group descriptors at 819201-821995
Backup superblock at 884736, Group descriptors at 884737-887531
Backup superblock at 1605632, Group descriptors at 1605633-1608427
Backup superblock at 2654208, Group descriptors at 2654209-2657003
Backup superblock at 4096000, Group descriptors at 4096001-4098795
Backup superblock at 7962624, Group descriptors at 7962625-7965419
Backup superblock at 11239424, Group descriptors at 11239425-11242219
Backup superblock at 20480000, Group descriptors at 20480001-20482795
Backup superblock at 23887872, Group descriptors at 23887873-23890667
Backup superblock at 71663616, Group descriptors at 71663617-71666411
Backup superblock at 78675968, Group descriptors at 78675969-78678763
Backup superblock at 102400000, Group descriptors at 102400001-102402795
Backup superblock at 214990848, Group descriptors at 214990849-214993643
Backup superblock at 512000000, Group descriptors at 512000001-512002795
Backup superblock at 550731776, Group descriptors at 550731777-550734571
Backup superblock at 644972544, Group descriptors at 644972545-644975339
Backup superblock at 1934917632, Group descriptors at 1934917633-1934920427

If I go to the next value of -b 2560000000 it states that the superblock cannot be read.
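The list of candidate values for -b can be generated rather than typed by hand. The sketch below assumes the standard sparse_super layout (backups in group 1 and in groups that are powers of 3, 5, and 7), 32768 blocks per group as implied by the first backup at block 32768, and the 5860530816-block filesystem size quoted elsewhere in this ticket. The function name is hypothetical, and the result only says where backups should exist, not whether they are readable.

```python
from typing import List


def backup_superblocks(total_blocks: int, blocks_per_group: int = 32768) -> List[int]:
    """Block numbers where sparse_super backup superblocks are expected."""
    groups = -(-total_blocks // blocks_per_group)  # ceiling division
    backup_groups = {1}
    for base in (3, 5, 7):
        power = base
        while power < groups:
            backup_groups.add(power)
            power *= base
    # Each backup sits at the first block of its group.
    return [g * blocks_per_group for g in sorted(backup_groups) if g < groups]


if __name__ == "__main__":
    for block in backup_superblocks(5860530816):
        print(block)
```

Under these assumptions the list reproduces the values tried above and continues with 2560000000, 3855122432, and 5804752896.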


Karl W Schulz (Inactive) added a comment - 30/Jul/13 3:23 PM

Output of dumpe2fs with -B 4096 and alternate values for -b.


Karl W Schulz (Inactive) added a comment - 30/Jul/13 3:16 PM

Yes, based on a post you made previously, we also tried values of -b = 32768,98304,163840,229376,294912,819200, and 884736. For values smaller than 884736, the first message we saw from fsck is of the form "block bitmap for group <x> is not in group". The snippet of e2fsck output pasted above is with b=884736 and although the bad block bitmap is not the first error detected, it occurs shortly thereafter.

Here is the top of a standard fsck: e2fsck -f -n /dev/md14

head -50 /tmp/fsck.log
e2fsck 1.42.7.wc1 (12-Apr-2013)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
e2fsck: Group descriptors look bad... trying backup blocks...
Block bitmap for group 2508 is not in group. (block 261993005056)
Relocate? no

Inode table for group 2536 is not in group. (block 261993005056)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 2546 is not in group. (block 3456555320082432)
Relocate? no

Inode bitmap for group 2546 is not in group. (block 18446612162378989568)
Relocate? no

Inode table for group 2547 is not in group. (block 3487607933632512)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 2549 is not in group. (block 10222520243247382528)
Relocate? no

Inode table for group 2550 is not in group. (block 9007199254740992)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 2552 is not in group. (block 30064771072)
Relocate? no

Inode table for group 2553 is not in group. (block 13108240187392)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 2555 is not in group. (block 1960356217880576)
Relocate? no

Inode bitmap for group 2555 is not in group. (block 18446612140904153088)
Relocate? no

Inode table for group 2556 is not in group. (block 3456551025115136)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Block bitmap for group 2558 is not in group. (block 1051959948897943552)
Relocate? no

Inode table for group 2559 is not in group. (block 17592186044416)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

I stopped at -b=884736, but will try higher values just in case. Also, will upload the requested dumpe2fs output here shortly.


People

Assignee: Andreas Dilger

Reporter: Karl W Schulz (Inactive)

Votes: 0

Watchers: 8

Dates

Created: 30/Jul/13 1:05 PM

Updated: 29/Mar/14 12:36 AM

Resolved: 29/Mar/14 12:36 AM