[LU-9410] on-disk bitmap corrupted - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
Affects Version/s: Lustre 2.7.0
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

We had 2 OSSs and 3 different OSTs crash with corrupted-bitmap messages.

Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659corrupted: 32768 blocks free in bitmap, 0 - in gd
Apr  3 18:38:16 nbp1-oss6 kernel: 
Apr  3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660corrupted: 32768 blocks free in bitmap, 0 - in gd
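
The check behind this message compares two numbers for a block group: the free blocks counted from the group's block bitmap, and the free-block count stored in the group descriptor (gd). The following C sketch is a simplified user-space model of that comparison, not the ldiskfs code. It assumes 4 KiB blocks, so one bitmap block covers 4096 x 8 = 32768 blocks; "32768 blocks free in bitmap" therefore means a completely empty bitmap. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB blocks, so one bitmap block = 4096 * 8 = 32768 bits,
 * one bit per block in the group. */
#define BLOCKS_PER_GROUP 32768u

/* Count clear bits (free blocks) in the first nbits bits of a bitmap.
 * Bit i lives in byte i / 8 at position i % 8 (little-endian bit order). */
static unsigned count_free_bits(const uint8_t *bitmap, unsigned nbits)
{
    unsigned nfree = 0;
    for (unsigned i = 0; i < nbits; i++)
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            nfree++;
    return nfree;
}

/* Hypothetical model of the consistency check: the free count derived from
 * the bitmap must equal the free count recorded in the group descriptor.
 * Returns 0 if they agree, -1 (after printing a message) if they do not. */
static int check_group_bitmap(unsigned group, const uint8_t *bitmap,
                              unsigned gd_free)
{
    assert(bitmap != NULL);
    unsigned free_in_bitmap = count_free_bits(bitmap, BLOCKS_PER_GROUP);
    if (free_in_bitmap != gd_free) {
        fprintf(stderr,
                "bitmap for group %u corrupted: %u blocks free in bitmap, "
                "%u - in gd\n", group, free_in_bitmap, gd_free);
        return -1;
    }
    return 0;
}
```

With an all-zero bitmap and gd_free = 0, check_group_bitmap() reports the same mismatch shape as the log (32768 vs 0).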

These errors were on 2 different backend RAID devices. Noteworthy items:
1. The filesystem was over 90% full, and half of the data was deleted.
2. The OSTs are formatted with "-E packed_meta_blocks=1".

Attachments

bt.2017-07-26-02.48.00 (765 kB, 26/Jul/17 8:25 PM)
bt.2017-07-26-12.08.43 (808 kB, 26/Jul/17 8:25 PM)
foreach.out (736 kB, 26/Jul/17 4:00 AM)
mballoc.c (145 kB, 12/Aug/17 2:25 AM)
ost258.dumpe2fs.after.fsck.gz (34.46 MB, 11/Aug/17 5:27 PM)
ost258.dumpe2fs.after.readonly.gz (34.44 MB, 11/Aug/17 5:27 PM)
syslog.gp270808.error.gz (13.37 MB, 15/Aug/17 2:37 AM)
vmcore-dmesg.txt (512 kB, 26/Jul/17 4:00 AM)

Issue Links

duplicates

LU-1026 ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 23828 corrupted (Resolved)
LU-7114 ldiskfs: corrupted bitmaps handling patches (Resolved)

is related to

LU-10837 no bitmap check if block bitmap is uninitialized (Resolved)

Activity

nasf (Inactive) added a comment - 12/Aug/17 1:37 AM

We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches or the recommended action?

We are planning to run with 16312 and the suggested debug patches.

For some reason this patch never made it into the series/6.8.

Are we convinced this is a duplicate of LU-7114?

It is true that we missed that patch; Andreas pointed it out in the first comment:
https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=193803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-193803

However, this patch mainly handles the situation after bitmap corruption has already happened: it lets the system keep running instead of failing right away, so that users can run e2fsck during a maintenance window. As mhanafi commented:
https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=205024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-205024, it may not help much in NASA's case.
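
The policy described here, surviving a corruption rather than preventing it, can be modeled roughly as follows: when a group's bitmap fails the check, flag the group and let the allocator skip it, instead of aborting the journal and remounting read-only as in the original log. Upstream ext4 tracks a similar per-group flag (EXT4_MB_GRP_BBITMAP_CORRUPT), but whether patch 16312 works exactly this way is an assumption. struct group_info, find_group() and mark_group_corrupt() are hypothetical names in this C sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group state (assumption: modeled as a plain bool flag). */
struct group_info {
    unsigned free_blocks;
    bool bbitmap_corrupt;
};

/* On a failed bitmap check, mark the group instead of failing the whole
 * filesystem; e2fsck can repair it later in a maintenance window. */
static void mark_group_corrupt(struct group_info *grp)
{
    grp->bbitmap_corrupt = true;
}

/* Return the first group that can satisfy a request for `want` blocks,
 * skipping groups flagged as corrupt. Returns -1 if no group fits. */
static long find_group(const struct group_info *groups, size_t ngroups,
                       unsigned want)
{
    assert(groups != NULL || ngroups == 0);
    for (size_t g = 0; g < ngroups; g++) {
        if (groups[g].bbitmap_corrupt)
            continue; /* never allocate from a group we cannot trust */
        if (groups[g].free_blocks >= want)
            return (long)g;
    }
    return -1;
}
```

The trade-off matches the comment: the filesystem stays writable, but the corrupt groups' space is unavailable until e2fsck runs.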

nasf (Inactive) added a comment - 12/Aug/17 1:26 AM

Can we get the debugging without recompiling the kernel?

With my patch applied, you do NOT need to recompile the kernel. In fact, CONFIG_EXT4_DEBUG is almost redundant, since mb debug can be controlled via the debug level.
Judging by the log message when the corruption happened ("32768 blocks free in bitmap"), it seems that the block group (BG) bitmap was initialized by logic instead of being loaded from disk. To some degree, that could explain the in-RAM corruption, so I added more debug information for this case in my patch. I think it is worth trying.
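
One way a bitmap gets "initialized by logic" is the BLOCK_UNINIT case: when a group descriptor carries that flag (with the uninit_bg or metadata_csum feature), the bitmap is not read from disk but built in memory, with only that group's own metadata blocks marked in use. With packed_meta_blocks=1 the metadata is packed at the front of the LUN, so a group as far out as 245659 plausibly holds no metadata of its own, and a bitmap built this way would show all 32768 blocks free. The C sketch below is a hypothetical model of that idea, assuming 4 KiB blocks and metadata at the start of the group; it is not the ldiskfs implementation. LU-10837 in the issue links concerns skipping the bitmap check for uninitialized bitmaps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_GROUP 32768u /* assumes 4 KiB blocks */

/* Hypothetical model: build a block bitmap in memory for a BLOCK_UNINIT
 * group instead of reading it from disk. Only the `meta_in_group` metadata
 * blocks that live inside this group (assumed to sit at its start) are
 * marked in use. Returns the resulting number of free blocks. */
static unsigned init_uninit_bitmap(uint8_t *bitmap, unsigned meta_in_group)
{
    assert(meta_in_group <= BLOCKS_PER_GROUP);
    memset(bitmap, 0, BLOCKS_PER_GROUP / 8);          /* everything free */
    for (unsigned i = 0; i < meta_in_group; i++)
        bitmap[i / 8] |= (uint8_t)(1u << (i % 8));    /* mark metadata used */
    return BLOCKS_PER_GROUP - meta_in_group;
}
```

For a group with no metadata of its own, init_uninit_bitmap(bitmap, 0) yields 32768 free blocks; if the descriptor meanwhile records 0 free, the result is exactly the mismatch in the log.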

Jay Lan (Inactive) added a comment - 12/Aug/17 1:20 AM

The kernel was originally built with CONFIG_EXT4_DEBUG disabled, and the Lustre server RPMs were built against it.
Mahmoud told me that he did not see the /sys/kernel/debug/ldiskfs option, so I rebuilt the kernel and Lustre.

So we have both builds available for testing. If CONFIG_EXT4_DEBUG is not needed in the kernel, how can we enable mb debug?

Mahmoud Hanafi added a comment - 12/Aug/17 1:16 AM

Can we get the debugging without recompiling the kernel?

nasf (Inactive) added a comment - 12/Aug/17 1:13 AM

The kernel was rebuilt with CONFIG_EXT4_DEBUG on.

Sorry, in my test CONFIG_EXT4_DEBUG is disabled by default, so I assumed NASA had it disabled as well. To avoid rebuilding the kernel, I made a patch that removes the conditional compilation of mb debug. In addition, I added more debug output for the case where the BG block bitmap is initialized.
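
The difference between compile-time and run-time gating can be sketched in user space. As far as I recall, upstream ext4 of that era defined mb_debug() only under #ifdef CONFIG_EXT4_DEBUG (a no-op otherwise), with the level read from an mballoc-debug file; treat that as an assumption. The sketch keeps the macro always compiled in and decides at run time by comparing the message level with a switch variable. mballoc_debug_level, mb_debug_print() and the message counter are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Run-time switch, standing in for the value written to mballoc-debug
 * (hypothetical; 0 = off). */
static unsigned mballoc_debug_level = 0;
static unsigned mb_debug_emitted = 0; /* messages actually printed */

static void mb_debug_print(const char *func, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    fprintf(stderr, "%s: ", func);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    mb_debug_emitted++;
}

/* No #ifdef CONFIG_EXT4_DEBUG: the call is always compiled in, and a message
 * is printed only if its level does not exceed the run-time switch. */
#define mb_debug(n, ...)                                \
    do {                                                \
        if ((n) <= mballoc_debug_level)                 \
            mb_debug_print(__func__, __VA_ARGS__);      \
    } while (0)
```

The cost of this design is one comparison per call site when debugging is off, in exchange for not needing a separate debug kernel build.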

nasf (Inactive) added a comment - 12/Aug/17 1:08 AM

After the kernel and Lustre rebuild/install, we don't see the /sys/kernel/debug/ldiskfs option.

What is the output of:

find /proc /sys -name mballoc-debug

Jay Lan (Inactive) added a comment - 12/Aug/17 1:05 AM

The kernel was rebuilt with CONFIG_EXT4_DEBUG on.

Mahmoud Hanafi added a comment - 12/Aug/17 1:03 AM - edited

After the kernel and Lustre rebuild/install, we don't see the /sys/kernel/debug/ldiskfs option.

nbp2-oss1 /boot # cat config-2.6.32-642.15.1.el6.20170609.x86_64.lustre273 |grep CONFIG_EXT4_DEBUG
CONFIG_EXT4_DEBUG=y
nbp2-oss1 /boot # ls -l /sys/kernel/debug/
total 0
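
An empty /sys/kernel/debug often just means that debugfs is not mounted there; it is normally mounted with `mount -t debugfs none /sys/kernel/debug`. That is an assumption about this system, not something confirmed in the thread. The C sketch below checks a mount table such as /proc/mounts for a debugfs entry using the glibc setmntent()/getmntent() API; fs_type_mounted() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a filesystem of type `fstype` appears in the mount table at
 * `mounts_path` (e.g. "/proc/mounts"), 0 if not, -1 if it cannot be read. */
static int fs_type_mounted(const char *mounts_path, const char *fstype)
{
    assert(mounts_path != NULL && fstype != NULL);
    FILE *f = setmntent(mounts_path, "r");
    if (f == NULL)
        return -1;

    int found = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_type, fstype) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```

On the OSS, fs_type_mounted("/proc/mounts", "debugfs") returning 0 would point to the missing mount rather than to the kernel build.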

Bob Ciotti (Inactive) added a comment - 11/Aug/17 8:22 PM - edited

We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches or the recommended action?

We are planning to run with 16312 and the suggested debug patches.

For some reason this patch never made it into the series/6.8.

Are we convinced this is a duplicate of LU-7114?

nasf (Inactive) added a comment - 11/Aug/17 6:21 PM

1. Most of the OSTs that hit this bug have a flex block group size of 64, while the others use 256. The back-end RAID is set for a 1MB stripe size (8 data disks with 128KB per disk). And we pack all metadata blocks at the front of the LUN. Could this be a factor?

I am not sure for this.

2. Does fsck in fact check for bitmap corruption on disk? If we don't see it fixing anything, does that confirm that these are in-memory corruptions?

e2fsck verifies the free blocks/inodes in the bitmaps against the values recorded in the group descriptors.

3. If these are in-memory corruptions, can we get a debug patch that re-reads the bitmap from disk before marking it as bad?

4. Can you provide any other debug patches to help narrow down the root cause?

I made a debug patch (https://review.whamcloud.com/28489) that enables mb debug; please apply it together with the former patch (https://review.whamcloud.com/#/c/28249/). Please NOTE: the mb debug switch is /sys/kernel/debug/ldiskfs/mballoc-debug on the server node. It is disabled by default (value 0). Please set it to '1' before you mount the Lustre device.
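
From a shell this is simply `echo 1 > /sys/kernel/debug/ldiskfs/mballoc-debug`. The equivalent C sketch below writes a level to the switch file. The path comes from this comment; set_mballoc_debug() is a hypothetical helper, and the file exists only on a server running the debug patch with debugfs mounted.

```c
#include <assert.h>
#include <stdio.h>

/* Path given in the comment; present only with the debug patch loaded. */
#define MBALLOC_DEBUG_PATH "/sys/kernel/debug/ldiskfs/mballoc-debug"

/* Write a debug level to the switch file. Returns 0 on success, -1 on error. */
static int set_mballoc_debug(const char *path, unsigned level)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%u\n", level) > 0;
    if (fclose(f) != 0) /* a failed write may only show up at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Usage: call set_mballoc_debug(MBALLOC_DEBUG_PATH, 1) before mounting the Lustre device.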

Thanks!
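
Question 3 asks for a debug patch that re-reads the bitmap from disk before declaring it bad. The C sketch below models that logic in user space; the reader callback, the verdict names and the helpers are hypothetical, and this is not the patch that was eventually written. If the fresh on-disk copy agrees with the group descriptor, the cached copy was bad in memory; if not, the corruption is on disk (or in the descriptor).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 4096u /* assumes 4 KiB blocks: 32768 bits per group */

/* Hypothetical reader: fills buf with the on-disk block bitmap of `group`.
 * Returns 0 on success, nonzero on I/O error. */
typedef int (*read_bitmap_fn)(unsigned group, uint8_t *buf, void *ctx);

enum verdict { BITMAP_OK, BITMAP_STALE_IN_MEMORY, BITMAP_BAD_ON_DISK,
               BITMAP_READ_ERROR };

/* Number of clear (free) bits in a one-block bitmap. */
static unsigned count_free(const uint8_t *bm)
{
    unsigned nfree = 0;
    for (unsigned i = 0; i < BITMAP_BYTES * 8; i++)
        if (!(bm[i / 8] & (1u << (i % 8))))
            nfree++;
    return nfree;
}

/* Example reader: ctx points at a buffer standing in for the disk. */
static int read_from_buffer(unsigned group, uint8_t *buf, void *ctx)
{
    (void)group;
    memcpy(buf, ctx, BITMAP_BYTES);
    return 0;
}

static enum verdict verify_bitmap(unsigned group, uint8_t *cached,
                                  unsigned gd_free, read_bitmap_fn read_fn,
                                  void *ctx)
{
    uint8_t fresh[BITMAP_BYTES];

    assert(cached != NULL && read_fn != NULL);
    if (count_free(cached) == gd_free)
        return BITMAP_OK;
    if (read_fn(group, fresh, ctx) != 0)
        return BITMAP_READ_ERROR;      /* cannot re-read: report as before */
    if (count_free(fresh) == gd_free) {
        memcpy(cached, fresh, BITMAP_BYTES); /* bad only in memory: replace */
        return BITMAP_STALE_IN_MEMORY;
    }
    return BITMAP_BAD_ON_DISK;         /* disk disagrees too: really corrupt */
}
```

Logging which verdict each failure produces would answer question 2 directly: STALE_IN_MEMORY results would confirm in-memory corruption.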

Gerrit Updater added a comment - 11/Aug/17 6:13 PM

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28489
Subject: LU-9410 ldiskfs: enable mb debug
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d4e2f024d91f375085c24906313b7bc522464c20

People

Assignee: nasf (Inactive)

Reporter: Mahmoud Hanafi

Votes: 0

Watchers: 13

Dates

Created: 27/Apr/17 1:40 AM

Updated: 22/Mar/18 5:22 PM

Resolved: 28/Aug/17 7:05 AM