[LU-9410] on-disk bitmap corrupted Created: 27/Apr/17 Updated: 22/Mar/18 Resolved: 28/Aug/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We had 2 OSS and 3 different OSTs crash with bitmap-corrupted messages.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659 corrupted: 32768 blocks free in bitmap, 0 - in gd
Apr 3 18:38:16 nbp1-oss6 kernel:
Apr 3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660 corrupted: 32768 blocks free in bitmap, 0 - in gd
These errors were on 2 different backend RAID devices. Noteworthy items: |
| Comments |
| Comment by Andreas Dilger [ 27/Apr/17 ] |
|
There is a patch https://review.whamcloud.com/16679 ("ext4-corrupted-inode-block-bitmaps-handling") that may be relevant here; it allows the filesystem to keep running after a corrupted bitmap is detected so that e2fsck can be run later. |
| Comment by Mahmoud Hanafi [ 16/May/17 ] |
|
We hit this issue again. We are trying to determine the root cause and eliminate Intel CAS as a possible source. Is fsck expected to detect and fix these types of errors? |
| Comment by Mahmoud Hanafi [ 17/May/17 ] |
|
Some background: we have been running 2.7 on all our OSSes for some time and haven't seen this error. A few months ago we expanded 3 OSSes with an additional 12 OSTs each, bringing the total to 24 OSTs per OSS. These are the OSSes that have hit this issue. It has occurred on different back-end RAIDs, and no errors are logged on the RAIDs. Typically the error is seen during high load on the OSS. We need a way to debug and find the root cause of the issue, and we are open to installing a debug patch. After the last crash the OST was not scanned with fsck. If these errors are real corruption on disk, would an fsck find them? Please advise. |
| Comment by Peter Jones [ 18/May/17 ] |
|
Fan Yong, could you please advise here? Thanks, Peter |
| Comment by Mahmoud Hanafi [ 18/May/17 ] |
|
In ldiskfs_mb_init_cache, between the time the bitmap is read and when it is then checked in ldiskfs_mb_generate_from_pa, isn't it possible for the bitmap to change? |
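For context, the check that fires in these crashes compares the number of free bits counted in the block bitmap just read from disk against the free-block count stored in the group descriptor; the question above is whether either side can change between the read and the comparison. A simplified kernel-style sketch of that comparison follows; it is illustrative only and not the exact ldiskfs_mb_check_ondisk_bitmap() source.

```c
/*
 * Simplified sketch of the consistency check discussed in this ticket:
 * count the free (zero) bits in the bitmap that was just read from disk
 * and compare with the free-block count recorded in the group descriptor.
 * Illustrative only -- the real ldiskfs_mb_check_ondisk_bitmap() in
 * mballoc.c differs in detail.
 */
#include <linux/kernel.h>
#include <linux/errno.h>

static int check_ondisk_bitmap(const unsigned char *bitmap,
                               unsigned int blocks_per_group,
                               unsigned int free_in_gd,
                               unsigned int group)
{
        unsigned int i, free_in_bitmap = 0;

        for (i = 0; i < blocks_per_group; i++)
                if (!(bitmap[i >> 3] & (1 << (i & 7))))  /* 0 bit == free */
                        free_in_bitmap++;

        if (free_in_bitmap != free_in_gd) {
                printk(KERN_WARNING "on-disk bitmap for group %u corrupted: "
                       "%u blocks free in bitmap, %u - in gd\n",
                       group, free_in_bitmap, free_in_gd);
                return -EIO;
        }
        return 0;
}
```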
| Comment by nasf (Inactive) [ 24/May/17 ] |
|
We have hit similar trouble before, with a message like the following:
ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group xxx corrupted: mmmm blocks free in bitmap, nnnn - in gd
For almost all of those cases, 'mmmm' > 'nnnn', meaning the bitmap contains more available blocks than the count in the group descriptor. One suspected point is related to ldiskfs_ext_walk_space. Would you please try the patch https://review.whamcloud.com/#/c/21603/ from |
| Comment by Mahmoud Hanafi [ 24/Jul/17 ] |
|
We hit this bug with the patch from
|
| Comment by nasf (Inactive) [ 25/Jul/17 ] |
|
Have you ever run e2fsck after the bitmap corruption? What does the e2fsck report? |
| Comment by Mahmoud Hanafi [ 25/Jul/17 ] |
|
We had 2 crashes and we ran e2fsck both times. It didn't find anything.
|
| Comment by nasf (Inactive) [ 25/Jul/17 ] |
|
Then what is your kernel version? |
| Comment by Mahmoud Hanafi [ 25/Jul/17 ] |
|
We have hit it with 2.6.32-642.15.1.el6 and 2.6.32-642.13.1.el6. We just recovered from a crash and fsck showed nothing (e2fsck -fp). |
| Comment by nasf (Inactive) [ 25/Jul/17 ] |
|
It seems to be some in-RAM data corruption. We have hit similar trouble at other customer sites. Currently we suspect it is a kernel issue that may be fixed via the kernel patch "Addresses-Google-Bug: 2828254". One of our partners is verifying whether the trouble can be fixed via that patch. |
| Comment by Mahmoud Hanafi [ 25/Jul/17 ] |
|
Do we have the patch ported to 2.7?
|
| Comment by nasf (Inactive) [ 25/Jul/17 ] |
|
It is an EXT4 patch itself; we are still verifying it. |
| Comment by Mahmoud Hanafi [ 25/Jul/17 ] |
|
We had a crash where running fsck did find lots of wrong free block counts:
Free blocks count wrong for group #326001 (0, counted=32768). Fix<y>? yes
Free blocks count wrong for group #326002 (0, counted=32768). Fix<y>? yes
Free blocks count wrong for group #326003 (0, counted=32768). Fix<y>? yes
Free blocks count wrong for group #326004 (0, counted=32768). |
| Comment by nasf (Inactive) [ 25/Jul/17 ] |
|
Then the EXT4 patch may not cover your case; it has to be fixed via e2fsck. |
| Comment by Mahmoud Hanafi [ 25/Jul/17 ] |
|
Have you ported the patch to CentOS 6.x yet? We are going to try to port the patch to CentOS 6 ourselves, but if you have already done so it would save us the work.
|
| Comment by nasf (Inactive) [ 26/Jul/17 ] |
|
You mean the EXT4 patch "Addresses-Google-Bug: 2828254", right? We have not ported that patch yet; we are not 100% sure it is really useful for fixing our current in-RAM data corruption, and it is still being verified. But what you described in the comment https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=203483&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-203483 means there is real on-disk data corruption. I am afraid that only porting that patch may not be enough for your case. |
| Comment by Mahmoud Hanafi [ 26/Jul/17 ] |
|
What about a debug patch? We are hitting this every few hours on our production file system. |
| Comment by nasf (Inactive) [ 26/Jul/17 ] |
|
I just got the latest feedback from our partner a few minutes ago: they still hit the in-RAM bitmap corruption after applying that EXT4 patch. So we have to investigate further. Sorry about that. |
| Comment by nasf (Inactive) [ 26/Jul/17 ] |
|
Would you please describe what operations were running that may have triggered your trouble a few hours ago? Please upload the latest logs for the corruption. My understanding is that the corruption still occurred on the newly expanded OSTs, right? |
| Comment by Mahmoud Hanafi [ 26/Jul/17 ] |
|
The filesystem has 20 OSSes and 360 OSTs. We have been seeing bitmap corruption and read-only remounts every few hours on different OSTs and OSSes. 9 times out of 10, fsck doesn't report any errors. What type of logs would be helpful? We set our mount option to errors=panic so we can get a crash dump right away.
|
| Comment by nasf (Inactive) [ 26/Jul/17 ] |
|
Both the crash dump and /var/log/messages may be helpful. You mentioned that you did not hit the trouble before you expanded your OSSes; my understanding is that the data corruption only happened on the new OSTs, not on the old ones. Is that true? |
| Comment by Mahmoud Hanafi [ 26/Jul/17 ] |
|
We have 2 filesystems that have been expanded and we are only seeing this on them. I need to double-check to make sure we never had the corruption on the old OSTs. Since the new OSTs are empty, they see higher utilization. I will upload a crash dump. |
| Comment by Mahmoud Hanafi [ 26/Jul/17 ] |
|
It will take some time to upload the vmcore, but I have attached the backtrace and dmesg. |
| Comment by Mahmoud Hanafi [ 26/Jul/17 ] |
|
Crash dumps uploaded. I have attached 2 more backtraces from 2 more crash dumps. The same OST has crashed the past 3 times. |
| Comment by Gerrit Updater [ 27/Jul/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28249 |
| Comment by nasf (Inactive) [ 27/Jul/17 ] |
|
Mahmoud, I made a kernel patch (https://review.whamcloud.com/28249) based on el6.8 (2.6.32-642.15.1.el6) to resolve some in-RAM data corruption. Would you please try that patch? Thanks! |
| Comment by Mahmoud Hanafi [ 27/Jul/17 ] |
|
We deactivated one of the OSTs that kept crashing and we have been stable for 12 hours. We will give the patch a try. |
| Comment by Jay Lan (Inactive) [ 27/Jul/17 ] |
|
I thought nasf wrote: |
| Comment by nasf (Inactive) [ 28/Jul/17 ] |
|
One of our partners ported the EXT4 patch themselves; they finally told me that it does not resolve their issue, but they did not show me their patch. My patch 28249 is NOT the ported EXT4 patch: it prevents bitmap readers from accessing the bitmap without holding the buffer-head lock. It avoids the complex logic of the EXT4 patch and may cover more corner cases, so I hope the NASA site can try the patch. The side effect of patch 28249 is that it may affect performance a bit, but very, very little. |
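A rough sketch of the idea described above (a reader takes the buffer-head lock before validating or using the bitmap, so it cannot race with the code that initializes or updates that buffer). This only illustrates the concept; it is not the content of patch 28249.

```c
/*
 * Conceptual sketch only -- not the actual patch 28249.  The idea stated
 * above: a reader must hold the buffer-head lock while it validates/uses
 * the on-disk bitmap, so it cannot observe a half-initialized or
 * concurrently modified buffer.
 */
#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/errno.h>

static int use_bitmap_locked(struct super_block *sb, struct buffer_head *bh,
                             int (*check)(struct super_block *,
                                          struct buffer_head *))
{
        int rc;

        lock_buffer(bh);                /* serialize against init/update paths */
        if (!buffer_uptodate(bh)) {
                unlock_buffer(bh);
                return -EIO;
        }
        rc = check(sb, bh);             /* e.g. verify bitmap vs. group desc */
        unlock_buffer(bh);
        return rc;
}
```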
| Comment by Jay Lan (Inactive) [ 28/Jul/17 ] |
|
Thanks, nasf! We will give it a try. |
| Comment by Mahmoud Hanafi [ 09/Aug/17 ] |
|
The patch provided did not help with the bitmap errors! Did the crash dump provide any helpful info? |
| Comment by nasf (Inactive) [ 10/Aug/17 ] |
|
What is the storage (hardware vendor) you are using? |
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
NetApp E5500. When we run fsck it sometimes fixes quota stuff, but no bitmaps. |
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
One thing I noticed during recovery: a client that was active on the OST got this error:
[190767.968385] LustreError: 3655:0:(import.c:1261:ptlrpc_connect_interpret()) nbp2-OST0157_UUID went back in time (transno 17179894486 was previously committed, server now claims 12899642596)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 |
| Comment by nasf (Inactive) [ 10/Aug/17 ] |
|
Mahmoud, I was told that your system may crash every few hours. I assume these are similar bitmap corruptions to the one you described in the ticket summary, right?
And when you run e2fsck after the corruption, e2fsck does NOT report bitmap issues; instead sometimes no inconsistency is found, and sometimes quota things are repaired, right? Have I missed anything else about your current trouble? |
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
That is correct. Most of the time an fsck will report something like this:
nbp2-OST0118: nbp2-OST0118 contains a file system with errors, check forced.
nbp2-OST0118: nbp2-OST0118: 776296/74698752 files (20.6% non-contiguous), 2193846359/19122880512 blocks
|
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
Uploading lustre-log.1502322177.17192.txt.gz to the FTP site /uploads/. This was taken at the time OST0157 hit the bitmap error at 16:37:49 PDT.
|
| Comment by Shuichi Ihara (Inactive) [ 10/Aug/17 ] |
|
Please make sure ldiskfs/kernel_patches/series/ldiskfs-2.6-rhel6.8.series really contains rhel6.6/ext4-corrupted-inode-block-bitmaps-handling-patches.patch. I think you might be missing that patch for the RHEL6.8 kernel. |
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
@Shuichi Ihara, we looked at that patch and decided not to apply it because these don't appear to be real on-disk corruptions. |
| Comment by Shuichi Ihara (Inactive) [ 10/Aug/17 ] |
|
Yes, that's true, but it still detects and prints the corruption, and you can fsck at a maintenance window instead of going read-only immediately. |
| Comment by Mahmoud Hanafi [ 10/Aug/17 ] |
|
1. Most of the OSTs that hit this bug have flex block group size = 64, vs. others set to flex block group size = 256. The back-end RAID is set for a 1MB stripe size (8 data disks with 128MB per-disk stripe), and we pack all metadata blocks at the front of the LUN. Could this be a factor?
2. Does fsck in fact check for bitmap corruption on disk? If we don't see it fixing anything, does that confirm these are in-memory corruptions?
3. If these are in-memory corruptions, can we get a debug patch that will re-read from disk before marking the bitmap as bad?
4. Can you provide any other debug patch to help narrow down the root cause? |
| Comment by nasf (Inactive) [ 11/Aug/17 ] |
|
mhanafi, would you please show me the output of: dumpe2fs -f $OST_device Thanks! |
| Comment by Mahmoud Hanafi [ 11/Aug/17 ] |
|
We had an OST go read-only today and I was able to gather some very useful info. I dumped /proc/fs/ldiskfs/dm-13/mb_groups; it had I/O errors for the block groups that the OST was complaining about. I dumped the block group info for the device (dumpe2fs) and it did in fact show block groups with 0 free blocks in the descriptor while the bitmap has free blocks, like this one that triggered the OST to go read-only:
Group 314421: (Blocks 10302947328-10302980095) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
I rechecked the block groups using dumpe2fs after the fsck and it had fixed those groups. I will attach the full dumpe2fs output to the case |
| Comment by Gerrit Updater [ 11/Aug/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28489 |
| Comment by nasf (Inactive) [ 11/Aug/17 ] |
I am not sure about this.
e2fsck will verify the free blocks/inodes counted from the bitmap against the value recorded in the group descriptor.
I made a debug patch (https://review.whamcloud.com/28489) with mb debug enabled; please apply it together with the former patch (https://review.whamcloud.com/#/c/28249/). Please NOTE: the mb debug switch is under /sys/kernel/debug/ldiskfs/mballoc-debug on the server node. It is disabled by default (value 0). Please set it to '1' before you mount the Lustre device. Thanks! |
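For reference, a tunable like /sys/kernel/debug/ldiskfs/mballoc-debug is normally exposed through debugfs, roughly as in the hypothetical sketch below; this is an assumption about the mechanism, not the actual 28489 code. Because the file lives in debugfs, it only appears once debugfs itself is mounted, which matches the observation later in this ticket.

```c
/*
 * Hypothetical sketch of how a debugfs tunable such as
 * /sys/kernel/debug/ldiskfs/mballoc-debug is typically registered; the
 * real patch 28489 may do this differently.
 */
#include <linux/module.h>
#include <linux/debugfs.h>

static u32 mballoc_debug;               /* 0 = off (default), 1 = verbose */
static struct dentry *ldiskfs_dbg_dir;

static int __init ldiskfs_dbg_init(void)
{
        ldiskfs_dbg_dir = debugfs_create_dir("ldiskfs", NULL);
        if (ldiskfs_dbg_dir)
                debugfs_create_u32("mballoc-debug", 0644,
                                   ldiskfs_dbg_dir, &mballoc_debug);
        return 0;
}

static void __exit ldiskfs_dbg_exit(void)
{
        debugfs_remove_recursive(ldiskfs_dbg_dir);
}

module_init(ldiskfs_dbg_init);
module_exit(ldiskfs_dbg_exit);
MODULE_LICENSE("GPL");
```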
| Comment by Bob Ciotti [ 11/Aug/17 ] |
|
We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches or the recommended action? We are planning to run with 16312 and the suggested debug patches; for some reason this patch never made it into the series/6.8. Are we convinced this is a duplicate of |
| Comment by Mahmoud Hanafi [ 12/Aug/17 ] |
|
After the kernel and Lustre rebuild/install we don't see the /sys/kernel/debug/ldiskfs option.
nbp2-oss1 /boot # cat config-2.6.32-642.15.1.el6.20170609.x86_64.lustre273 | grep CONFIG_EXT4_DEBUG |
| Comment by Jay Lan (Inactive) [ 12/Aug/17 ] |
|
Kernel was rebuilt with CONFIG_EXT4_DEBUG on. |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
What is the output of: find /proc /sys -name mballoc-debug |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
Sorry, in my test CONFIG_EXT4_DEBUG was disabled by default, so I thought NASA might have it disabled as well. Then, to avoid rebuilding the kernel, I made the patch remove the conditional compilation for mb debug. I also added more debugging for the case of initializing the BG block bitmap. |
| Comment by Mahmoud Hanafi [ 12/Aug/17 ] |
|
So should we get the debugging without a kernel recompile?
|
| Comment by Jay Lan (Inactive) [ 12/Aug/17 ] |
|
The kernel was originally built with CONFIG_EXT4_DEBUG disabled, and the Lustre server RPMs were built against that, so we have both available for testing. If CONFIG_EXT4_DEBUG is not needed in the kernel, how can we enable mb-debug? |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
With my patch applied you do NOT need to recompile the kernel. In fact CONFIG_EXT4_DEBUG becomes almost redundant, since we can control the mb debug output via the runtime debug level. |
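Conceptually, dropping the conditional compilation means the debug macro is always built in and is gated at run time by the mballoc-debug level instead, roughly as in this illustrative sketch (the actual macro in the patch will differ). The "(file, line): function:" prefix matches the format of the debug lines captured later in this ticket.

```c
/*
 * Illustrative sketch only: an mb_debug() that no longer depends on
 * CONFIG_EXT4_DEBUG and is instead gated by the runtime mballoc-debug
 * level exposed through debugfs.
 */
extern u32 mballoc_debug;               /* runtime switch, default 0 */

#define mb_debug(level, fmt, ...)                                       \
do {                                                                    \
        if (mballoc_debug >= (level))                                   \
                printk(KERN_DEBUG "(%s, %d): %s: " fmt,                 \
                       __FILE__, __LINE__, __func__, ##__VA_ARGS__);    \
} while (0)
```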
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
It is true that we missed that patch; Andreas pointed it out in the first comment. But that patch is mostly for handling the case after the bitmap corruption has already happened: it allows the system to go ahead without failing right away, so users can run e2fsck at a maintenance window. As mhanafi commented: |
| Comment by Jay Lan (Inactive) [ 12/Aug/17 ] |
|
OK, running a kernel without CONFIG_EXT4_DEBUG and Lustre with your patches, how do we enable debugging if we do not see /sys/kernel/debug/ldiskfs/? Please elaborate. Thanks. |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
|
With ldiskfs.ko loaded, what is the output of: find /proc /sys -name mballoc-debug |
| Comment by Mahmoud Hanafi [ 12/Aug/17 ] |
|
find /proc /sys -name mballoc-debug has no output |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
|
Please attach the source file ldiskfs/mballoc.c; you can find it in your build directory. Thanks! |
| Comment by Jay Lan (Inactive) [ 12/Aug/17 ] |
|
mballoc.c attached. |
| Comment by nasf (Inactive) [ 12/Aug/17 ] |
|
https://review.whamcloud.com/28489 is refreshed, please try again. Thanks! |
| Comment by Mahmoud Hanafi [ 13/Aug/17 ] |
|
So haven't put patch debug 28489 in place but are now running with " ug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:43 nbp2-oss20 kernel: Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:43 nbp2-oss20 kernel: Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:44 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:45 nbp2-oss20 kernel: Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:45 nbp2-oss20 kernel: Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:46 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:47 nbp2-oss20 kernel: Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:47 nbp2-oss20 kernel: Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:49 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:50 nbp2-oss20 kernel: Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:50 nbp2-oss20 kernel: Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:53 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:54 nbp2-oss20 kernel: Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:54 nbp2-oss20 kernel: Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs 
warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:59 nbp2-oss20 kernel: Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:05:59 nbp2-oss20 kernel: Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:06:05 nbp2-oss20 kernel: Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 01:06:05 nbp2-oss20 kernel: Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790 Aug 12 01:06:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd Some time later Aug 12 04:05:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 276684 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:05:12 nbp2-oss20 kernel: Aug 12 04:05:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 276685 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:07:56 nbp2-oss20 pcp-pmie[5801]: High 1-minute load average 354load@nbp2-oss20 Aug 12 04:07:56 nbp2-oss20 - in gd Aug 12 04:07:56 nbp2-oss20 kernel: Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304861 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:07:56 nbp2-oss20 kernel: Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304862 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:07:56 nbp2-oss20 kernel: Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304863 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:07:56 nbp2-oss20 kernel: Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304864 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 12 04:07:56 nbp2-oss20 kernel: ..... It has marked 6727 uniq groups as bad for dm-21(ost319)
|
| Comment by nasf (Inactive) [ 13/Aug/17 ] |
|
|
| Comment by Mahmoud Hanafi [ 14/Aug/17 ] |
|
With the new build, are we supposed to have mballoc-debug in /proc or /sys? Because the find doesn't find anything.
Never mind, I figured this out. We need to mount debugfs for it to show up. |
| Comment by Mahmoud Hanafi [ 15/Aug/17 ] |
|
Got block group debug logs with corruption. Block group is #270808. I will attach full log file to the case. syslog.gp270808.error.gz Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: load group 270808 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1032): ldiskfs_mb_init_group: init group 270808 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/balloc.c, 179): ldiskfs_init_block_bitmap: #24877: init the group 270808 of total groups 583584: group_blocks 32768, free_blocks 32768, free_blocks_in_gdp 0, ret 32768 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 927): ldiskfs_mb_init_cache: put bitmap for group 270808 in page 541616/0 Aug 14 18:37:14 nbp2-oss20 kernel: on-disk bitmap for group 270808 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 14 18:37:14 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 270808 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: load group 270808 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1032): ldiskfs_mb_init_group: init group 270808 Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 927): ldiskfs_mb_init_cache: put bitmap for group 270808 in page 541616/0 Aug 14 18:37:14 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 270808 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 14 18:37:14 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: load group 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1032): ldiskfs_mb_init_group: init group 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 927): ldiskfs_mb_init_cache: put bitmap for group 270808 in page 541616/0 Aug 14 18:37:15 nbp2-oss20 kernel: on-disk bitmap for group 270808 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 14 18:37:15 nbp2-oss20 kernel: Error in loading buddy information for 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: load group 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1032): ldiskfs_mb_init_group: init group 270808 Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 927): ldiskfs_mb_init_cache: put bitmap for group 270808 in page 541616/0 Aug 14 18:37:15 nbp2-oss20 kernel: on-disk bitmap for group 270808 corrupted: 32768 blocks free in bitmap, 0 - in gd Aug 14 18:37:15 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: Error in loading buddy information for 270808 Aug 14 18:37:17 nbp2-oss20 kernel: 
(/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1103): ldiskfs_mb_load_buddy: load group 270808 Aug 14 18:37:17 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 1032): ldiskfs_mb_init_group: init group 270808 Aug 14 18:37:17 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/mballoc.c, 927): ldiskfs_mb_init_cache: put bitmap for group 270808 in page 541616/0
|
| Comment by Gerrit Updater [ 15/Aug/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28550 |
| Comment by nasf (Inactive) [ 15/Aug/17 ] |
The logs show that ldiskfs_init_block_bitmap() initialized the bitmap, but the free blocks count in the group descriptor is still zero, which caused the subsequent ldiskfs_mb_check_ondisk_bitmap() failure. Currently I cannot say it is corruption; it looks more like a logic issue. The patch sets the free block count based on the real free bits in the bitmap. It may not be the perfect solution, but we can try whether it resolves your trouble. |
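In other words, the fix described here resynchronizes the descriptor's free-block count from the bitmap that was just initialized instead of declaring the group corrupted. A hedged sketch of that idea follows; it is not the literal patch 28550, and the ldiskfs_* helpers are assumed to be the ldiskfs-renamed ext4 accessors.

```c
/*
 * Hedged sketch of the idea described above, not the literal patch 28550:
 * when a BLOCK_UNINIT group's bitmap has just been initialized but the
 * group descriptor still records 0 free blocks, resynchronize the
 * descriptor from the bitmap instead of treating the group as corrupted.
 */
static void sync_gd_free_blocks(struct super_block *sb,
                                struct ldiskfs_group_desc *gdp,
                                unsigned int free_in_bitmap,
                                unsigned int group)
{
        if (ldiskfs_free_blks_count(sb, gdp) != free_in_bitmap) {
                /* the real patch emits this via the LDISKFS-fs warning helper */
                printk(KERN_WARNING "ldiskfs_init_block_bitmap: "
                       "Set free blocks as %u for group %u\n",
                       free_in_bitmap, group);
                ldiskfs_free_blks_set(sb, gdp, free_in_bitmap);
                gdp->bg_checksum = ldiskfs_group_desc_csum(LDISKFS_SB(sb),
                                                           group, gdp);
        }
}
```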
| Comment by Mahmoud Hanafi [ 15/Aug/17 ] |
|
I used systemtap to catch one of these bad groups and dump out the ldiskfs_group_desc struct.
mballoc.c:826: first_group: 274007 bg_free_blocks_count_hi: 0 bg_block_bitmap_hi: 0 bg_free_blocks_count_lo: 0
mballoc.c:826:$desc {.bg_block_bitmap_lo=328727, .bg_inode_bitmap_lo=930551, .bg_inode_table_lo=3450424, .bg_free_blocks_count_lo=0, .bg_free_inodes_count_lo=128, .bg_used_dirs_count_lo=0, .bg_flags=7, .bg_reserved=[...], .bg_itable_unused_lo=128, .bg_checksum=55256, .bg_block_bitmap_hi=0, .bg_inode_bitmap_hi=0, .bg_inode_table_hi=0, .bg_free_blocks_count_hi=0, .bg_free_inodes_count_hi=0, .bg_used_dirs_count_hi=0, .bg_itable_unused_hi=0, .bg_reserved2=[...]}
It also seems odd that dumpe2fs can produce different results for unused block groups: sometimes it will show block_bitmap != free_blocks and other times it will be OK.
--- In ldiskfs_valid_block_bitmap() I don't understand this:
if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_FLEX_BG)) {
        /* with FLEX_BG, the inode/block bitmaps and itable
         * blocks may not be in the group at all
         * so the bitmap validation will be skipped for those groups
         * or it has to also read the block group where the bitmaps
         * are located to verify they are set. */
        return 1;
}
We have flex_bg enabled; would this apply to us?
For the OSTs that are prone to the bitmap errors, cat /proc/fs/ldiskfs/dm*/mb_groups will reproduce the errors.
|
| Comment by Mahmoud Hanafi [ 16/Aug/17 ] |
|
Applied the new patch. After a full fsck, mounting the OSTs resulted in this many block groups getting corrected:
---------------- service603 ---------------- 4549 dm-33):
---------------- service604 ---------------- 4425 dm-32):
---------------- service606 ---------------- 4658 dm-29):
---------------- service610 ---------------- 4631 dm-33):
---------------- service611 ---------------- 4616 dm-28):
---------------- service616 ---------------- 4652 dm-35):
---------------- service617 ---------------- 4501 dm-21):
---------------- service619 ---------------- 4657 dm-25):
We need to rate limit the warnings. |
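On the rate-limiting point, the kernel's standard ratelimit helpers could throttle the per-group warning. A hedged sketch is below; it is illustrative only, not an actual ldiskfs change, and the exact integration point is an assumption.

```c
/*
 * Illustrative only: throttle the "Set free blocks ..." warning so a burst
 * of corrected block groups cannot flood the console/syslog.
 * DEFINE_RATELIMIT_STATE() and __ratelimit() are standard kernel helpers;
 * where exactly this would hook into ldiskfs is an assumption.
 */
#include <linux/kernel.h>
#include <linux/ratelimit.h>

static DEFINE_RATELIMIT_STATE(bg_fix_rs, 5 * HZ, 10);  /* <= 10 msgs / 5 s */

static void warn_bg_fixed(unsigned int free_blocks, unsigned int group)
{
        if (__ratelimit(&bg_fix_rs))
                printk(KERN_WARNING "ldiskfs_init_block_bitmap: "
                       "Set free blocks as %u for group %u\n",
                       free_blocks, group);
}
```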
| Comment by nasf (Inactive) [ 16/Aug/17 ] |
|
mhanafi |
| Comment by Mahmoud Hanafi [ 16/Aug/17 ] |
|
here is part of dmesg. The high rate of messages caused the root drive scsi device to reset. But all but one server recovered. I had to turn down printk log level down to get the last one to recover. LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262310 LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262311 LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262312 LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262313 LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262314 LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.149.2.156@o2ib313 portal 28 match 1575300167923792 offset 0 length 520: 4 LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Skipped 978380 previous similar messages sd 0:0:1:0: attempting task abort! scmd(ffff880af433e0c0) sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 a0 08 08 00 00 08 00 scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2) scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2) scsi target0:0:1: enclosure level(0x0000),connector name( ^C) sd 0:0:1:0: task abort: SUCCESS scmd(ffff880af433e0c0) sd 0:0:1:0: attempting task abort! scmd(ffff880a64ab46c0) sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 e0 08 08 00 00 08 00 scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2) scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2) scsi target0:0:1: enclosure level(0x0000),connector name( ^C) sd 0:0:1:0: task abort: SUCCESS scmd(ffff880a64ab46c0) sd 0:0:1:0: attempting task abort! scmd(ffff880b21cec180) sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 c0 08 08 00 00 08 00 scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2) DISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts: LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts: mounted filesystem with ordered data mode. quota=on. Opts: LDISKFS-fs (dm-29): mounted filesystem with ordered data mode. quota=on. Opts: LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts: Lustre: nbp2-OST0081: Not available for connect from 10.151.43.107@o2ib (not set up) Lustre: Skipped 3 previous similar messages Lustre: nbp2-OST0081: Not available for connect from 10.151.29.130@o2ib (not set up) Lustre: Skipped 113 previous similar messages Lustre: nbp2-OST0081: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450 Lustre: nbp2-OST0081: Will be in recovery for at least 2:30, or until 14441 clients reconnect Lustre: nbp2-OST0081: Denying connection for new client 35b99837-9505-fc4d-270f-f2d1ca30372d (at 10.151.30.176@o2ib), waiting for all 14441 known clients (44 recovered, 1 in progress, and 0 evicted) to recover in 5:10 Here is /var/log/messages Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.151.30.120@o2ib portal 28 match 1575477031778096 offset 0 length 520: 4 Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Skipped 1037319 previous similar messages Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-30): Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-28): mounted filesystem with ordered data mode. quota=on. 
Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-31): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-21): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-19): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-22): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-20): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-26): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-33): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-32): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-24): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-25): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:40 nbp2-oss10 kernel: Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-29): Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-35): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:41 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:49 nbp2-oss10 kernel: LDISKFS-fs (dm-27): mounted filesystem with ordered data mode. quota=on. Opts: Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.50.143@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: Skipped 314 previous similar messages Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.9.177@o2ib (not set up) Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 11 previous similar messages Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.8.85@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: Skipped 3632 previous similar messages Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.50.241@o2ib (not set up) Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 180 previous similar messages Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0135_UUID: not available for connect from 10.151.48.113@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: Skipped 6273 previous similar messages Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.7.158@o2ib (not set up) Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: Skipped 402 previous similar messages Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450 Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Will be in recovery for at least 2:30, or until 14452 clients reconnect |
| Comment by Gerrit Updater [ 16/Aug/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28566 |
| Comment by nasf (Inactive) [ 17/Aug/17 ] |
|
mhanafi, I have to say that this issue may be related to improper bitmap consistency verification in our ldiskfs patch, which does not handle the flex_bg case. I made a patch https://review.whamcloud.com/28566 to handle the related issues. Would you please try it (the other former patches are not needed)? Thanks! |
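As a rough illustration of the kind of change described (handling the flex_bg / BLOCK_UNINIT case in the bitmap-vs-descriptor consistency check), here is an assumption-laden sketch; it is not the content of patch 28566, which should be read on Gerrit for the actual fix.

```c
/*
 * Assumption-laden sketch, NOT patch 28566: with flex_bg enabled, a group
 * still flagged BLOCK_UNINIT may legitimately have 0 free blocks recorded
 * in its descriptor while the bitmap constructed in memory shows the whole
 * group free, so the strict bitmap-vs-descriptor comparison should not be
 * applied blindly to such groups.
 */
static int check_group_bitmap(struct super_block *sb,
                              struct ldiskfs_group_desc *gdp,
                              unsigned int free_in_bitmap,
                              unsigned int free_in_gd,
                              unsigned int group)
{
        if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_FLEX_BG) &&
            (gdp->bg_flags & cpu_to_le16(LDISKFS_BG_BLOCK_UNINIT)))
                return 0;       /* bitmap was synthesized; skip strict check */

        if (free_in_bitmap != free_in_gd) {
                printk(KERN_WARNING "on-disk bitmap for group %u corrupted: "
                       "%u blocks free in bitmap, %u - in gd\n",
                       group, free_in_bitmap, free_in_gd);
                return -EIO;
        }
        return 0;
}
```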
| Comment by Jay Lan (Inactive) [ 17/Aug/17 ] |
|
I did a build with #28566 and #28550 yesterday. For testing purposes, do these two conflict? Never mind, I just did another build with #28550 pulled out. |
| Comment by Mahmoud Hanafi [ 17/Aug/17 ] |
|
The filesystem is stable with the workaround patch (28489). Can we run with this patch for some time without any underlying filesystem issues, or should we replace it with 28566 ASAP? |
| Comment by nasf (Inactive) [ 18/Aug/17 ] |
|
Patch 28550 takes effect before 28566, so if 28550 is applied then 28566 is meaningless. But 28550 may do more than the necessary fixes, and I am afraid of some potential side effects.
"The filesystem is stable with the workaround patch (28489). Can we run with this patch for some time without any underlying filesystem issues? Or should we replace it with 28566 ASAP?"
It is interesting to know that. Because 28489 is just a debug patch, I cannot imagine how it could resolve your issue. It may be because your system has jumped over the groups with the "BLOCK_UNINIT" flag and zero free blocks in the GDP. If that is true, then applying 28566 will not show you more benefit. Since your system is running stably, you can replace the patches with 28566 when it gets 'corrupted' next time. |
| Comment by Mahmoud Hanafi [ 18/Aug/17 ] |
|
Sorry, I typed the wrong patch number. I meant to say it is stable with 28550.
|
| Comment by nasf (Inactive) [ 18/Aug/17 ] |
Then it is reasonable. As I explained above, 28550 may do more than the necessary fixes, but since it runs stably you can keep it until the next 'corruption'. |
| Comment by Mahmoud Hanafi [ 22/Aug/17 ] |
|
Update: we applied https://review.whamcloud.com/28566 on Friday and the filesystem has been stable. |
| Comment by nasf (Inactive) [ 23/Aug/17 ] |
|
mhanafi Thanks for the update. |
| Comment by Mahmoud Hanafi [ 23/Aug/17 ] |
|
Does this patch require any changes to e2fsck? |
| Comment by nasf (Inactive) [ 23/Aug/17 ] |
|
I think there may be something that can be improved in mke2fs, not e2fsck. |
| Comment by Jay Lan (Inactive) [ 24/Aug/17 ] |
|
Do I need this patch for 2.10.0? |
| Comment by nasf (Inactive) [ 25/Aug/17 ] |
|
Yes, master also needs the patch 28566. |
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28566/ |
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28765 |
| Comment by Gerrit Updater [ 06/Sep/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28765/ |