Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: Lustre 2.7.0
-
None
-
3
-
9223372036854775807
Description
We had two OSS nodes and three different OSTs crash with "bitmap corrupted" messages.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659 corrupted: 32768 blocks free in bitmap, 0 - in gd
Apr 3 18:38:16 nbp1-oss6 kernel:
Apr 3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660 corrupted: 32768 blocks free in bitmap, 0 - in gd
These errors were on two different backend RAID devices. Noteworthy items:
1. The filesystem was more than 90% full, and about half of the data had been deleted.
2. The OSTs are formatted with "-E packed_meta_blocks=1".
Attachments
- bt.2017-07-26-02.48.00 (765 kB)
- bt.2017-07-26-12.08.43 (808 kB)
- foreach.out (736 kB)
- mballoc.c (145 kB)
- ost258.dumpe2fs.after.fsck.gz (34.46 MB)
- syslog.gp270808.error.gz (13.37 MB)
- vmcore-dmesg.txt (512 kB)
Activity
Patch 28550 will take effect before 28566, so if 28550 is applied, then 28566 is meaningless. But 28550 may do more than the necessary fixes, and I am afraid of potential side effects.
It is interesting to know that. Because 28489 is just a debug patch, I cannot imagine how it can resolve your issue. It may be because your system has jumped over the groups with the "BLOCK_UNINIT" flag and zero free blocks in the GDP. If that is true, then applying 28566 will not show more benefit. Since your system is running stably, you can replace the patches with 28566 when it gets 'corrupted' next time.
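For illustration of the "jumped over" behaviour described above, here is a minimal, self-contained C sketch. It is not the ldiskfs/mballoc source; the struct and names are invented. The point is only that an mballoc-style allocator can skip a group whose descriptor claims zero free blocks before the bitmap is ever read, so a stale zero count in an uninitialized group stays hidden:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct group_desc {
	uint16_t bg_free_blocks_count;	/* free blocks recorded in the GDP */
	uint16_t bg_flags;		/* e.g. BLOCK_UNINIT would be set here */
};

/* Pre-check done before the bitmap is read: a group advertising zero
 * free blocks is skipped, so the bitmap/GDP mismatch is never noticed
 * until something (reading mb_groups, a fuller filesystem) forces the
 * bitmap to be loaded and verified. */
static bool group_worth_scanning(const struct group_desc *gd)
{
	return gd->bg_free_blocks_count != 0;
}

int main(void)
{
	/* The situation described above: uninitialized group, stale zero count. */
	struct group_desc gd = { .bg_free_blocks_count = 0, .bg_flags = 0x2 };
	printf("group: %s\n", group_worth_scanning(&gd) ? "scanned" : "skipped");
	return 0;
}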
The filesystem is stable with the workaround patch (28489). Can we run with this patch for some time without any underlying filesystem issues? Or should we replace it with 28566 ASAP?
I did a build with #28566 and #28550 yesterday. For testing purposes, do these two conflict?
I will undo #28550, but if these two do not conflict, we can do testing with the builds I did yesterday.
Never mind. I just did another build with #28550 pulled out.
mhanafi, I have to say that this issue may be related to improper bitmap consistency verification in our ldiskfs patch, which does not handle the flex_bg case. I made a patch https://review.whamcloud.com/28566 to handle the related issues. Would you please try it (no other former patches are needed)? Thanks!
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28566
Subject: LU-9410 ldiskfs: no check mb bitmap if flex_bg enabled
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8332a30959750c603bc572db1fcde8bc92f82a40
Here is part of dmesg. The high rate of messages caused the root drive's SCSI device to reset, but all but one server recovered. I had to turn the printk log level down to get the last one to recover.
LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262310
LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262311
LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262312
LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262313
LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262314
LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.149.2.156@o2ib313 portal 28 match 1575300167923792 offset 0 length 520: 4
LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Skipped 978380 previous similar messages
sd 0:0:1:0: attempting task abort! scmd(ffff880af433e0c0)
sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 a0 08 08 00 00 08 00
scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2)
scsi target0:0:1: enclosure level(0x0000), connector name( ^C)
sd 0:0:1:0: task abort: SUCCESS scmd(ffff880af433e0c0)
sd 0:0:1:0: attempting task abort! scmd(ffff880a64ab46c0)
sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 e0 08 08 00 00 08 00
scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2)
scsi target0:0:1: enclosure level(0x0000), connector name( ^C)
sd 0:0:1:0: task abort: SUCCESS scmd(ffff880a64ab46c0)
sd 0:0:1:0: attempting task abort! scmd(ffff880b21cec180)
sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 c0 08 08 00 00 08 00
scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
LDISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts:
mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-29): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: nbp2-OST0081: Not available for connect from 10.151.43.107@o2ib (not set up)
Lustre: Skipped 3 previous similar messages
Lustre: nbp2-OST0081: Not available for connect from 10.151.29.130@o2ib (not set up)
Lustre: Skipped 113 previous similar messages
Lustre: nbp2-OST0081: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Lustre: nbp2-OST0081: Will be in recovery for at least 2:30, or until 14441 clients reconnect
Lustre: nbp2-OST0081: Denying connection for new client 35b99837-9505-fc4d-270f-f2d1ca30372d (at 10.151.30.176@o2ib), waiting for all 14441 known clients (44 recovered, 1 in progress, and 0 evicted) to recover in 5:10
Here is /var/log/messages
Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.151.30.120@o2ib portal 28 match 1575477031778096 offset 0 length 520: 4
Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Skipped 1037319 previous similar messages
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-30):
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-28): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-31): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-21): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-19): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-22): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-20): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-26): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-33): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-32): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-24): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-25): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:40 nbp2-oss10 kernel:
Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-29):
Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-35): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:41 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:49 nbp2-oss10 kernel: LDISKFS-fs (dm-27): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.50.143@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: Skipped 314 previous similar messages
Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.9.177@o2ib (not set up)
Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 11 previous similar messages
Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.8.85@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: Skipped 3632 previous similar messages
Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.50.241@o2ib (not set up)
Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 180 previous similar messages
Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0135_UUID: not available for connect from 10.151.48.113@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: Skipped 6273 previous similar messages
Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.7.158@o2ib (not set up)
Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: Skipped 402 previous similar messages
Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Will be in recovery for at least 2:30, or until 14452 clients reconnect
mhanafi, it looks different from the original one. Would you please show me more logs (dmesg, /var/log/messages) about the latest corruption? Is the system still accessible after the above warning?
Applied the new patch. After a full fsck, mounting the OSTs resulted in this many block groups getting corrected:
---------------- service603 ----------------
4549 dm-33):
---------------- service604 ----------------
4425 dm-32):
---------------- service606 ----------------
4658 dm-29):
---------------- service610 ----------------
4631 dm-33):
---------------- service611 ----------------
4616 dm-28):
---------------- service616 ----------------
4652 dm-35):
---------------- service617 ----------------
4501 dm-21):
---------------- service619 ----------------
4657 dm-25):
We need to rate limit the warnings.
I used systemtap to catch one of these bad groups and dump out the ldiskfs_group_desc struct.
mballoc.c:826: first_group: 274007 bg_free_blocks_count_hi: 0 bg_block_bitmap_hi: 0 bg_free_blocks_count_lo: 0
mballoc.c:826: $desc {.bg_block_bitmap_lo=328727, .bg_inode_bitmap_lo=930551, .bg_inode_table_lo=3450424, .bg_free_blocks_count_lo=0, .bg_free_inodes_count_lo=128, .bg_used_dirs_count_lo=0, .bg_flags=7, .bg_reserved=[...], .bg_itable_unused_lo=128, .bg_checksum=55256, .bg_block_bitmap_hi=0, .bg_inode_bitmap_hi=0, .bg_inode_table_hi=0, .bg_free_blocks_count_hi=0, .bg_free_inodes_count_hi=0, .bg_used_dirs_count_hi=0, .bg_itable_unused_hi=0, .bg_reserved2=[...]}
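To help read the dump, here is a minimal, self-contained userspace sketch of how the lo/hi halves combine into the descriptor's free-block count and how bg_flags=7 decodes. The struct is trimmed down, and the flag values are the conventional ext4/ldiskfs ones, stated here as an assumption rather than quoted from the tree:

#include <stdint.h>
#include <stdio.h>

#define BG_INODE_UNINIT 0x0001	/* conventional ext4/ldiskfs values (assumed) */
#define BG_BLOCK_UNINIT 0x0002
#define BG_INODE_ZEROED 0x0004

struct group_desc_dump {	/* subset of the fields printed above */
	uint16_t bg_free_blocks_count_lo;
	uint16_t bg_free_blocks_count_hi;	/* only meaningful with 64-bit descriptors */
	uint16_t bg_flags;
};

static uint32_t free_blocks(const struct group_desc_dump *d, int is_64bit)
{
	uint32_t n = d->bg_free_blocks_count_lo;
	if (is_64bit)
		n |= (uint32_t)d->bg_free_blocks_count_hi << 16;
	return n;
}

int main(void)
{
	/* Values from the dump of group 274007 above. */
	struct group_desc_dump d = { .bg_free_blocks_count_lo = 0,
				     .bg_free_blocks_count_hi = 0,
				     .bg_flags = 7 };
	printf("free blocks per GDP: %u\n", free_blocks(&d, 1));
	printf("flags: %s%s%s\n",
	       (d.bg_flags & BG_INODE_UNINIT) ? "INODE_UNINIT " : "",
	       (d.bg_flags & BG_BLOCK_UNINIT) ? "BLOCK_UNINIT " : "",
	       (d.bg_flags & BG_INODE_ZEROED) ? "INODE_ZEROED" : "");
	return 0;
}

In other words, the descriptor claims zero free blocks while the group is still marked BLOCK_UNINIT, which is exactly the mismatch the on-disk bitmap check later trips over.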
It also seems odd that dumpe2fs can produce different results for unused block groups. Sometimes it will show block_bitmap != free_blocks, and other times it will be OK.
---
In ldiskfs_valid_block_bitmap() I don't understand this:
if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_FLEX_BG)) {
	/* with FLEX_BG, the inode/block bitmaps and itable
	 * blocks may not be in the group at all
	 * so the bitmap validation will be skipped for those groups
	 * or it has to also read the block group where the bitmaps
	 * are located to verify they are set.
	 */
	return 1;
}
We have flex_bg enabled; would this apply to us?
For the OSTs that are prone to the bitmap errors, running "cat /proc/fs/ldiskfs/dm*/mb_groups" will reproduce the errors.
Aug 14 18:37:14 nbp2-oss20 kernel: (/tmp/rpmbuild-lustre-jlan-PYDDD1xV/BUILD/lustre-2.7.3/ldiskfs/balloc.c, 179): ldiskfs_init_block_bitmap: #24877: init the group 270808 of total groups 583584: group_blocks 32768, free_blocks 32768, free_blocks_in_gdp 0, ret 32768
The logs show that ldiskfs_init_block_bitmap() initialized the bitmap, but the free blocks count in the group descriptor is still zero, which caused the subsequent ldiskfs_mb_check_ondisk_bitmap() failure. Currently, I cannot say it is corruption; it looks more like a logic issue. The patch will set the free block count based on the real free bits in the bitmap. It may not be the perfect solution, but we can try it and see whether it resolves your trouble or not.
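As a rough illustration of the approach described above (a self-contained userspace sketch, not the actual ldiskfs patch; BLOCKS_PER_GROUP and the helper are made up for the example): after generating the bitmap for an uninitialized group, count the real free bits and write that back over the stale zero in the descriptor, so the later consistency check agrees with the bitmap:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCKS_PER_GROUP 32768u

static uint32_t count_free_bits(const uint8_t *bitmap, uint32_t nbits)
{
	uint32_t i, nfree = 0;
	for (i = 0; i < nbits; i++)
		if (!(bitmap[i >> 3] & (1u << (i & 7))))
			nfree++;
	return nfree;
}

int main(void)
{
	uint8_t bitmap[BLOCKS_PER_GROUP / 8];
	uint32_t free_in_gdp = 0;		/* stale value from the group descriptor */

	memset(bitmap, 0, sizeof(bitmap));	/* freshly generated bitmap: all blocks free */

	uint32_t real_free = count_free_bits(bitmap, BLOCKS_PER_GROUP);
	if (real_free != free_in_gdp) {
		printf("resync: bitmap says %u free, gd says %u -> updating gd\n",
		       real_free, free_in_gdp);
		free_in_gdp = real_free;	/* per the description above */
	}
	printf("gd now records %u free blocks\n", free_in_gdp);
	return 0;
}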
Sorry, I mistyped the patch number. I wanted to say it is stable with 28550.