Details
Bug
Resolution: Fixed
Critical
Lustre 2.7.0
None
3
9223372036854775807
Description
We had 2 OSSes and 3 different OSTs crash with bitmap-corrupted messages.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659corrupted: 32768 blocks free in bitmap, 0 - in gd
Apr 3 18:38:16 nbp1-oss6 kernel:
Apr 3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
Apr 3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660corrupted: 32768 blocks free in bitmap, 0 - in gd
These errors were on 2 different backend RAID devices. Noteworthy items:
1. The filesystem was over 90% full and half of the data was deleted.
2. OSTs are formatted with "-E packed_meta_blocks=1"
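For triage across several OSSes, the bitmap errors quoted above can be pulled out of a syslog dump with a short script. This is only a sketch: the function name find_bitmap_errors and the output field names are hypothetical, and the pattern is written against the exact message text shown in this ticket (including the missing space between the group number and "corrupted").

```python
import re

# Pattern for the ldiskfs_mb_check_ondisk_bitmap message as quoted in this
# ticket. The optional whitespace before "corrupted" tolerates both the
# fused form seen here ("245659corrupted") and a spaced form.
BITMAP_ERR = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>[\w-]+)\): "
    r"ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group "
    r"(?P<group>\d+)\s*corrupted: "
    r"(?P<bitmap_free>\d+) blocks free in bitmap, "
    r"(?P<gd_free>\d+) - in gd"
)


def find_bitmap_errors(text):
    """Return one dict per bitmap-corruption message found in text.

    Hypothetical helper for illustration; field names are not from Lustre.
    """
    return [
        {
            "device": m["dev"],
            "group": int(m["group"]),
            "bitmap_free": int(m["bitmap_free"]),
            "gd_free": int(m["gd_free"]),
        }
        for m in BITMAP_ERR.finditer(text)
    ]
```

Grouping the results by device makes it easy to see whether the affected groups are contiguous, as groups 245659 and 245660 are in the log above.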
Patch 28550 takes effect before 28566, so if 28550 is applied, 28566 has no effect. However, 28550 may do more than the necessary fix, and I am concerned about potential side effects.
That is interesting to know. Since 28489 is only a debug patch, I cannot see how it could resolve your issue. It may be that your system has skipped past the groups with the "BLOCK_UNINIT" flag and zero free blocks in the GDP. If that is the case, applying 28566 will not bring further benefit. Since your system is running stably, you can replace the patches with 28566 the next time it reports 'corrupted'.
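The condition discussed in these comments, a group whose bitmap shows every block free while the group descriptor records zero free blocks, can be illustrated with a small model. This is a sketch under stated assumptions, not the ldiskfs code and not what patches 28550 or 28566 do: the function names are hypothetical, the bitmap is a plain byte string with one bit per block (set bit means in use), and BLOCK_UNINIT is taken to have the value of ext4's EXT4_BG_BLOCK_UNINIT (0x0002).

```python
# Assumed to match ext4's EXT4_BG_BLOCK_UNINIT flag value.
BLOCK_UNINIT = 0x0002


def count_free_in_bitmap(bitmap: bytes, blocks_per_group: int) -> int:
    """Count clear bits (free blocks) among the first blocks_per_group bits."""
    used = 0
    for i in range(blocks_per_group):
        if bitmap[i // 8] & (1 << (i % 8)):
            used += 1
    return blocks_per_group - used


def classify_group(gd_flags: int, gd_free: int, bitmap: bytes,
                   blocks_per_group: int) -> dict:
    """Compare the bitmap's free count with the group descriptor's.

    Hypothetical helper: 'mismatch' mirrors the kind of disagreement the
    kernel message reports; 'uninit_with_zero_free' marks the specific
    case mentioned in the comment above.
    """
    bitmap_free = count_free_in_bitmap(bitmap, blocks_per_group)
    return {
        "bitmap_free": bitmap_free,
        "mismatch": bitmap_free != gd_free,
        "uninit_with_zero_free": bool(gd_flags & BLOCK_UNINIT) and gd_free == 0,
    }
```

With an all-zero 4096-byte bitmap, 32768 blocks per group, gd_free of 0, and the BLOCK_UNINIT flag set, the model reports the same numbers as the kernel message: 32768 free in the bitmap against 0 in the group descriptor.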