Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Lustre 2.5.3
- 3
- 16321
Description
Over the weekend one of our OSTs aborted its journal and was remounted read-only:
[ 726.076561] LDISKFS-fs error (device dm-25): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 111692corrupted: 32768 blocks free in bitmap, 1024 - in gd
[ 726.116663]
[ 726.125085] Aborting journal on device dm-25-8.
[ 726.133359] LustreError: 17032:0:(ofd_obd.c:1095:ofd_destroy()) f1-OST00ff: error destroying object [0x100000000:0x16546e7:0x0]: 0
[ 726.176268] LDISKFS-fs (dm-25):
[ 726.179457] LDISKFS-fs error (device dm-25): ldiskfs_journal_start_sb: Detected aborted journal
[ 726.179459] LDISKFS-fs (dm-25): Remounting filesystem read-only
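The first error line comes from a consistency check that compares the number of free blocks in a block group's on-disk bitmap with the free count recorded in that group's descriptor ("gd"). The following is a simplified Python model of that comparison, not the ldiskfs code. The function names are hypothetical. It assumes ext4-style little-endian bit order, where a set bit marks a block in use, and 4 KiB blocks, which gives 32768 blocks per group.

```python
from typing import Optional


def count_free_blocks(bitmap: bytes, blocks_per_group: int) -> int:
    """Count clear (free) bits among the first blocks_per_group bits.

    Hypothetical helper. Assumes ext4-style little-endian bit order: block i
    of the group is tracked by bit (i % 8) of byte (i // 8), and a set bit
    means the block is in use.
    """
    free = 0
    for i in range(blocks_per_group):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            free += 1
    return free


def check_ondisk_bitmap(group: int, bitmap: bytes, blocks_per_group: int,
                        gd_free: int) -> Optional[str]:
    """Return an error message if the bitmap and descriptor disagree, else None.

    Hypothetical model of the check, not the real ldiskfs function.
    """
    free = count_free_blocks(bitmap, blocks_per_group)
    if free != gd_free:
        return (f"on-disk bitmap for group {group} corrupted: "
                f"{free} blocks free in bitmap, {gd_free} - in gd")
    return None


if __name__ == "__main__":
    # An all-zero bitmap for a 32768-block group marks every block free,
    # which contradicts a descriptor that claims only 1024 free blocks.
    print(check_ondisk_bitmap(111692, bytes(4096), 32768, 1024))
```

In the real filesystem a mismatch like this is treated as corruption, which is what triggered the journal abort and the read-only remount in the log above.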
We rely on the /proc/fs/lustre/health_check file to notify us of these situations. Unfortunately, we never received a notification. I found a bug in the b2_5 implementation of the osd-ldiskfs osd_statfs() function. Based on code inspection, I believe master is not affected, but I haven't tested it there. I will upload a patch shortly.
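Monitoring that relies on health_check typically polls the file and alerts on anything other than a healthy report. The following is a minimal Python sketch of such a poller. It assumes the file contains the single word `healthy` when the node is fine and treats any other content, or a read error, as a failure. `is_healthy` and `check_node` are hypothetical names. A poller like this is only as good as what the server reports. If the server never marks the read-only OST as unhealthy, as appears to have happened here, no alert fires.

```python
from pathlib import Path

# Path taken from this report; the "healthy" wording is an assumption.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def is_healthy(contents: str) -> bool:
    """Treat only an exact 'healthy' report (ignoring whitespace) as OK."""
    return contents.strip() == "healthy"


def check_node(path: Path = HEALTH_CHECK) -> bool:
    """Return True if the health file reports healthy, False otherwise."""
    try:
        contents = path.read_text()
    except OSError:
        # A missing or unreadable health file also counts as a failure.
        return False
    return is_healthy(contents)


if __name__ == "__main__":
    if not check_node():
        print("ALERT: Lustre health_check did not report 'healthy'")
```

Treating anything other than an exact healthy report as a failure is deliberately conservative. An unexpected message format raises an alert instead of being silently ignored.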
Issue Links
- is related to: LU-137 ioctl passthrough mechanism for Lustre OST/MDT mountpoints (Resolved)