[LU-5822] health_check file not updating properly

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.5.4
Affects Version/s: Lustre 2.5.3
Labels: patch
Severity: 3
Rank (Obsolete): 16321

Description

Over the weekend, one of our OSTs aborted its journal and was remounted read-only:

[  726.076561] LDISKFS-fs error (device dm-25): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 111692corrupted: 32768 blocks free in bitmap, 1024 - in gd
[  726.116663] 
[  726.125085] Aborting journal on device dm-25-8.
[  726.133359] LustreError: 17032:0:(ofd_obd.c:1095:ofd_destroy()) f1-OST00ff: error destroying object [0x100000000:0x16546e7:0x0]: 0
[  726.176268] LDISKFS-fs (dm-25): 
[  726.179457] LDISKFS-fs error (device dm-25): ldiskfs_journal_start_sb: Detected aborted journal
[  726.179459] LDISKFS-fs (dm-25): Remounting filesystem read-only

We rely on the /proc/fs/lustre/health_check file to notify us of these situations. Unfortunately, the file did not update and we never got a notification. I found a bug in the b2_5 implementation of the osd-ldiskfs osd_statfs() function. From code inspection, I believe master is not affected, but I have not tested it there. I will upload a patch momentarily.
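For context, the kind of monitoring that depends on this file can be sketched roughly as follows. This is a minimal illustration, not our actual monitoring code: it assumes the file contains the single word `healthy` when all is well and anything else (for example `NOT HEALTHY`) otherwise, which is an assumption about the file format and not something stated in this report. The helper names `is_healthy` and `check` are hypothetical.

```python
from pathlib import Path
from typing import Tuple

# Path taken from the report above.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def is_healthy(text: str) -> bool:
    """Return True only if the content is exactly 'healthy'.

    Assumption: any other content (e.g. 'NOT HEALTHY') signals a problem.
    """
    return text.strip() == "healthy"


def check(path: Path = HEALTH_CHECK) -> Tuple[bool, str]:
    """Read the health file and return (ok, detail)."""
    try:
        text = path.read_text()
    except OSError as exc:
        # An unreadable file is treated as a failure rather than ignored.
        return False, f"cannot read {path}: {exc}"
    return is_healthy(text), text.strip()
```

A cron job or monitoring plugin could call `check()` and raise an alert whenever the first element of the result is `False`. A check like this only helps if the file reflects the read-only remount, which is exactly what failed here.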

Issue Links

is related to: LU-137 ioctl passthrough mechanism for Lustre OST/MDT mountpoints (Resolved)

People

Assignee: James Nunez (Inactive)
Reporter: Matt Ezell
Votes: 0
Watchers: 8

Dates

Created: 28/Oct/14 8:43 PM
Updated: 24/Apr/15 11:55 PM
Resolved: 04/Dec/14 10:45 PM