Lustre / LU-5822

health_check file not updating properly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.5.4
    • Affects Version/s: Lustre 2.5.3
    • Severity: 3
    • Rank: 16321

    Description

      Over the weekend, one of our OSTs aborted its journal and was remounted read-only:

      [  726.076561] LDISKFS-fs error (device dm-25): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 111692corrupted: 32768 blocks free in bitmap, 1024 - in gd
      [  726.116663] 
      [  726.125085] Aborting journal on device dm-25-8.
      [  726.133359] LustreError: 17032:0:(ofd_obd.c:1095:ofd_destroy()) f1-OST00ff: error destroying object [0x100000000:0x16546e7:0x0]: 0
      [  726.176268] LDISKFS-fs (dm-25): 
      [  726.179457] LDISKFS-fs error (device dm-25): ldiskfs_journal_start_sb: Detected aborted journal
      [  726.179459] LDISKFS-fs (dm-25): Remounting filesystem read-only

      We rely on the /proc/fs/lustre/health_check file to notify us of these situations, but this time we never received a notification. I found a bug in the b2_5 implementation of the osd-ldiskfs osd_statfs() function. From code inspection I believe master is not affected, but I have not tested it there. I will upload a patch momentarily.
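
For illustration only, here is a minimal sketch of the kind of monitoring check described above. It is not the site's actual tooling. It assumes that a healthy server reports the single word "healthy" in /proc/fs/lustre/health_check and treats anything else as a problem. Because this report shows health_check can miss a read-only remount, the sketch also scans /proc/mounts for targets carrying the "ro" option. Whether a read-only remount of the backing device is visible there on a given server is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

# Path quoted in the report; /proc/mounts is the standard Linux mount table.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")
PROC_MOUNTS = Path("/proc/mounts")


def is_healthy(text):
    """Assumption: a healthy server reports exactly the word 'healthy'."""
    return text.strip() == "healthy"


def read_only_mounts(mounts_text, fstypes=("lustre", "ldiskfs")):
    """Return mount points of the given types whose options include 'ro'.

    Expects /proc/mounts-style lines: device mountpoint fstype options ...
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            hits.append(mountpoint)
    return hits


def check():
    """Collect human-readable problems; an empty list means nothing was found."""
    problems = []
    try:
        if not is_healthy(HEALTH_CHECK.read_text()):
            problems.append("health_check does not report 'healthy'")
    except OSError as exc:
        problems.append("cannot read %s: %s" % (HEALTH_CHECK, exc))
    try:
        for mountpoint in read_only_mounts(PROC_MOUNTS.read_text()):
            problems.append("%s is mounted read-only" % mountpoint)
    except OSError as exc:
        problems.append("cannot read %s: %s" % (PROC_MOUNTS, exc))
    return problems


if __name__ == "__main__":
    for problem in check():
        print(problem)
```

Running the script from cron or a monitoring agent and alerting on any output gives a second, independent signal, so a failure in one check does not silence the other.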

            People

              Assignee: James Nunez (Inactive)
              Reporter: Matt Ezell