
health_check file not updating properly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.5.4
    • Affects Version/s: Lustre 2.5.3
    • Severity: 3
    • Rank (Obsolete): 16321

    Description

      Over the weekend we had an OST abort and get marked read-only:

      [  726.076561] LDISKFS-fs error (device dm-25): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 111692corrupted: 32768 blocks free in bitmap, 1024 - in gd
      [  726.116663] 
      [  726.125085] Aborting journal on device dm-25-8.
      [  726.133359] LustreError: 17032:0:(ofd_obd.c:1095:ofd_destroy()) f1-OST00ff: error destroying object [0x100000000:0x16546e7:0x0]: 0
      [  726.176268] LDISKFS-fs (dm-25): 
      [  726.179457] LDISKFS-fs error (device dm-25): ldiskfs_journal_start_sb: Detected aborted journal
      [  726.179459] LDISKFS-fs (dm-25): Remounting filesystem read-only

      We rely on the /proc/fs/lustre/health_check file to notify us of these situations. Unfortunately, we never got a notification. I found a bug in the b2_5 implementation of the osd-ldiskfs osd_statfs() function. Code inspection leads me to believe it does not affect master, but I haven't tried it there. I will upload a patch momentarily.
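
Until a fixed build is deployed, a monitor that trusts only health_check can miss a read-only target, as happened here. The following is a minimal sketch of a cross-check, not part of the patch: it treats health_check as healthy only when the file content is exactly "healthy" (an assumption about the file's format) and also scans console log text for the two messages quoted above. The function names and the idea of pairing the two signals are illustrative assumptions.

```python
import re

# Patterns modeled on the console messages quoted in this report.
REMOUNT_RO_RE = re.compile(
    r"LDISKFS-fs \((?P<dev>[^)]+)\): Remounting filesystem read-only"
)
JOURNAL_ABORT_RE = re.compile(r"Aborting journal on device (?P<dev>[\w-]+)")


def health_is_ok(health_text):
    """Return True only if the health_check content is exactly 'healthy'.

    Assumption: a healthy node reports the single word 'healthy';
    anything else is treated as a problem.
    """
    return health_text.strip() == "healthy"


def find_read_only_events(log_text):
    """Return (kind, device) tuples for each matching log line, in log order."""
    events = []
    for line in log_text.splitlines():
        match = JOURNAL_ABORT_RE.search(line)
        if match:
            events.append(("journal-abort", match.group("dev")))
        match = REMOUNT_RO_RE.search(line)
        if match:
            events.append(("remount-ro", match.group("dev")))
    return events


def check_node(health_text, log_text):
    """Combine both signals into a list of human-readable problems."""
    problems = []
    if not health_is_ok(health_text):
        problems.append("health_check does not report 'healthy'")
    for kind, dev in find_read_only_events(log_text):
        problems.append("%s detected on %s" % (kind, dev))
    return problems
```

In use, health_text would come from reading /proc/fs/lustre/health_check and log_text from the kernel log (for example the output of dmesg); an empty result from check_node() means neither signal fired.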

          Activity

            [LU-5822] health_check file not updating properly
            pjones Peter Jones made changes -
            Link Original: This issue is related to LDEV-38 [ LDEV-38 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-39 [ LDEV-39 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-151 [ DDN-151 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-38 [ LDEV-38 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.5.4

            pjones Peter Jones made changes -
            Labels Original: mq414 patch New: patch

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12463/
            Subject: LU-5822 osd-ldiskfs: Correctly return OS_STATE_READONLY
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 1ff49a78e443f935670daf0c84b5b989c02dca04

            pjones Peter Jones made changes -
            Labels Original: patch New: mq414 patch

            adilger Andreas Dilger added a comment -

            PS: I verified that this patch is not needed for master.

            People

              Assignee: jamesanunez James Nunez (Inactive)
              Reporter: ezell Matt Ezell
              Votes: 0
              Watchers: 8
