Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Two OSSes and three different OSTs crashed with "bitmap corrupted" messages.

      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659corrupted: 32768 blocks free in bitmap, 0 - in gd
      Apr  3 18:38:16 nbp1-oss6 kernel: 
      Apr  3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660corrupted: 32768 blocks free in bitmap, 0 - in gd
      
      
      

      These errors were on 2 different backend RAID devices. Noteworthy items:
      1. The filesystem was more than 90% full and half of the data was deleted.
      2. OSTs are formatted with "-E packed_meta_blocks=1".
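
When triaging messages like the ones above, it can help to pull the device, group number and both free-block counts out of syslog. The following is a minimal sketch, not a Lustre tool: `corrupted_groups` is a hypothetical helper name, and the pattern is only modelled on the two log lines quoted in this ticket (including the missing space before "corrupted").

```shell
#!/bin/sh
# Sketch: read syslog text on stdin and print one line per
# "on-disk bitmap ... corrupted" message, in the form:
#   <device> <group> <free blocks in bitmap> <free blocks in gd>
# corrupted_groups is a hypothetical helper name (assumption).
corrupted_groups() {
    sed -nE 's/.*LDISKFS-fs error \(device ([^)]+)\): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group ([0-9]+) ?corrupted: ([0-9]+) blocks free in bitmap, ([0-9]+) - in gd.*/\1 \2 \3 \4/p'
}
```

Fed the two error lines from the description, this prints `dm-42 245659 32768 0` and `dm-42 245660 32768 0`. Other lines, such as the journal abort, are ignored.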

      Attachments

        1. bt.2017-07-26-02.48.00
          765 kB
        2. bt.2017-07-26-12.08.43
          808 kB
        3. foreach.out
          736 kB
        4. mballoc.c
          145 kB
        5. ost258.dumpe2fs.after.fsck.gz
          34.46 MB
        6. ost258.dumpe2fs.after.readonly.gz
          34.44 MB
        7. syslog.gp270808.error.gz
          13.37 MB
        8. vmcore-dmesg.txt
          512 kB

          Activity

            [LU-9410] on-disk bitmap corrupted

            We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches or the recommended action?

            We are planning to run with 16312 and the suggested debug patches.

            For some reason this patch never made it into the series/6.8.

            Are we convinced this is a duplicate of LU-7114?

            It is true that we missed that patch; Andreas pointed it out in the first comment:
            https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=193803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-193803

            However, that patch mainly handles the situation after the bitmap corruption has already happened. It lets the system continue instead of failing right away, so that users can run e2fsck in a maintenance window. As mhanafi commented:
            https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=205024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-205024, it may not help much in the NASA case.

            yong.fan nasf (Inactive) added the comment above.

            Should we get the debugging without the kernel recompile?

            With my patch applied, you do NOT need to recompile the kernel. In fact, CONFIG_EXT4_DEBUG is almost redundant, since we can control the mb debug via the debug level.
            Considering the log at the time of the corruption ("32768 blocks free in bitmap"), it seems that the block group (BG) bitmap was initialized by logic instead of being loaded from disk. To some degree, that can explain the in-RAM corruption. So I added more debug information to my patch. I think it is worth trying.

            yong.fan nasf (Inactive) added the comment above.
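
The figure 32768 is what a completely free block group would report, which fits the reading that the bitmap was freshly initialized rather than loaded from disk. The arithmetic below is a small sketch under one assumption that this ticket does not state: a 4 KiB block size. By default, ext4-style filesystems use one bitmap block per group, with one bit per block, so a group spans 8 × block-size blocks. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: how many blocks one single-block bitmap can describe.
# Usage: bits_per_bitmap_block BLOCK_SIZE_BYTES
bits_per_bitmap_block() {
    echo $(( $1 * 8 ))
}

# Succeeds (exit 0) when the reported free count equals the whole group,
# i.e. every bit in the bitmap is clear.
# Usage: bitmap_all_free FREE_IN_BITMAP BLOCK_SIZE_BYTES
bitmap_all_free() {
    [ "$1" -eq "$(bits_per_bitmap_block "$2")" ]
}
```

With a 4096-byte block size (assumed), `bits_per_bitmap_block 4096` gives 32768, so `bitmap_all_free 32768 4096` succeeds. The group descriptor's count of 0 free blocks says the opposite, that the group is full.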

            The kernel was originally built with CONFIG_EXT4_DEBUG disabled, and the Lustre server RPMs were built against that kernel.
            Mahmoud told me that he did not see the /sys/kernel/debug/ldiskfs option, so I rebuilt the kernel and Lustre.

            So, we have both available for testing. If CONFIG_EXT4_DEBUG is not needed in the kernel, how can we enable the mb-debug?

            jaylan Jay Lan (Inactive) added the comment above.

            Should we get the debugging without the kernel recompile?

             

            mhanafi Mahmoud Hanafi added the comment above.

            Kernel was rebuilt with CONFIG_EXT4_DEBUG on.

            Sorry, in my test environment CONFIG_EXT4_DEBUG is disabled by default, so I thought NASA might have it disabled as well. To avoid rebuilding the kernel, I made the patch remove the conditional compilation of the mb debug code. In addition, I added more debugging for the case where a BG block bitmap is initialized.

            yong.fan nasf (Inactive) added the comment above.

            After the kernel and lustre rebuild/install we don't see /sys/kernel/debug/ldiskfs option.

            What is the output of:

            find /proc /sys -name mballoc-debug
            
            yong.fan nasf (Inactive) added the comment above.

            Kernel was rebuilt with CONFIG_EXT4_DEBUG on.

            jaylan Jay Lan (Inactive) added the comment above.

            mhanafi Mahmoud Hanafi added a comment - - edited

            After the kernel and lustre rebuild/install we don't see /sys/kernel/debug/ldiskfs option.

             nbp2-oss1 /boot # cat config-2.6.32-642.15.1.el6.20170609.x86_64.lustre273 |grep CONFIG_EXT4_DEBUG
            CONFIG_EXT4_DEBUG=y
            nbp2-oss1 /boot # ls -l /sys/kernel/debug/
            total 0
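
An empty /sys/kernel/debug/ is what one would see when debugfs itself is not mounted, so no debugfs entries can appear there. The thread does not confirm that this was the cause here. The sketch below only checks the mount table. `debugfs_mounted` is a hypothetical helper name, and the mount command in the comment is the usual way to mount debugfs, not something taken from this ticket.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if a debugfs filesystem appears in the mount table.
# Usage: debugfs_mounted [MOUNTS_FILE]   (defaults to /proc/mounts)
# Field 3 of each /proc/mounts line is the filesystem type.
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# If it is missing, root would typically mount it with (assumption, not run here):
#   debugfs_mounted || mount -t debugfs none /sys/kernel/debug
```

After mounting, the `ls -l /sys/kernel/debug/` check from the transcript above can be repeated.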

            Bob.C Bob Ciotti (Inactive) added a comment - - edited

            We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches or the recommended action?

            We are planning to run with 16312 and the suggested debug patches.

            For some reason this patch never made it into the series/6.8.

            Are we convinced this is a duplicate of LU-7114?


            1. Most of the OSTs that hit this bug have Flex block group size=64, while the others are set to Flex block group size=256. The back-end RAID is set for a 1MB stripe size (8 data disks with 128MB per disk stripe), and we pack all metadata blocks at the front of the LUN. Could this be a factor?

            I am not sure about this.

            2. Does fsck in fact check for bitmap corruption on disk? If we don't see it fixing anything, does that confirm that these are in-memory corruptions?

            e2fsck will verify the free block/inode counts in the bitmaps against the values recorded in the group descriptors.

            3. If these are in-memory corruptions, can we get a debug patch that will re-read from disk before marking the bitmap as bad?

            4. Can you provide any other debug patch to help narrow the root cause?

            I made a debug patch (https://review.whamcloud.com/28489) that enables mb debug; please apply it together with the former patch (https://review.whamcloud.com/#/c/28249/). Please NOTE: the mb debug switch is /sys/kernel/debug/ldiskfs/mballoc-debug on the server node. It is disabled by default (value 0). Please set it to '1' before you mount the Lustre device.

            Thanks!

            yong.fan nasf (Inactive) added the comment above.
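
The instruction to set the mb debug switch to '1' before mounting can be sketched as follows. The switch path is the one given in the comment above. The helper name `enable_mb_debug` and its optional path argument are hypothetical, added only so the function can be tried against an ordinary file. On a real server the write needs root, and the file only exists once the debug patch is applied and debugfs is mounted.

```shell
#!/bin/sh
# Sketch: write 1 to the mb debug switch and print the value read back.
# Usage: enable_mb_debug [SWITCH_PATH]
# Default path is taken from the ticket; the argument is an assumption
# added for dry runs.
enable_mb_debug() {
    sw_file=${1:-/sys/kernel/debug/ldiskfs/mballoc-debug}
    if [ ! -w "$sw_file" ]; then
        echo "mb debug switch not found or not writable: $sw_file" >&2
        return 1
    fi
    echo 1 > "$sw_file" && cat "$sw_file"
}
```

Running `enable_mb_debug` before mounting the Lustre device matches the order described above. A non-zero exit status means the switch file was missing or not writable.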

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28489
            Subject: LU-9410 ldiskfs: enable mb debug
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d4e2f024d91f375085c24906313b7bc522464c20

            gerrit Gerrit Updater added the comment above.

            People

              yong.fan nasf (Inactive)
              mhanafi Mahmoud Hanafi