Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Two OSSs and three different OSTs crashed with corrupted-bitmap messages.

      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659corrupted: 32768 blocks free in bitmap, 0 - in gd
      Apr  3 18:38:16 nbp1-oss6 kernel: 
      Apr  3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660corrupted: 32768 blocks free in bitmap, 0 - in gd
      
      
      

      These errors occurred on two different backend RAID devices. Noteworthy items:
      1. The filesystem was over 90% full, and half of the data was deleted.
      2. The OSTs are formatted with "-E packed_meta_blocks=1".
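Each error line says the on-disk bitmap for a group marks 32768 blocks as free while the group descriptor (gd) records 0 free blocks. With 4 KiB blocks a one-block bitmap holds 4096 × 8 = 32768 bits, which is the size of a whole block group, so the two views disagree completely: the bitmap claims the group is empty and the descriptor claims it is full. The minimal sketch below converts a reported group number into the block range it covers, which helps when cross-checking against the "Group N: (Blocks ...)" lines in dumpe2fs output. It assumes 4 KiB blocks and s_first_data_block = 0 (neither is stated in the report), and group_block_range is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: map an ldiskfs block-group number to the blocks it covers.
# ASSUMPTION: 4 KiB blocks, so one group = 4096 * 8 = 32768 blocks (the
# "32768 blocks free in bitmap" figure), and s_first_data_block = 0.
BLOCKS_PER_GROUP=32768

# Print "<first>-<last>" for group $1.
group_block_range() {
    first=$(( $1 * BLOCKS_PER_GROUP ))
    last=$(( first + BLOCKS_PER_GROUP - 1 ))
    echo "${first}-${last}"
}

# Example: the first group reported on dm-42
# group_block_range 245659
```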

      Attachments

        1. bt.2017-07-26-02.48.00
          765 kB
        2. bt.2017-07-26-12.08.43
          808 kB
        3. foreach.out
          736 kB
        4. mballoc.c
          145 kB
        5. ost258.dumpe2fs.after.fsck.gz
          34.46 MB
        6. ost258.dumpe2fs.after.readonly.gz
          34.44 MB
        7. syslog.gp270808.error.gz
          13.37 MB
        8. vmcore-dmesg.txt
          512 kB


          Activity

            [LU-9410] on-disk bitmap corrupted
            mhanafi Mahmoud Hanafi added a comment (edited)

            With the new build, are we supposed to have mballoc-debug in /proc or /sys? The find command doesn't find anything.

            Never mind, I figured this out. We need to mount debugfs for it to show up.

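The fix mhanafi found is to mount debugfs before searching. A minimal sketch of that step follows. It assumes the conventional mount point /sys/kernel/debug, and mounting needs root. The ldiskfs/ subdirectory is only where the thread expects the entry to appear; whether it exists depends on the patched module that is loaded. ensure_debugfs and find_mballoc_debug are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: make sure debugfs is mounted, then look for the mballoc-debug entry.
# ASSUMPTION: conventional debugfs mount point; mounting requires root.
DEBUGFS=/sys/kernel/debug

# Mount debugfs at $1 unless /proc/mounts already lists a debugfs there.
ensure_debugfs() {
    if ! awk -v mp="$1" '$2 == mp && $3 == "debugfs" { found = 1 }
                         END { exit !found }' /proc/mounts; then
        mount -t debugfs none "$1"
    fi
}

# Print every file named mballoc-debug below directory $1.
find_mballoc_debug() {
    find "$1" -name mballoc-debug 2>/dev/null
}

# Usage (as root):
# ensure_debugfs "$DEBUGFS" && find_mballoc_debug "$DEBUGFS"
```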

            The LU-7114 patch lets the system keep running, rather than failing right away, when a corrupted bitmap is found, but the corruption is still there. I would suggest applying the patch https://review.whamcloud.com/#/c/28489/; it will give us more information by tracing the mb (multiblock allocator) operations.

            yong.fan nasf (Inactive) added a comment
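The suggested patch is a Gerrit change. A hedged sketch of pulling a Gerrit change into a local tree follows. It relies on Gerrit's standard ref layout, refs/changes/<NN>/<change>/<patchset>, where <NN> is the last two digits of the change number, zero-padded. The repository URL in the usage comment and the gerrit_ref helper name are assumptions, and the patch set number is deliberately left as a placeholder to be read from the review page.

```shell
#!/bin/sh
# Sketch: build the Gerrit ref for change $1, patch set $2.
# Gerrit stores change N, patch set P under refs/changes/<N mod 100, 2 digits>/N/P.
gerrit_ref() {
    printf 'refs/changes/%02d/%s/%s\n' "$(( $1 % 100 ))" "$1" "$2"
}

# Hypothetical usage inside a lustre-release checkout. REPO is an assumption,
# and PATCHSET must be the patch set you actually want from the review page.
# REPO=https://review.whamcloud.com/fs/lustre-release
# git fetch "$REPO" "$(gerrit_ref 28489 "$PATCHSET")" && git cherry-pick FETCH_HEAD
```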

            We haven't put the 28489 debug patch in place yet, but we are now running with the LU-7114 patch. It has already found bitmap errors.

            Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:43 nbp2-oss20 kernel: 
            Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:43 nbp2-oss20 kernel: 
            Aug 12 01:05:43 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:44 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:45 nbp2-oss20 kernel: 
            Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:45 nbp2-oss20 kernel: 
            Aug 12 01:05:45 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:46 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:47 nbp2-oss20 kernel: 
            Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:47 nbp2-oss20 kernel: 
            Aug 12 01:05:47 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:49 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:50 nbp2-oss20 kernel: 
            Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:50 nbp2-oss20 kernel: 
            Aug 12 01:05:50 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:53 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:54 nbp2-oss20 kernel: 
            Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:54 nbp2-oss20 kernel: 
            Aug 12 01:05:54 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:59 nbp2-oss20 kernel: 
            Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:05:59 nbp2-oss20 kernel: 
            Aug 12 01:05:59 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:06:05 nbp2-oss20 kernel: 
            Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 01:06:05 nbp2-oss20 kernel: 
            Aug 12 01:06:05 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_load_buddy: Error in loading buddy information for 275790
            Aug 12 01:06:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 275790 corrupted: 32768 blocks free in bitmap, 0 - in gd
            
            
            
            
            

            Some time later

            Aug 12 04:05:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 276684 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:05:12 nbp2-oss20 kernel: 
            Aug 12 04:05:12 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 276685 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:07:56 nbp2-oss20 pcp-pmie[5801]: High 1-minute load average 354load@nbp2-oss20
            Aug 12 04:07:56 nbp2-oss20 - in gd
            Aug 12 04:07:56 nbp2-oss20 kernel: 
            Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304861 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:07:56 nbp2-oss20 kernel: 
            Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304862 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:07:56 nbp2-oss20 kernel: 
            Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304863 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:07:56 nbp2-oss20 kernel: 
            Aug 12 04:07:56 nbp2-oss20 kernel: LDISKFS-fs warning (device dm-21): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 304864 corrupted: 32768 blocks free in bitmap, 0 - in gd
            Aug 12 04:07:56 nbp2-oss20 kernel: 
            .....
            

            It has marked 6727 unique groups as bad on dm-21 (ost319).

             

            mhanafi Mahmoud Hanafi added a comment
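The comment does not say how the 6727 figure was obtained. One way to get such a count from syslog is sketched below, assuming the kernel messages land in a plain-text log such as /var/log/messages; count_bad_groups is a hypothetical helper. It counts distinct group numbers, so the repeated retries for group 275790 are counted once, and it accepts both the "245659corrupted" and "275790 corrupted" spellings that appear in this ticket.

```shell
#!/bin/sh
# Sketch: count distinct block groups reported as corrupted for one device.
# Reads syslog text on stdin; $1 is the device name as it appears in the
# message (e.g. dm-21). The " *" in the sed pattern matches the message
# with or without a space between the group number and "corrupted".
count_bad_groups() {
    grep "(device $1): ldiskfs_mb_check_ondisk_bitmap" |
        sed -n 's/.*bitmap for group \([0-9][0-9]*\) *corrupted.*/\1/p' |
        sort -un |
        awk 'END { print NR }'
}

# Example (log path is an assumption):
# count_bad_groups dm-21 < /var/log/messages
```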

            https://review.whamcloud.com/28489 has been refreshed; please try again. Thanks!

            yong.fan nasf (Inactive) added a comment

            mballoc.c attached.

            jaylan Jay Lan (Inactive) added a comment

            Please attach the source file ldiskfs/mballoc.c; you can find it in your build directory. Thanks!

            yong.fan nasf (Inactive) added a comment

            find /proc /sys -name mballoc-debug

            produces no output.

            mhanafi Mahmoud Hanafi added a comment

            What is the output after ldiskfs.ko has been loaded with insmod?

            find /proc /sys -name mballoc-debug
            
            yong.fan nasf (Inactive) added a comment

            OK, we are running a kernel without CONFIG_EXT4_DEBUG, and Lustre with your patches. How do we enable debugging if we do not see /sys/kernel/debug/ldiskfs/? Please elaborate. Thanks.

            jaylan Jay Lan (Inactive) added a comment
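Whether the running kernel was built with CONFIG_EXT4_DEBUG can be confirmed from its config file. This is a minimal sketch assuming the /boot/config-<release> file that RHEL/CentOS kernels install; kconfig_enabled is a hypothetical helper name. It only reports how the kernel was built and says nothing about what the ldiskfs debug patches require.

```shell
#!/bin/sh
# Sketch: succeed if kernel option $1 is built in (=y) or modular (=m).
# $2 is the config file; ASSUMPTION: defaults to /boot/config-<running release>.
kconfig_enabled() {
    cfg=${2:-/boot/config-$(uname -r)}
    # Anchoring on "=" keeps CONFIG_EXT4 from matching CONFIG_EXT4_FS, and
    # "# CONFIG_FOO is not set" lines never match.
    grep -q "^$1=[ym]" "$cfg"
}

# Example:
# if kconfig_enabled CONFIG_EXT4_DEBUG; then echo "set"; else echo "not set"; fi
```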

            We appear to be missing https://review.whamcloud.com/#/c/16312/ (https://review.whamcloud.com/#/c/16679/) from the CentOS 6.8 version. Does this cause any issue with the debug patches, or is there a recommended action?

            We are planning to run with 16312 and the suggested debug patches.

            For some reason this patch never made it into series/6.8.

            Are we convinced this is a duplicate of LU-7114?

            It is true that we missed that patch; Andreas pointed it out in the first comment:
            https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=193803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-193803

            However, this patch mostly handles the situation after the bitmap corruption has already happened: it lets the system keep running instead of failing right away, so that users can run e2fsck during a maintenance window. As mhanafi commented:
            https://jira.hpdd.intel.com/browse/LU-9410?focusedCommentId=205024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-205024, it may not help much in NASA's case.

            yong.fan nasf (Inactive) added a comment

            Can we get the debugging without recompiling the kernel?

            With my patch applied, you do NOT need to recompile the kernel. In fact, CONFIG_EXT4_DEBUG is almost redundant, since we can control the mb debug output via the debug level.
            Given the log message printed when the corruption happened ("32768 blocks free in bitmap"), it seems that the block group (BG) bitmap was initialized by logic instead of being loaded from disk. To some degree, that can explain the in-RAM corruption, so I added more debug information to my patch. I think it is worth a try.

            yong.fan nasf (Inactive) added a comment
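In ext4, and therefore in ldiskfs, one path where a block bitmap is built by logic rather than read from disk is a group whose descriptor carries the BLOCK_UNINIT flag: the bitmap is computed in memory instead of being loaded. That is only one case consistent with the comment, not a confirmed cause. A hedged way to inspect a reported group from userspace is to pull its descriptor summary out of dumpe2fs output (such as the attached ost258.dumpe2fs files) and look at its flags and free-block count. The awk filter below is a sketch; group_summary is a hypothetical helper, and it assumes dumpe2fs's usual "Group N: (Blocks ...) [FLAGS]" header and "N free blocks, ..." per-group lines.

```shell
#!/bin/sh
# Sketch: from dumpe2fs output on stdin, print the "Group $1:" header line
# (block range, checksum, flags such as BLOCK_UNINIT) and that group's
# "N free blocks, ..." line, which holds the descriptor's free-block count.
group_summary() {
    awk -v hdr="Group $1:" '
        index($0, hdr) == 1     { inside = 1; print; next }
        /^Group [0-9]+:/        { inside = 0 }
        inside && /free blocks/ { print }
    '
}

# Example (device from the logs; dumpe2fs on a mounted device may show
# slightly stale data):
# dumpe2fs /dev/dm-21 2>/dev/null | group_summary 275790
```

Matching on the trailing colon keeps a request for group 245659 from also picking up group 2456590.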

            People

              yong.fan nasf (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 13
