Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3

    Description

      We had 2 OSSes and 3 different OSTs crash with bitmap-corrupted messages.

      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245659corrupted: 32768 blocks free in bitmap, 0 - in gd
      Apr  3 18:38:16 nbp1-oss6 kernel: 
      Apr  3 18:38:16 nbp1-oss6 kernel: Aborting journal on device dm-3.
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs (dm-42): Remounting filesystem read-only
      Apr  3 18:38:16 nbp1-oss6 kernel: LDISKFS-fs error (device dm-42): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 245660corrupted: 32768 blocks free in bitmap, 0 - in gd
      
      
      

      These errors were on 2 different backend RAID devices. Noteworthy items:
      1. The filesystem was more than 90% full and half of the data was deleted.
      2. OSTs are formatted with "-E packed_meta_blocks=1".
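
      The message in the log above comes from a consistency check in the mballoc code: when a block bitmap is read from disk, the number of free bits counted in the bitmap is compared with the free-block count stored in that group's descriptor, and a mismatch aborts the journal and remounts the filesystem read-only. The following is only an illustrative sketch of that kind of check, written against recent upstream ext4 helper names (the RHEL6-era ldiskfs patch uses the older *_blks_* helpers), not the actual patched ldiskfs code:

      /* Illustrative sketch: compare the free blocks counted in the on-disk
       * bitmap against the free-block count recorded in the group descriptor.
       * Helper names follow upstream ext4 (fs/ext4); the real check lives in
       * the ldiskfs mballoc patch and differs in detail. */
      static int check_bitmap_vs_gd(struct super_block *sb, ext4_group_t group,
                                    struct buffer_head *bitmap_bh,
                                    struct ext4_group_desc *gdp)
      {
              unsigned int max = EXT4_CLUSTERS_PER_GROUP(sb);
              unsigned int free_in_bitmap, free_in_gd;

              /* count zero bits (free clusters) in the bitmap block */
              free_in_bitmap = ext4_count_free(bitmap_bh->b_data, max / 8);
              free_in_gd = ext4_free_group_clusters(sb, gdp);

              if (free_in_bitmap != free_in_gd) {
                      ext4_error(sb, "on-disk bitmap for group %u corrupted: "
                                 "%u blocks free in bitmap, %u - in gd",
                                 group, free_in_bitmap, free_in_gd);
                      return -EIO;
              }
              return 0;
      }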

      Attachments

        1. bt.2017-07-26-02.48.00
          765 kB
        2. bt.2017-07-26-12.08.43
          808 kB
        3. foreach.out
          736 kB
        4. mballoc.c
          145 kB
        5. ost258.dumpe2fs.after.fsck.gz
          34.46 MB
        6. ost258.dumpe2fs.after.readonly.gz
          34.44 MB
        7. syslog.gp270808.error.gz
          13.37 MB
        8. vmcore-dmesg.txt
          512 kB

        Issue Links

          Activity

            [LU-9410] on-disk bitmap corrupted

            yong.fan nasf (Inactive) added a comment -

            Sorry I typed the patch number. I wanted to say it is stable with 28550.

            Then it is reasonable. As I explained above, 28550 may do more than the necessary fixes. But since it is running stably, you can keep it until the next 'corruption'.


            mhanafi Mahmoud Hanafi added a comment -

            Sorry I typed the patch number. I wanted to say it is stable with 28550.


            yong.fan nasf (Inactive) added a comment -

            The patch 28550 will take effect before 28566, so if 28550 is applied, then 28566 is meaningless. But 28550 may do more things than the necessary fixes; I am afraid of some potential side effects.

            The filesystem is stable with the workaround patch (28489). Can we run with this patch for some time without any underlying filesystem issues? Or should we replace it with 28566 ASAP?

            It is interesting to know that. Because 28489 is just a debug patch, I cannot imagine how it can resolve your issue. It may be because your system has jumped over the groups with the "BLOCK_UNINIT" flag and zero free blocks in the GDP. If that is true, then applying 28566 will not show you more benefit. Since your system is running stably, you can replace the patches with 28566 when it gets 'corrupted' next time.


            mhanafi Mahmoud Hanafi added a comment -

            The filesystem is stable with the workaround patch (28489). Can we run with this patch for some time without any underlying filesystem issues? Or should we replace it with 28566 ASAP?

            jaylan Jay Lan (Inactive) added a comment - - edited

            I did a build with #28566 and #28550 yesterday. For testing purposes, do these two conflict?
            I will undo #28550, but if these two do not collide, we can do testing with the builds I did yesterday.

            Never mind. I just did another build with #28550 pulled out.


            yong.fan nasf (Inactive) added a comment -

            mhanafi, I have to say that this issue may be related to the improper bitmap consistency verification in our ldiskfs patch, which does not handle the flex_bg case. I made a patch https://review.whamcloud.com/28566 to handle the related issues. Would you please try it (no need for the other former patches)? Thanks!

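
            Judging from the patch subject below ("no check mb bitmap if flex_bg enabled"), the general shape of the fix is to skip the on-disk bitmap vs. group-descriptor comparison when flex_bg is enabled, because with flex_bg a group's bitmaps and inode table may live in a different group and the strict comparison can misfire on uninitialized groups. A minimal sketch of such a guard, using the feature macro already quoted elsewhere in this ticket (illustrative only, not the actual patch text):

            /* Illustrative guard, inferred from the patch subject, not its body:
             * with FLEX_BG the strict bitmap-vs-descriptor check is skipped. */
            if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_FLEX_BG))
                    return 0;

            /* ... otherwise fall through to the free-count comparison ... */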

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28566
            Subject: LU-9410 ldiskfs: no check mb bitmap if flex_bg enabled
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8332a30959750c603bc572db1fcde8bc92f82a40

            mhanafi Mahmoud Hanafi added a comment - - edited

            Here is part of dmesg. The high rate of messages caused the root drive's SCSI device to reset, but all but one server recovered. I had to turn the printk log level down to get the last one to recover.

            LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262310
            LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262311
            LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262312
            LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262313
            LDISKFS-fs warning (device dm-33): ldiskfs_init_block_bitmap: Set free blocks as 32768 for group 262314
            LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.149.2.156@o2ib313 portal 28 match 1575300167923792 offset 0 length 520: 4
            LNet: 12178:0:(lib-move.c:1487:lnet_parse_put()) Skipped 978380 previous similar messages
            sd 0:0:1:0: attempting task abort! scmd(ffff880af433e0c0)
            sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 a0 08 08 00 00 08 00
            scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
            scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2)
            scsi target0:0:1: enclosure level(0x0000),connector name(    ^C)
            sd 0:0:1:0: task abort: SUCCESS scmd(ffff880af433e0c0)
            sd 0:0:1:0: attempting task abort! scmd(ffff880a64ab46c0)
            sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 e0 08 08 00 00 08 00
            scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
            scsi target0:0:1: enclosure_logical_id(0x50030480198f7e01), slot(2)
            scsi target0:0:1: enclosure level(0x0000),connector name(    ^C)
            sd 0:0:1:0: task abort: SUCCESS scmd(ffff880a64ab46c0)
            sd 0:0:1:0: attempting task abort! scmd(ffff880b21cec180)
            sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 00 c0 08 08 00 00 08 00
            scsi target0:0:1: handle(0x000a), sas_address(0x4433221102000000), phy(2)
            DISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts: 
            LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts: 
            mounted filesystem with ordered data mode. quota=on. Opts: 
            LDISKFS-fs (dm-29): mounted filesystem with ordered data mode. quota=on. Opts: 
            
            LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts: 
            Lustre: nbp2-OST0081: Not available for connect from 10.151.43.107@o2ib (not set up)
            Lustre: Skipped 3 previous similar messages
            Lustre: nbp2-OST0081: Not available for connect from 10.151.29.130@o2ib (not set up)
            Lustre: Skipped 113 previous similar messages
            Lustre: nbp2-OST0081: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
            Lustre: nbp2-OST0081: Will be in recovery for at least 2:30, or until 14441 clients reconnect
            Lustre: nbp2-OST0081: Denying connection for new client 35b99837-9505-fc4d-270f-f2d1ca30372d (at 10.151.30.176@o2ib), waiting for all 14441 known clients (44 recovered, 1 in progress, and 0 evicted) to recover in 5:10
            
            
            

            Here is /var/log/messages

            Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Dropping PUT from 12345-10.151.30.120@o2ib portal 28 match 1575477031778096 offset 0 length 520: 4
            Aug 11 17:58:25 nbp2-oss10 kernel: LNet: 12075:0:(lib-move.c:1487:lnet_parse_put()) Skipped 1037319 previous similar messages
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-30):
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-28): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-31): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-18): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-21): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-19): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-22): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-20): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-26): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-33): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-23): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:35 nbp2-oss10 kernel: LDISKFS-fs (dm-32): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-34): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-24): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:40 nbp2-oss10 kernel: LDISKFS-fs (dm-25): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:40 nbp2-oss10 kernel: 
            Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-29):
            Aug 11 18:03:41 nbp2-oss10 kernel: LDISKFS-fs (dm-35): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:41 nbp2-oss10 kernel: mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:49 nbp2-oss10 kernel: LDISKFS-fs (dm-27): mounted filesystem with ordered data mode. quota=on. Opts:
            Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.50.143@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
            Aug 11 18:03:50 nbp2-oss10 kernel: LustreError: Skipped 314 previous similar messages
            Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.9.177@o2ib (not set up)
            Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 11 previous similar messages
            Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0009_UUID: not available for connect from 10.151.8.85@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
            Aug 11 18:03:51 nbp2-oss10 kernel: LustreError: Skipped 3632 previous similar messages
            Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.50.241@o2ib (not set up)
            Aug 11 18:03:51 nbp2-oss10 kernel: Lustre: Skipped 180 previous similar messages
            Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: 137-5: nbp2-OST0135_UUID: not available for connect from 10.151.48.113@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
            Aug 11 18:03:52 nbp2-oss10 kernel: LustreError: Skipped 6273 previous similar messages
            Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Not available for connect from 10.151.7.158@o2ib (not set up)
            Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: Skipped 402 previous similar messages
            Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
            Aug 11 18:03:52 nbp2-oss10 kernel: Lustre: nbp2-OST00d1: Will be in recovery for at least 2:30, or until 14452 clients reconnect
            
            

            yong.fan nasf (Inactive) added a comment -

            mhanafi, it looks different from the original one. Would you please show me more logs (dmesg, /var/log/messages) about the latest corruption? Is the system still accessible after the above warnings?


            mhanafi Mahmoud Hanafi added a comment -

            Applied the new patch. After a full fsck, mounting the OSTs resulted in this many block groups getting corrected:

            ----------------
            service603
            ----------------
             4549 dm-33):
            
            ----------------
            service604
            ----------------
             4425 dm-32):
            
            ----------------
            service606
            ----------------
             4658 dm-29):
            
            ----------------
            service610
            ----------------
             4631 dm-33):
            
            ----------------
            service611
            ----------------
             4616 dm-28):
            
            ----------------
            service616
            ----------------
             4652 dm-35):
            
            ----------------
            service617
            ----------------
             4501 dm-21):
            
            ----------------
            service619
            ----------------
             4657 dm-25):
            
            

            We need to rate limit the warnings.
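
            On the rate limiting: the stock kernel helper for this is printk_ratelimited(), which suppresses messages past a burst threshold instead of printing one line per block group. A rough sketch of what a rate-limited version of the warning above could look like (illustrative only; sb, free_blocks and block_group are assumed names, and the real ldiskfs code emits the warning through its own warning macro rather than raw printk):

            printk_ratelimited(KERN_WARNING
                               "LDISKFS-fs warning (device %s): %s: Set free blocks as %u for group %u\n",
                               sb->s_id, __func__, free_blocks, block_group);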

            mhanafi Mahmoud Hanafi added a comment - - edited

            I used systemtap to catch one of these bad groups and dump out the ldiskfs_group_desc struct.

            mballoc.c:826: first_group: 274007 bg_free_blocks_count_hi: 0 bg_block_bitmap_hi: 0 bg_free_blocks_count_lo: 0
            mballoc.c:826:$desc {.bg_block_bitmap_lo=328727, .bg_inode_bitmap_lo=930551, .bg_inode_table_lo=3450424, .bg_free_blocks_count_lo=0, .bg_free_inodes_count_lo=128, .bg_used_dirs_count_lo=0, .bg_flags=7, .bg_reserved=[...], .bg_itable_unused_lo=128, .bg_checksum=55256, .bg_block_bitmap_hi=0, .bg_inode_bitmap_hi=0, .bg_inode_table_hi=0, .bg_free_blocks_count_hi=0, .bg_free_inodes_count_hi=0, .bg_used_dirs_count_hi=0, .bg_itable_unused_hi=0, .bg_reserved2=[...]}
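
            In that dump the telling fields are bg_flags=7 and bg_free_blocks_count_lo=0: 7 is the OR of the three standard group-descriptor flags, so the group is marked uninitialized while its descriptor claims zero free blocks. A small standalone decoder for those two fields, assuming the standard ext4/ldiskfs flag values (0x1 INODE_UNINIT, 0x2 BLOCK_UNINIT, 0x4 INODE_ZEROED):

            #include <stdint.h>
            #include <stdio.h>

            #define BG_INODE_UNINIT 0x0001  /* inode table/bitmap not in use */
            #define BG_BLOCK_UNINIT 0x0002  /* block bitmap not initialized */
            #define BG_INODE_ZEROED 0x0004  /* on-disk itable initialized to zero */

            int main(void)
            {
                    /* values taken from the systemtap dump above */
                    uint16_t bg_flags = 7;
                    uint16_t free_lo = 0, free_hi = 0;

                    /* 64bit-capable descriptors split counters into lo/hi halves */
                    uint32_t free_blocks = ((uint32_t)free_hi << 16) | free_lo;

                    printf("BLOCK_UNINIT=%d INODE_UNINIT=%d INODE_ZEROED=%d free=%u\n",
                           !!(bg_flags & BG_BLOCK_UNINIT),
                           !!(bg_flags & BG_INODE_UNINIT),
                           !!(bg_flags & BG_INODE_ZEROED),
                           free_blocks);
                    /* prints BLOCK_UNINIT=1 INODE_UNINIT=1 INODE_ZEROED=1 free=0,
                     * i.e. an uninitialized group whose descriptor records 0 free
                     * blocks -- the same bitmap-vs-gd mismatch seen in the logs */
                    return 0;
            }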
            
            
            

             

            It also seems odd that dumpe2fs can produce different results for unused block groups: sometimes it will show block_bitmap != free_blocks, and other times it will be OK.

             ---

            In ldiskfs_valid_block_bitmap() I don't understand this:

            if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_FLEX_BG)) {
                    /* with FLEX_BG, the inode/block bitmaps and itable
                     * blocks may not be in the group at all
                     * so the bitmap validation will be skipped for those groups
                     * or it has to also read the block group where the bitmaps
                     * are located to verify they are set.
                     */
                    return 1;
            }
            
            

            We have flex_bg enabled; would this apply to us?
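
            For context, when flex_bg is not set this function goes on to verify that the group's own metadata blocks (block bitmap, inode bitmap and inode table) are marked in-use in the bitmap being validated; the early "return 1" above skips exactly that validation. A condensed sketch of those checks, continuing the function body with helper names that mirror upstream ext4 (ldiskfs differs only by prefix):

            ldiskfs_grpblk_t offset;
            ldiskfs_fsblk_t group_first_block = ldiskfs_group_first_block_no(sb, block_group);

            /* the block bitmap block itself must be marked in use */
            if (!ldiskfs_test_bit(ldiskfs_block_bitmap(sb, desc) - group_first_block,
                                  bh->b_data))
                    goto err_out;

            /* so must the inode bitmap block */
            if (!ldiskfs_test_bit(ldiskfs_inode_bitmap(sb, desc) - group_first_block,
                                  bh->b_data))
                    goto err_out;

            /* and every block of the inode table: no zero bit inside its range */
            offset = ldiskfs_inode_table(sb, desc) - group_first_block;
            if (ldiskfs_find_next_zero_bit(bh->b_data,
                                           offset + LDISKFS_SB(sb)->s_itb_per_group,
                                           offset) <
                offset + LDISKFS_SB(sb)->s_itb_per_group)
                    goto err_out;

            return 1;   /* bitmap looks sane for this group */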

             

            For the OSTs that are prone to the bitmap errors, running cat /proc/fs/ldiskfs/dm*/mb_groups will reproduce the errors.

             


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 13
