LU-486: ldiskfs_valid_block_bitmap: Invalid block bitmap

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • None
    • None
    • Environment: Lustre 1.8.4ddn3.1
      Kernel 2.6.18-194.32.1.el5
      CentOS 5.x
    • 2
    • Bugzilla ID: 23959
    • 6112

    Description

      The OSS throws an LDISKFS-fs error reporting an invalid block bitmap. As a result, the OST is remounted read-only and the OSS must be rebooted to recover. A subsequent 'e2fsck -fp <dev>' replays the journal and finds no errors on the OST.

      This issue has been seen sporadically during internal stress testing by Bernd and by some customers in the field. Other Lustre users have also hit it and reported it on the lustre-discuss list. A Bugzilla ticket is open, but it has had no support activity since November 2010. I'm opening a Jira bug so this can be worked on.

      https://bugzilla.lustre.org/show_bug.cgi?id=23959

      Logs from the onset of the invalid block bitmap error:
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521146] LDISKFS-fs error (device dm-21): ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 57, block = 1867778
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521155] Aborting journal on device dm-21-8.
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521183] LDISKFS-fs error (device dm-21): ldiskfs_journal_start_sb: Detected aborted journal
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521188] LDISKFS-fs (dm-21): Remounting filesystem read-only
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521205] LustreError: 16643:0:(fsfilt-ldiskfs.c:1320:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521212] LustreError: 16643:0:(filter.c:192:filter_finish_transno()) wrote trans 21483454236 for client 1279815f-edd6-33ed-a1d2-a6685e1060af at #1606: err = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521219] LustreError: 16643:0:(filter_io_26.c:520:filter_direct_io()) can't close transaction: -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521228] LDISKFS-fs error (device dm-21) in fsfilt_ldiskfs_commit: IO failure
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521303] LustreError: 16465:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) can't get handle for 582 credits: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521314] LustreError: 16465:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521329] LustreError: 16618:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521337] LustreError: 16455:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521347] LustreError: 16645:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521687] LustreError: 16532:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522014] LustreError: 16513:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) can't get handle for 582 credits: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522019] LustreError: 16513:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) Skipped 4 previous similar messages
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522026] LustreError: 16513:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.523770] LustreError: 16578:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.524759] LDISKFS-fs (dm-21): Remounting filesystem read-only
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529215] LDISKFS-fs error (device dm-21) in ldiskfs_ext_new_extent_cb: Journal has aborted
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529237] LustreError: 16405:0:(fsfilt-ldiskfs.c:1320:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529244] LustreError: 16405:0:(filter.c:192:filter_finish_transno()) wrote trans 21483454237 for client f612f805-2ae3-e606-2bc6-074c557919a6 at #1608: err = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529250] LustreError: 16405:0:(filter_io_26.c:520:filter_direct_io()) can't close transaction: -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529452] LustreError: 18162:0:(obd.h:1394:obd_transno_commit_cb()) lfs0-OST0018: transno 21483454237 commit error: 2
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529690] LustreError: 16435:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
      Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529703] LustreError: 16410:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
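
For triage, the first error line and the repeated "rc = -30" codes in the log above can be decoded with a small script. This is only a sketch: the function names and the regular expression are illustrative and not part of Lustre or e2fsprogs, and the value of 32768 blocks per group is an assumption (the usual figure for a 4 KiB block size) that should be checked against the actual OST, for example with dumpe2fs.

```python
import errno
import re

# Assumption: 4 KiB blocks, so one bitmap block covers 8 * 4096 = 32768 blocks.
# Verify against the real filesystem before relying on the computed offset.
BLOCKS_PER_GROUP = 32768

# Matches the ldiskfs_valid_block_bitmap error line as it appears in the log above.
BITMAP_RE = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>[^)]+)\): ldiskfs_valid_block_bitmap: "
    r"Invalid block bitmap - block_group = (?P<group>\d+), block = (?P<block>\d+)"
)


def parse_bitmap_error(line):
    """Return device, group, block and offset within the group, or None."""
    m = BITMAP_RE.search(line)
    if m is None:
        return None
    group = int(m.group("group"))
    block = int(m.group("block"))
    return {
        "device": m.group("dev"),
        "group": group,
        "block": block,
        # Offset of the reported block from the start of its block group.
        "offset_in_group": block - group * BLOCKS_PER_GROUP,
    }


def errno_name(rc):
    """Map a negative kernel return code such as -30 to its symbolic name."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    line = (
        "Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521146] LDISKFS-fs error "
        "(device dm-21): ldiskfs_valid_block_bitmap: Invalid block bitmap - "
        "block_group = 57, block = 1867778"
    )
    print(parse_bitmap_error(line))
    print(errno_name(-30))
```

Under the stated assumption, group 57 starts at block 57 * 32768 = 1867776, so the reported block 1867778 lies at offset 2 within that group. The return code -30 corresponds to EROFS (read-only file system) on Linux, which is consistent with the "Remounting filesystem read-only" messages that precede the transaction failures.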

      People

        Assignee: Hongchao Zhang (hongchao.zhang)
        Reporter: David Vasil (dvasil@ddn.com) (Inactive)