
LU-1557: Hitting Kernel Bug in mballoc.c


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 4048

    Description

      In the past week we have encountered three separate OSS panics with the message below. Two of the incidents occurred with Chaos 4.4-2 and Lustre 1.8.5.0.3. We had already scheduled an update to the Chaos 4.4-4.1 release for this week, and after the second panic on Monday we moved that update up; however, we have now hit the same panic under the newer operating environment.

      We are still working to identify the codes/users that were running at the time of the panics, but I wanted to get this posted immediately. Unfortunately the lustre.log file was lost because of our diskless environment; I am working to correct that so that future logs are sent to an NFS mount.

      LDISKFS-fs error (device dm-0): ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 38734, block = 126923574
      ----------- [cut here ] --------- [please bite here ] ---------
      Kernel BUG at fs/ldiskfs/mballoc.c:3714
      invalid opcode: 0000 [1] SMP
      last sysfs file: /class/scsi_host/host0/local_ib_port
      CPU 13
      Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) lustre(U) ldiskfs(U) jbd(U) lov(U) crc16(U) mdc(U) lquotr

      mlx4_en(U) joydev(U) mlx4_core(U) shpchp(U) pcspkr(U) cxgb4(U) serio_raw(U) i7core_edac(U) edac_mc(U) ehci_hcd(U) 8021)
      Pid: 8957, comm: ll_ost_io_175 Tainted: G 2.6.18-108chaos #1
      RIP: 0010:[<ffffffff888ddf66>] [<ffffffff888ddf66>] :ldiskfs:ldiskfs_mb_release_inode_pa+0x1eb/0x218
      RSP: 0018:ffff8102d2beb200 EFLAGS: 00010206
      RAX: 00000000000000d6 RBX: 00000000000001d6 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8032dd80
      RBP: ffff8102d2beb330 R08: ffffffff8032dd88 R09: 0000000000000020
      R10: ffffffff8049981c R11: 0000000000000000 R12: ffff8101693abdc8
      R13: 0000000000003fd6 R14: 0000000000003fd6 R15: 00000000000001d6
      FS: 00002ae81dc466e0(0000) GS:ffff81033fdd6d40(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 000000000046bc68 CR3: 0000000000201000 CR4: 00000000000006e0
      Process ll_ost_io_175 (pid: 8957, threadinfo ffff8102d2bea000, task ffff8102d2aaa080)
      Stack: 000000000000984f ffff8101a34293d0 ffff8102d2beb340 ffff8105fbdc8000
      ffff810198036568 ffff81063bb91c00 0000000000003a00 ffff8101269084f8
      0000000000000000 0000000000000001 07eee5c3000022fd 00000000002b00c9
      Call Trace:
      [<ffffffff8001a470>] __getblk+0x28/0x248
      [<ffffffff888e0712>] :ldiskfs:ldiskfs_mb_discard_inode_preallocations+0x209/0x2b0
      [<ffffffff889cc2af>] :fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0x38f/0x620
      [<ffffffff888d9bd9>] :ldiskfs:ldiskfs_ext_walk_space+0x182/0x1fa
      [<ffffffff889cbf20>] :fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0x0/0x620
      [<ffffffff889c8849>] :fsfilt_ldiskfs:fsfilt_map_nblocks+0xe9/0x120
      [<ffffffff80066c8c>] __mutex_unlock_slowpath+0x31/0x38
      [<ffffffff889c8a7a>] :fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+0x1fa/0x220
      [<ffffffff8002efc1>] __wake_up+0x43/0x50
      [<ffffffff888a911e>] :jbd:start_this_handle+0x321/0x3c7
      [<ffffffff889c8ae1>] :fsfilt_ldiskfs:fsfilt_ldiskfs_map_inode_pages+0x41/0xb0
      [<ffffffff88a057fa>] :obdfilter:filter_direct_io+0x46a/0xd50
      [<ffffffff887f09e0>] :lquota:filter_quota_acquire+0x0/0x120
      [<ffffffff88a08552>] :obdfilter:filter_commitrw_write+0x17a2/0x2b30
      [<ffffffff88a00038>] :obdfilter:filter_commitrw+0x58/0x2a0
      [<ffffffff889aac3f>] :ost:ost_brw_write+0x1c2f/0x2410
      [<ffffffff8872dea0>] :ptlrpc:lustre_msg_check_version_v2+0x10/0x30
      [<ffffffff80093b5a>] default_wake_function+0x0/0xf
      [<ffffffff889ae053>] :ost:ost_handle+0x2c33/0x5690
      [<ffffffff8015f9f8>] __next_cpu+0x19/0x28
      [<ffffffff8007a442>] smp_send_reschedule+0x4a/0x50
      [<ffffffff8872dce5>] :ptlrpc:lustre_msg_get_opc+0x35/0xf0
      [<ffffffff8873d41e>] :ptlrpc:ptlrpc_server_handle_request+0x96e/0xdc0
      [<ffffffff8873db7a>] :ptlrpc:ptlrpc_wait_event+0x30a/0x320
      [<ffffffff8873eaf6>] :ptlrpc:ptlrpc_main+0xf66/0x1110
      [<ffffffff8006101d>] child_rip+0xa/0x11
      [<ffffffff8873db90>] :ptlrpc:ptlrpc_main+0x0/0x1110
      [<ffffffff80061013>] child_rip+0x0/0x11

      Code: 0f 0b 68 69 98 8e 88 c2 82 0e 48 8b 85 e8 fe ff ff f0 44 01
      RIP [<ffffffff888ddf66>] :ldiskfs:ldiskfs_mb_release_inode_pa+0x1eb/0x218
      RSP <ffff8102d2beb200>
      LustreError: dumping log to /var/dumps/lustre-log.1340395410.9106
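
      For context on the assertion itself: in upstream ext4 of this era, the check behind ldiskfs_valid_block_bitmap fails when a block group's own metadata blocks (block bitmap, inode bitmap, inode table) are not marked in-use in the on-disk block bitmap, i.e. the bitmap is corrupt. ldiskfs_mb_release_inode_pa then re-counts the free bits covered by an inode preallocation against that same bitmap and BUG()s when the count disagrees with the cached pa_free value, which appears to be what the mballoc.c:3714 assertion trips on here. The userspace sketch below is hypothetical code, not the ldiskfs source (all names are illustrative); it only reproduces the shape of that consistency check:

      /* Hypothetical userspace sketch of the pa_free consistency check in
       * ldiskfs/ext4 mballoc; not the actual kernel source. */
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>

      struct prealloc_space {
              unsigned int pa_start;  /* first bit of the PA in the bitmap */
              unsigned int pa_len;    /* number of blocks in the PA */
              unsigned int pa_free;   /* cached count of still-free blocks */
      };

      /* Count zero bits (free blocks) in bitmap[pa_start, pa_start + pa_len). */
      static unsigned int count_free(const uint8_t *bitmap,
                                     const struct prealloc_space *pa)
      {
              unsigned int i, nfree = 0;

              for (i = pa->pa_start; i < pa->pa_start + pa->pa_len; i++)
                      if (!(bitmap[i / 8] & (1u << (i % 8))))
                              nfree++;
              return nfree;
      }

      int main(void)
      {
              uint8_t bitmap[16] = { 0 };  /* 128 blocks, all free */
              struct prealloc_space pa = { .pa_start = 8, .pa_len = 32,
                                           .pa_free = 32 };

              /* Simulate bitmap corruption: four blocks inside the PA flip
               * to "used" behind the allocator's back. */
              bitmap[1] = 0x0f;

              unsigned int nfree = count_free(bitmap, &pa);
              printf("cached pa_free=%u, recounted=%u\n", pa.pa_free, nfree);

              /* The kernel uses a BUG_ON() for this comparison; assert()
               * plays that role here and aborts on the mismatch, mirroring
               * the panic above. */
              assert(nfree == pa.pa_free);
              return 0;
      }

      If this reading is right, the corrupt party is the on-disk block bitmap rather than mballoc's accounting, which is consistent with the ldiskfs_valid_block_bitmap error logged immediately before the BUG and would point toward an e2fsck of the affected OST.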


            People

              Assignee: Niu Yawei (Inactive)
              Reporter: Joe Mervini
              Votes: 0
              Watchers: 8
