Lustre / LU-5040

kernel BUG at fs/jbd2/transaction.c:1033

Details

    • 3
    • 13932

    Description

      The MDT crashed with:

      <4>------------[ cut here ]------------
      <2>kernel BUG at fs/jbd2/transaction.c:1033!
      [1]kdb> sr 8
      SysRq : Changing Loglevel
      Loglevel set to 8
      [1]kdb> sr p
      SysRq : Show Regs
      CPU 1
      Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) jbd2 mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) dm_round_robin scsi_dh_rdac lpfc(U) scsi_transport_fc scsi_tgt nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc bonding 8021q garp stp llc ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) dm_multipath tcp_bic power_meter dcdbas microcode iTCO_wdt iTCO_vendor_support shpchp mlx4_core(U) memtrack(U) ses enclosure sg tg3 hwmon ext3 jbd sd_mod crc_t10dif wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod gru [last unloaded: scsi_wait_scan]

      Pid: 13917, comm: mdt_rdpg02_017 Not tainted 2.6.32-358.23.2.el6.20140115.x86_64.lustre241 #1 Dell Inc. PowerEdge R720/0VWT90
      RIP: 0010:[<ffffffffa0bd88ad>]  [<ffffffffa0bd88ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
      RSP: 0018:ffff880f537198a0  EFLAGS: 00010246
      RAX: ffff880f88da9cc0 RBX: ffff880eb8352d08 RCX: ffff880bf382b610
      RDX: 0000000000000000 RSI: ffff880bf382b610 RDI: 0000000000000000
      RBP: ffff880f537198c0 R08: 2010000000000000 R09: f3ee8046d0a58402
      R10: 0000000000000001 R11: ffff880863dd6e10 R12: ffff880f4897f518
      R13: ffff880bf382b610 R14: ffff881007dcc800 R15: 0000000000000008
      FS:  00007fffedaf3700(0000) GS:ffff88084c400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 000000000061c9b8 CR3: 0000000001a25000 CR4: 00000000000407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process mdt_rdpg02_017 (pid: 13917, threadinfo ffff880f53718000, task ffff880f5370aae0)
      Stack:
       ffff880eb8352d08 ffffffffa0ca92d0 ffff880bf382b610 0000000000000000
      <d> ffff880f53719900 ffffffffa0c680bb ffff880f537198f0 ffffffff810962ff
      <d> ffff8810213f3350 ffff880eb8352d08 0000000000000018 ffff880bf382b610
      Call Trace:
       [<ffffffffa0c680bb>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
       [<ffffffff810962ff>] ? wake_up_bit+0x2f/0x40
       [<ffffffffa0c9ea55>] ldiskfs_quota_write+0x165/0x210 [ldiskfs]
       [<ffffffff811e2221>] v2_write_file_info+0xa1/0xe0
       [<ffffffff811de328>] dquot_acquire+0x138/0x140
       [<ffffffffa0c9d5f6>] ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
       [<ffffffff811e029c>] dqget+0x2ac/0x390
       [<ffffffff811e0848>] dquot_initialize+0x98/0x240
       [<ffffffffa0c9d812>] ldiskfs_dquot_initialize+0x62/0xc0 [ldiskfs]
       [<ffffffffa0cf8d6f>] osd_attr_set+0x12f/0x540 [osd_ldiskfs]
       [<ffffffffa0eb15cb>] lod_attr_set+0x12b/0x450 [lod]
       [<ffffffffa0b6d411>] mdd_attr_set_internal+0x151/0x230 [mdd]
       [<ffffffffa0b706ea>] mdd_attr_set+0x107a/0x1390 [mdd]
       [<ffffffffa06fd011>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]
       [<ffffffffa0e0e182>] mdt_mfd_close+0x502/0x6e0 [mdt]
       [<ffffffffa0e0f73a>] mdt_close+0x67a/0xab0 [mdt]
       [<ffffffffa0de7ad7>] mdt_handle_common+0x647/0x16d0 [mdt]
       [<ffffffffa0e21635>] mds_readpage_handle+0x15/0x20 [mdt]
       [<ffffffffa070d3d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa04175de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa0428d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa0704739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81055813>] ? __wake_up+0x53/0x70
       [<ffffffffa070e76e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
       [<ffffffffa070dca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffffa070dca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffffa070dca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Code: c6 9c 03 00 00 4c 89 f7 e8 11 97 96 e0 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 11 ec ff ff 4c 89 f0 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 66 
      

      After recovery, it crashed again at the same place.

      AFTER RECOVERY

      Lustre: nbp7-MDT0000: recovery is timed out, evict stale exports
      Lustre: nbp7-MDT0000: disconnecting 30 stale clients
      LustreError: 5667:0:(mdt_lvb.c:157:mdt_lvbo_fill()) nbp7-MDT0000: expected 56 actual 0.
      Lustre: nbp7-MDT0000: Recovery over after 5:02, of 11832 clients 11802 recovered and 30 were evicted.
      ------------[ cut here ]------------
      kernel BUG at fs/jbd2/transaction.c:1033!
      

      Rebooted and ran fsck.

      Ran recovery; it crashed again at the same place.

      Rebooted and mounted with abort recovery; no crash so far.
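
      For triage, the call trace in the description can be reduced to its reliable frames with a short script. In kernel stack dumps, a frame marked with `?` is an address found on the stack that the unwinder could not confirm as part of the call chain, so the sketch below skips those. This is only an illustrative helper, not part of Lustre or the kernel: the names `FRAME_RE` and `reliable_frames` are hypothetical, and labeling frames without a module as `kernel` is an assumption made for display purposes.

```python
import re

# Matches frames such as:
#   [<ffffffffa0c680bb>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
#   [<ffffffff810962ff>] ? wake_up_bit+0x2f/0x40
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None or m.group("guess"):
            continue  # not a frame line, or an unverified ('?') frame
        # Assumption: frames with no [module] suffix belong to the core kernel.
        frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


# First four frames copied from the trace in this report.
SAMPLE = """\
 [<ffffffffa0c680bb>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [<ffffffff810962ff>] ? wake_up_bit+0x2f/0x40
 [<ffffffffa0c9ea55>] ldiskfs_quota_write+0x165/0x210 [ldiskfs]
 [<ffffffff811e2221>] v2_write_file_info+0xa1/0xe0
"""

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{func} [{module}]")
```

      Run over the full trace, this leaves the path from `ptlrpc_main` through `mdt_close`, `mdd_attr_set`, `osd_attr_set` and the quota initialization calls down to `__ldiskfs_handle_dirty_metadata`, which is where the jbd2 BUG is reached.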

            People

              bobijam Zhenyu Xu
              mhanafi Mahmoud Hanafi