Lustre / LU-8252

MDS kernel panic after aborting journal

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3
    • Labels: None
    • Environment: CentOS 6.5
      Linux 2.6.32-431.23.3.el6_lustre.x86_64
    • Severity: 3

    Description

      Our MDS keeps crashing. The crashes began after we recovered from a full md filesystem. We have been deleting files from storage to free up metadata space, but we keep running into these kernel panics.

      dmesg logs have the following:

      <2>LDISKFS-jfs error (device md0): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 0corrupted: 57 blocks free in bitmap, 6 - in gd
      <4>
      <3>Aborting journal on device md0-8.
      <2>LDISKFS-fs error (device md0): ldiskfs_journal_start_sb: Detected aborted journal
      <2>LDISKFS-fs error (device md0) in iam_txn_add: Journal has aborted
      <2>LDISKFS-fs (md0): Remounting filesystem read-only
      <2>LDISKFS-fs (md0): Remounting filesystem read-only
      <3>LustreError: 6919:0:(osd_io.c:1173:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
      <3>LustreError: 6919:0:(osd_handler.c:1054:osd_trans_stop()) Failure in transaction hook: -30
      <3>LustreError: 6919:0:(osd_handler.c:1063:osd_trans_stop()) Failure to stop transaction: -30
      <2>LDISKFS-fs error (device md0): ldiskfs_mb_new_blocks: Updating bitmap error: [err -30] [pa ffff8860350c8ba8] [phy 34992896] [logic 256] [len 256] [free 256] [error 1] [inode 1917]
      <3>LustreError: 8967:0:(osd_io.c:1166:osd_ldiskfs_write_record()) md0: error reading offset 2093056 (block 511): rc = -30
      <3>LustreError: 8967:0:(llog_osd.c:156:llog_osd_write_blob()) echo-MDT0000-osd: error writing log record: rc = -30
      <2>LDISKFS-fs error (device md0) in start_transaction: Journal has aborted
      <2>LDISKFS-fs error (device md0) in start_transaction: Journal has aborted
      <3>LustreError: 8967:0:(llog_cat.c:356:llog_cat_add_rec()) llog_write_rec -30: lh=ffff88601d1e4b40
      <4>
      <3>LustreError: 5801:0:(osd_handler.c:863:osd_trans_commit_cb()) transaction @0xffff882945fc28c0 commit error: 2
      <0>LustreError: 6145:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 11 changes, 31 in progress, 0 in flight: -5
      <0>LustreError: 6145:0:(osp_sync.c:874:osp_sync_thread()) LBUG
      <4>Pid: 6145, comm: osp-syn-98-0
      <4>
      <4>Call Trace:
      <4> [<ffffffffa03b3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa03b3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa0eff2e3>] osp_sync_thread+0x753/0x7d0 [osp]
      <4> [<ffffffff81528df6>] ? schedule+0x176/0x3b0
      <4> [<ffffffffa0efeb90>] ? osp_sync_thread+0x0/0x7d0 [osp]
      <4> [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>
      <3>LustreError: 6135:0:(llog.c:159:llog_cancel_rec()) echo-OST005d-osc-MDT0000: fail to write header for llog #0x5552:1#00000000: rc = -30
      <3>LustreError: 6135:0:(llog_cat.c:538:llog_cat_cancel_records()) echo-OST005d-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -30
      <3>LustreError: 6135:0:(osp_sync.c:721:osp_sync_process_committed()) echo-OST005d-osc-MDT0000: can't cancel record: -30
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 6145, comm: osp-syn-98-0 Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff8152896c>] ? panic+0xa7/0x16f
      <4> [<ffffffffa03b3eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa0eff2e3>] ? osp_sync_thread+0x753/0x7d0 [osp]
      <4> [<ffffffff81528df6>] ? schedule+0x176/0x3b0
      <4> [<ffffffffa0efeb90>] ? osp_sync_thread+0x0/0x7d0 [osp]
      <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
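
      When reading the log above, it can help to translate the negative return codes into their errno names. The following is a minimal Python sketch, not part of the original report: the helper name decode_rc and the regular expression are illustrative assumptions, and the mapping relies on the standard Linux errno table (kernel code returns negated errno values). It shows that -30 corresponds to EROFS, consistent with the "Remounting filesystem read-only" messages, and that the -5 in the failed assertion corresponds to EIO.

```python
import errno
import os
import re

# Hypothetical pattern: a negative code at the end of a log line, preceded by
# "rc = ", "error ", or ": " as seen in the dmesg excerpt above.
RC_PATTERN = re.compile(r"(?:rc = |error |: )-(\d+)\s*$")


def decode_rc(line):
    """Return (code, errno name, description) for a trailing negative code, or None."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


# Three lines copied from the log in this report (priority prefixes dropped).
sample = [
    "LustreError: 6919:0:(osd_io.c:1173:osd_ldiskfs_write_record()) "
    "journal_get_write_access() returned error -30",
    "LustreError: 8967:0:(llog_osd.c:156:llog_osd_write_blob()) "
    "echo-MDT0000-osd: error writing log record: rc = -30",
    "LustreError: 6145:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || "
    "rc == LLOG_PROC_BREAK ) failed: 11 changes, 31 in progress, 0 in flight: -5",
]

for line in sample:
    print(decode_rc(line))
```

      The decoded names only restate what the kernel reported; they do not by themselves identify the root cause of the bitmap corruption.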
      

      Attachments

        1. dir.3040644673.bin
          4 kB
        2. inode.3040644673.bin
          4 kB
        3. vmcore-dmesg.20160607.txt
          138 kB
        4. vmcore-dmesg.20160608.txt
          157 kB

            People

              Assignee: Yang Sheng
              Reporter: Cory Brassington (Inactive)
              Votes: 0
              Watchers: 8
