Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Affects Version: Lustre 2.5.3
- Fix Version: None
- Environment: CentOS 6.5, Linux 2.6.32-431.23.3.el6_lustre.x86_64
- Severity: 3
Description
We're having an issue with our MDS crashing. This started after recovering from a full MDT filesystem: we've been deleting files from storage to free up metadata space, but have run into these kernel panics.
The dmesg logs contain the following:
<2>LDISKFS-fs error (device md0): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 0 corrupted: 57 blocks free in bitmap, 6 - in gd
<4>
<3>Aborting journal on device md0-8.
<2>LDISKFS-fs error (device md0): ldiskfs_journal_start_sb: Detected aborted journal
<2>LDISKFS-fs error (device md0) in iam_txn_add: Journal has aborted
<2>LDISKFS-fs (md0): Remounting filesystem read-only
<2>LDISKFS-fs (md0): Remounting filesystem read-only
<3>LustreError: 6919:0:(osd_io.c:1173:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
<3>LustreError: 6919:0:(osd_handler.c:1054:osd_trans_stop()) Failure in transaction hook: -30
<3>LustreError: 6919:0:(osd_handler.c:1063:osd_trans_stop()) Failure to stop transaction: -30
<2>LDISKFS-fs error (device md0): ldiskfs_mb_new_blocks: Updating bitmap error: [err -30] [pa ffff8860350c8ba8] [phy 34992896] [logic 256] [len 256] [free 256] [error 1] [inode 1917]
<3>LustreError: 8967:0:(osd_io.c:1166:osd_ldiskfs_write_record()) md0: error reading offset 2093056 (block 511): rc = -30
<3>LustreError: 8967:0:(llog_osd.c:156:llog_osd_write_blob()) echo-MDT0000-osd: error writing log record: rc = -30
<2>LDISKFS-fs error (device md0) in start_transaction: Journal has aborted
<2>LDISKFS-fs error (device md0) in start_transaction: Journal has aborted
<3>LustreError: 8967:0:(llog_cat.c:356:llog_cat_add_rec()) llog_write_rec -30: lh=ffff88601d1e4b40
<4>
<3>LustreError: 5801:0:(osd_handler.c:863:osd_trans_commit_cb()) transaction @0xffff882945fc28c0 commit error: 2
<0>LustreError: 6145:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 11 changes, 31 in progress, 0 in flight: -5
<0>LustreError: 6145:0:(osp_sync.c:874:osp_sync_thread()) LBUG
<4>Pid: 6145, comm: osp-syn-98-0
<4>
<4>Call Trace:
<4> [<ffffffffa03b3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa03b3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0eff2e3>] osp_sync_thread+0x753/0x7d0 [osp]
<4> [<ffffffff81528df6>] ? schedule+0x176/0x3b0
<4> [<ffffffffa0efeb90>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<3>LustreError: 6135:0:(llog.c:159:llog_cancel_rec()) echo-OST005d-osc-MDT0000: fail to write header for llog #0x5552:1#00000000: rc = -30
<3>LustreError: 6135:0:(llog_cat.c:538:llog_cat_cancel_records()) echo-OST005d-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -30
<3>LustreError: 6135:0:(osp_sync.c:721:osp_sync_process_committed()) echo-OST005d-osc-MDT0000: can't cancel record: -30
<0>Kernel panic - not syncing: LBUG
<4>Pid: 6145, comm: osp-syn-98-0 Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152896c>] ? panic+0xa7/0x16f
<4> [<ffffffffa03b3eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0eff2e3>] ? osp_sync_thread+0x753/0x7d0 [osp]
<4> [<ffffffff81528df6>] ? schedule+0x176/0x3b0
<4> [<ffffffffa0efeb90>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
Looking into this issue, it appears that the MDT filesystem is becoming full because of the use of hard-link trees for backup. This results in each file having a large link xattr that spills into an external block, as well as multiple directories referencing each file.
It would normally not be possible to have more than about 4500 bytes used per inode, even with the external xattr block (a 512-byte inode plus at most one 4096-byte block), but the extra directory trees are consuming this space.
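One way to see this directly, with the MDT mounted locally as type ldiskfs (the mount point and file path below are illustrative, not from this system), is to measure the size of the trusted.link xattr that Lustre uses to record the parent directory and name of each hard link:

  # Size in bytes of one file's link xattr; anything that doesn't fit in
  # the inode itself spills into an external 4KB xattr block:
  getfattr -n trusted.link --only-values /mnt/mdt/ROOT/path/to/file | wc -c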
A file taken at random has a link count of 10 and an external xattr block (reported as the "File ACL" block by debugfs).
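For reference, both values can be read straight from the ldiskfs device with debugfs (the inode number here is a placeholder, not the inode actually sampled):

  # "Links:" shows the hard-link count; a non-zero "File ACL:" value is
  # the block number of the external xattr block:
  debugfs -R "stat <1917>" /dev/md0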
As for the e2fsck problem, I haven't been able to debug it yet because it takes several hours to hit. In the meantime, I've fixed the (first) problem that was causing the MDT to be remounted read-only:
I manually marked all of the blocks in group 0 as used, changed the group summary to match, and recomputed the block group checksum. That is the safest workaround, given that I don't know which block(s) are actually in use or which of the two free counts is correct. It isn't clear whether there are more errors like this, but I verified that the next few groups have consistent block counts in the bitmaps and group descriptors.
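Something along the following lines accomplishes that with debugfs (a sketch, not the exact session; the 32768 blocks-per-group figure assumes a 4KB-block filesystem, and set_bg field support should be checked against the local e2fsprogs version):

  debugfs -w /dev/md0
  debugfs:  setb 0 32768                  # mark every block in group 0 as used in the bitmap
  debugfs:  set_bg 0 free_blocks_count 0  # make the group descriptor summary match
  debugfs:  set_bg 0 checksum calc        # recompute the group descriptor checksum
  debugfs:  quit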
There is an e2fsck read-only check running under GDB in the hope of catching the problem, but it will take about 8 hours to reach the point of the prior corruption.
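The setup for that is essentially the following (a sketch; check_block_bitmaps is the pass-5 bitmap comparison routine in the e2fsprogs sources, but the symbol name should be verified against the version being debugged):

  gdb --args e2fsck -fn /dev/md0
  (gdb) break check_block_bitmaps
  (gdb) run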