Details
-
Bug
-
Resolution: Duplicate
-
Blocker
-
None
-
Lustre 2.7.0
-
None
-
1
Description
This evening we hit this LBUG on the MDT in our production file system. The file system is currently down because we hit the same bug every time we attempt to bring the MDT back, as soon as recovery finishes.
<0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
<0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) LBUG
<4>Pid: 722, comm: mdt01_017
<4>
<4>Call Trace:
<4> [<ffffffffa065f895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa065fe97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa17df24d>] osd_trans_start+0x25d/0x660 [osd_ldiskfs]
<4> [<ffffffffa09b9b4a>] llog_osd_destroy+0x42a/0xd40 [obdclass]
<4> [<ffffffffa09b2edc>] llog_cat_new_log+0x1ec/0x710 [obdclass]
<4> [<ffffffffa09b350a>] llog_cat_add_rec+0x10a/0x450 [obdclass]
<4> [<ffffffffa09ab1e9>] llog_add+0x89/0x1c0 [obdclass]
<4> [<ffffffffa17f1976>] ? osd_attr_set+0x166/0x460 [osd_ldiskfs]
<4> [<ffffffffa0d914e2>] mdd_changelog_store+0x122/0x290 [mdd]
<4> [<ffffffffa0da4d0c>] mdd_changelog_data_store+0x16c/0x320 [mdd]
<4> [<ffffffffa0dad9b3>] mdd_attr_set+0x12f3/0x1730 [mdd]
<4> [<ffffffffa088a551>] mdt_reint_setattr+0xf81/0x13a0 [mdt]
<4> [<ffffffffa087be1c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
<4> [<ffffffffa08801dd>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa086423b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
<4> [<ffffffffa08649ab>] mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0c6f56e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
<4> [<ffffffffa0c1f5a1>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
<4> [<ffffffff8106c4f0>] ? pick_next_task_fair+0xd0/0x130
<4> [<ffffffffa0c1e760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
The stack trace doesn't quite match the one in LU-6634 (which, in any case, has no suggested fix).
Activity
Resolution | New: Duplicate [ 3 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Comment |
[ I think there is no need to destroy the llog at all: if llog_write_rec() failed due to ENOSPC, then another new plain llog won't help. So we just need to ensure that a failed llog_write_rec() leaves the state consistent, so that the llog can be reused once we have some space back. ] |
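The approach suggested in the comment above can be illustrated with a small toy model. This is only a sketch in Python, not Lustre code: `ToyDevice`, `ToyPlainLog`, and `add_record` are hypothetical names invented for illustration, and the real `llog_write_rec()` works on kernel structures inside a journal transaction. The point it shows is the ordering: check and reserve space before mutating the log, so that an ENOSPC failure leaves the log unchanged and the same log can be reused once space is available again.

```python
import errno


class ToyDevice:
    """Hypothetical backing store that only tracks free bytes."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes

    def reserve(self, nbytes):
        # Fail before changing anything if the request does not fit.
        if nbytes > self.free_bytes:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_bytes -= nbytes


class ToyPlainLog:
    """Hypothetical stand-in for a plain llog; not the Lustre structure."""

    def __init__(self, device):
        self.device = device
        self.records = []

    def write_rec(self, payload):
        # Reserve space first: if this raises, nothing below has run,
        # so the log is exactly as it was before the call.
        self.device.reserve(len(payload))
        self.records.append(payload)
        return len(self.records) - 1  # index of the new record


def add_record(log, payload):
    """Try to append; on ENOSPC keep the log instead of destroying it."""
    try:
        return log.write_rec(payload)
    except OSError as err:
        if err.errno != errno.ENOSPC:
            raise
        return None  # caller may retry later on the same log


dev = ToyDevice(free_bytes=8)
log = ToyPlainLog(dev)
first = add_record(log, b"rec-0001")    # fits: uses all 8 bytes
failed = add_record(log, b"rec-0002")   # ENOSPC: log left untouched
dev.free_bytes += 8                     # some space comes back
retried = add_record(log, b"rec-0002")  # the same log is reused
```

In this model the failed write needs no cleanup step at all, which is the property the comment asks for: there is no destroy path that would have to start a second transaction while handling the error.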
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/31478
Subject: LU-7138 sptlrpc: make srpc_info writable
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a2f7394ca5affe35c568ba2971c3bf3aeb2a7843