Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.7.0
Labels: None
Severity: 1
Rank (Obsolete): 9223372036854775807

This evening we hit this LBUG on the MDT of our production file system. The file system is currently down: every attempt to bring the MDT back hits the same bug as soon as recovery finishes.

<0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
<0>LustreError: 722:0:(osd_handler.c:1017:osd_trans_start()) LBUG
<4>Pid: 722, comm: mdt01_017
<4>
<4>Call Trace:
<4> [<ffffffffa065f895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa065fe97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa17df24d>] osd_trans_start+0x25d/0x660 [osd_ldiskfs]
<4> [<ffffffffa09b9b4a>] llog_osd_destroy+0x42a/0xd40 [obdclass]
<4> [<ffffffffa09b2edc>] llog_cat_new_log+0x1ec/0x710 [obdclass]
<4> [<ffffffffa09b350a>] llog_cat_add_rec+0x10a/0x450 [obdclass]
<4> [<ffffffffa09ab1e9>] llog_add+0x89/0x1c0 [obdclass]
<4> [<ffffffffa17f1976>] ? osd_attr_set+0x166/0x460 [osd_ldiskfs]
<4> [<ffffffffa0d914e2>] mdd_changelog_store+0x122/0x290 [mdd]
<4> [<ffffffffa0da4d0c>] mdd_changelog_data_store+0x16c/0x320 [mdd]
<4> [<ffffffffa0dad9b3>] mdd_attr_set+0x12f3/0x1730 [mdd]
<4> [<ffffffffa088a551>] mdt_reint_setattr+0xf81/0x13a0 [mdt]
<4> [<ffffffffa087be1c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
<4> [<ffffffffa08801dd>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa086423b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
<4> [<ffffffffa08649ab>] mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0c6f56e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
<4> [<ffffffffa0c1f5a1>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
<4> [<ffffffff8106c4f0>] ? pick_next_task_fair+0xd0/0x130
<4> [<ffffffffa0c1e760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG

The stack trace does not seem to be quite the same as the one in LU-6634 (which, in any case, has no suggested fix).

Is related to:

LU-6556 changelog catalog corruption if all possible records is define (Resolved)

LU-6634 (osd_handler.c:901:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed: when reaching Catalog full condition (Resolved)

Assignee: Oleg Drokin

Reporter: Frederik Ferner (Inactive)

Votes: 0

Watchers: 9

Created: 10/Sep/15 11:23 PM

Updated: 01/Mar/18 4:38 PM

Resolved: 04/Oct/15 7:34 PM
