[LU-2075] Assertion triggered in lod_declare_object_create
Created: 02/Oct/12  Updated: 19/Apr/13  Resolved: 08/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Critical
Reporter: Prakash Surya (Inactive)
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: topsequoia

Severity: 3
Rank (Obsolete): 4332

 Description   

Hit this assertion immediately after resuming IO workloads using 2.3.51-2chaos.

LustreError: 35701:0:(lod_object.c:750:lod_declare_object_create()) ASSERTION( !dt_object_exists(next) ) failed: 
LustreError: 35701:0:(lod_object.c:750:lod_declare_object_create()) LBUG
Pid: 35701, comm: mdt01_007

Call Trace:
 [<ffffffffa05b7965>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa05b7f77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0e5089c>] lod_declare_object_create+0x22c/0x4f0 [lod]
 [<ffffffffa0e8fccf>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
 [<ffffffffa0ea0c9a>] mdd_create+0x39a/0x1550 [mdd]
 [<ffffffffa0f0fc7c>] mdt_md_create+0x51c/0x750 [mdt]
 [<ffffffffa0f10063>] mdt_reint_create+0x1b3/0x710 [mdt]
 [<ffffffffa0f0db71>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0f06f53>] mdt_reint_internal+0x4f3/0x7f0 [mdt]
 [<ffffffffa0f07294>] mdt_reint+0x44/0xe0 [mdt]
 [<ffffffffa0ef8132>] mdt_handle_common+0x932/0x1770 [mdt]
 [<ffffffffa0ef9045>] mdt_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa09507fc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
 [<ffffffffa05b86be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa05ca13f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
 [<ffffffffa0947bb7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
 [<ffffffffa0951dd1>] ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 35701, comm: mdt01_007 Tainted: P        W  ----------------   2.6.32-220.23.1.1chaos.ch5.x86_64 #1
Call Trace:
 [<ffffffff814ee992>] ? panic+0x78/0x143
 [<ffffffffa05b7fcb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0e5089c>] ? lod_declare_object_create+0x22c/0x4f0 [lod]
 [<ffffffffa0e8fccf>] ? mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
 [<ffffffffa0ea0c9a>] ? mdd_create+0x39a/0x1550 [mdd]
 [<ffffffffa0f0fc7c>] ? mdt_md_create+0x51c/0x750 [mdt]
 [<ffffffffa0f10063>] ? mdt_reint_create+0x1b3/0x710 [mdt]
 [<ffffffffa0f0db71>] ? mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0f06f53>] ? mdt_reint_internal+0x4f3/0x7f0 [mdt]
 [<ffffffffa0f07294>] ? mdt_reint+0x44/0xe0 [mdt]
 [<ffffffffa0ef8132>] ? mdt_handle_common+0x932/0x1770 [mdt]
 [<ffffffffa0ef9045>] ? mdt_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa09507fc>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
 [<ffffffffa05b86be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa05ca13f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
 [<ffffffffa0947bb7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
 [<ffffffffa0951dd1>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffffa09511e0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
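
When triaging a dump like the one above, it can help to pull the assertion site and the call path out of the raw console text. The following is only a sketch: the regular expressions are inferred from the lines quoted in this ticket, not from any documented Lustre or kernel log format, and the helper names parse_lbug and parse_frames are hypothetical. Frames prefixed with "?" are treated as unreliable, following the usual Linux stack-dump convention.

```python
import re

# Assumed shape, inferred from this ticket only:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LBUG_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)"
)

# Assumed shape of a stack frame, e.g.
#   [<ffffffffa0e5089c>] lod_declare_object_create+0x22c/0x4f0 [lod]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] (?P<unreliable>\? )?"
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_lbug(line):
    """Return the source location and message of a LustreError line, or None."""
    m = LBUG_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "msg": m.group("msg").strip(),
    }


def parse_frames(text):
    """Return (function, module, reliable) for every stack frame in text.

    Frames without a [module] suffix are reported with module None.
    """
    return [
        (m.group("func"), m.group("module"), m.group("unreliable") is None)
        for m in FRAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Sample lines copied from the trace in this ticket.
    sample = (
        "LustreError: 35701:0:(lod_object.c:750:lod_declare_object_create()) "
        "ASSERTION( !dt_object_exists(next) ) failed: \n"
        " [<ffffffffa0e5089c>] lod_declare_object_create+0x22c/0x4f0 [lod]\n"
        " [<ffffffffa0ea0c9a>] mdd_create+0x39a/0x1550 [mdd]\n"
        " [<ffffffff81051ba3>] ? __wake_up+0x53/0x70\n"
    )
    print(parse_lbug(sample.splitlines()[0]))
    for func, module, reliable in parse_frames(sample):
        if reliable:
            print(func, module)
```

Filtering on the reliable flag leaves the mdt -> mdd -> lod create path that ends at the failed assertion, which is the part of the trace relevant to this bug.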


 Comments   
Comment by Alex Zhuravlev [ 02/Oct/12 ]

Prakash, is this an existing setup? Could you please try a fresh one?
(I'd like to get a direction for the investigation.)

Comment by Prakash Surya (Inactive) [ 02/Oct/12 ]

This filesystem was updated from our previous Orion branch. I brought the MDS up last night and the OSTs this morning following LU-2069. The clients are also running the new 2.3.51-2chaos tag.

What do you mean by "try a fresh one"?

Comment by Alex Zhuravlev [ 02/Oct/12 ]

OK, got it. Let me check one idea first.

By a fresh one I meant another filesystem, created with the new master's utilities.
I'm going to try this locally.

Comment by Prakash Surya (Inactive) [ 02/Oct/12 ]

We don't really have hardware to try reformatting a new FS at the moment. The best I could do is a VM.

Comment by Alex Zhuravlev [ 02/Oct/12 ]

OK, let me dig into this locally first. Thanks.

Comment by Alex Zhuravlev [ 02/Oct/12 ]

The root cause has been found; working on the fix.

Comment by Alex Zhuravlev [ 02/Oct/12 ]

Please try with http://review.whamcloud.com/4158

Comment by Prakash Surya (Inactive) [ 02/Oct/12 ]

Sure, I'll pull that into our tree.

Comment by Ian Colle (Inactive) [ 08/Oct/12 ]

4158 landed to master
