[LU-5069] Hit LBUG in DNE racer test: (lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed Created: 15/May/14 Updated: 03/Jun/14 Resolved: 03/Jun/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Sarah Liu | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2 MDSs with 4 MDTs |
||
| Severity: | 3 |
| Rank (Obsolete): | 13997 |
| Description |
|
On MDS2, hit the LBUG:
LustreError: 2958:0:(mdd_dir.c:3954:mdd_migrate()) Skipped 15 previous similar messages
LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) LBUG
Pid: 2830, comm: mdt00_003
Call Trace:
 [<ffffffffa0399895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0399e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0f3ae97>] mdd_is_subdir+0x277/0x280 [mdd]
 [<ffffffffa0e0f2ef>] mdt_rename_sanity+0xff/0x4a0 [mdt]
 [<ffffffffa0e1321c>] mdt_reint_rename_internal+0xdc/0x1a80 [mdt]
 [<ffffffffa06e46f8>] ? ldlm_lock_enqueue+0x1c8/0x930 [ptlrpc]
 [<ffffffffa0703edb>] ? ldlm_cli_enqueue_local+0x28b/0x5e0 [ptlrpc]
 [<ffffffffa0e14e04>] mdt_reint_rename_or_migrate+0x244/0x660 [mdt]
 [<ffffffffa0702bc0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
 [<ffffffffa0704230>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
 [<ffffffffa0e15250>] mdt_reint_rename+0x10/0x20 [mdt]
 [<ffffffffa0e0d881>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0df2e93>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
 [<ffffffffa0df371b>] mdt_reint+0x6b/0x120 [mdt]
 [<ffffffffa078fe5c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
 [<ffffffffa073faea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
 [<ffffffffa073edd0>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
 [<ffffffff8109ab56>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109aac0>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1400188459.2830
Message from syslogd@client-18 at May 15 14:14:19 ...
 kernel:LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
Message from syslogd@client-18 at May 15 14:14:19 ...
 kernel:LustreError: 2830:0:(lu_object.h:852:lu_object_attr()) LBUG |
| Comments |
| Comment by Di Wang [ 15/May/14 ] |
|
It seems to be caused by this change:
rename so that it always has parent-child lock ordering
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Hmm, doing the rename sanity check without LDLM lock protection seems a bit risky. At the very least, it needs to check whether the object exists before calling mdo_is_subdir():
static int mdt_rename_sanity(struct mdt_thread_info *info,
			     const struct lu_fid *dir_fid,
			     const struct lu_fid *fid)
{
	/* ... */
		/* <-- check whether the object (dst) exists here */
		rc = mdo_is_subdir(info->mti_env,
				   mdt_object_child(dst), fid,
				   &dst_fid);
		mdt_object_put(info->mti_env, dst);
	}
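The ordering proposed here (test for existence first, then call the accessor that asserts on it) can be illustrated with a minimal, self-contained C model. Everything in it is hypothetical: struct obj, struct obj_header, OBJ_EXISTS, obj_attr() and rename_sanity_sketch() only stand in for lu_object_header, LOHA_EXISTS, lu_object_attr() and mdt_rename_sanity(). They are not Lustre's API, and the patch that actually landed is not shown in this ticket.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the LOHA_EXISTS bit in loh_attr. */
#define OBJ_EXISTS 0x1u

/* Hypothetical, simplified stand-in for lu_object_header. */
struct obj_header {
	unsigned int attr;          /* plays the role of loh_attr */
};

/* Hypothetical, simplified stand-in for lu_object. */
struct obj {
	struct obj_header *header;  /* plays the role of lo_header */
	unsigned int mode;          /* an attribute a caller wants to read */
};

/* Non-asserting existence test: safe to call on any object. */
static int obj_exists(const struct obj *o)
{
	return (o->header->attr & OBJ_EXISTS) != 0;
}

/* Models lu_object_attr(): the caller must already know the object
 * exists, otherwise the assertion fires (the LBUG in this ticket). */
static unsigned int obj_attr(const struct obj *o)
{
	assert(obj_exists(o));
	return o->mode;
}

/* Hypothetical sanity check: return -ENOENT if the object no longer
 * exists, and only then call the asserting accessor. */
static int rename_sanity_sketch(const struct obj *dst, unsigned int *mode_out)
{
	if (!obj_exists(dst))
		return -ENOENT;
	*mode_out = obj_attr(dst);
	return 0;
}
```

In this single-threaded model, clearing OBJ_EXISTS on the header (simulating an object removed by another request) makes rename_sanity_sketch() return -ENOENT instead of tripping the assert() in obj_attr(). Whether such a check alone is sufficient in the real MDT code, or lock protection is also required, is not something this sketch answers.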
I will cook a patch. |
| Comment by Di Wang [ 15/May/14 ] |
| Comment by Peter Jones [ 03/Jun/14 ] |
|
Landed for 2.6 |