[LU-2895] recovery-small 24a: osp_precreate_get_fid(): ASSERTION( lu_fid_diff(&d->opd_pre_used_fid, &d->opd_pre_last_created_fid) < 0 ) failed: next fid [0x0:0x1:0x0] last created fid [0x0:0x1:0x0] Created: 01/Mar/13 Updated: 25/Mar/13 Resolved: 25/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Severity: | 3 |
| Rank (Obsolete): | 6979 |
| Description |
|
This issue was created by maloo for Li Wei <liwei@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0c78ac7a-81ad-11e2-9f6b-52540035b04c. The sub-test test_24a failed with the following error:
Info required for matching: recovery-small 24a First, the test failed with return code 1. Second, the MDS hit an assertion failure afterward: 12:39:11:Lustre: DEBUG MARKER: == recovery-small test 24a: fsync error (should return error) == 12:39:10 (1361997550) 12:39:36:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0 12:39:36:LustreError: 18701:0:(osp_precreate.c:1164:osp_precreate_get_fid()) ASSERTION( lu_fid_diff(&d->opd_pre_used_fid, &d->opd_pre_last_created_fid) < 0 ) failed: next fid [0x0:0x1:0x0] last created fid [0x0:0x1:0x0] 12:39:36:LustreError: 18701:0:(osp_precreate.c:1164:osp_precreate_get_fid()) LBUG 12:39:36:Pid: 18701, comm: mdt00_000 12:39:36: 12:39:37:Call Trace: 12:39:37: [<ffffffffa06f1895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 12:39:37: [<ffffffffa06f1e97>] lbug_with_loc+0x47/0xb0 [libcfs] 12:39:37: [<ffffffffa0ba4f63>] osp_precreate_get_fid+0x253/0x310 [osp] 12:39:37: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:39:37: [<ffffffffa0ba28cb>] osp_object_assign_fid+0x7b/0x160 [osp] 12:39:37: [<ffffffffa0ba30b6>] osp_object_create+0x406/0x890 [osp] 12:39:38: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:39:38: [<ffffffffa0b5e683>] lod_striping_create+0x193/0x320 [lod] 12:39:38: [<ffffffffa0b5e983>] lod_object_create+0x173/0x260 [lod] 12:39:38: [<ffffffffa0a1a580>] mdd_object_create_internal+0xe0/0x220 [mdd] 12:39:38: [<ffffffffa0a2d22c>] mdd_create+0x77c/0x11f0 [mdd] 12:39:38: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:39:38: [<ffffffffa0ac0ef9>] mdt_reint_open+0x12b9/0x1ab0 [mdt] 12:39:39: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:39:39: [<ffffffffa0aab7f1>] mdt_reint_rec+0x41/0xe0 [mdt] 12:39:39: [<ffffffffa0aa4e53>] mdt_reint_internal+0x4e3/0x7d0 [mdt] 12:39:39: [<ffffffffa0aa540d>] mdt_intent_reint+0x1ed/0x4f0 [mdt] 12:39:39: [<ffffffffa0aa0fde>] mdt_intent_policy+0x3ae/0x750 [mdt] 12:39:39: [<ffffffffa0fb0351>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] 12:39:39: [<ffffffffa0fd6447>] ldlm_handle_enqueue0+0x4f7/0x1080 [ptlrpc] 12:39:39: [<ffffffffa0aa14b6>] mdt_enqueue+0x46/0x110 [mdt] 12:39:39: [<ffffffffa0a96078>] mdt_handle_common+0x628/0x1620 [mdt] 12:39:39: [<ffffffffa0acd845>] mds_regular_handle+0x15/0x20 [mdt] 12:39:39: [<ffffffffa100802c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc] 12:39:39: [<ffffffffa06f25de>] ? cfs_timer_arm+0xe/0x10 [libcfs] 12:39:39: [<ffffffffa0fff759>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] 12:39:39: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20 12:39:39: [<ffffffffa1009576>] ptlrpc_main+0xb76/0x1870 [ptlrpc] 12:39:39: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc] 12:39:39: [<ffffffff8100c0ca>] child_rip+0xa/0x20 12:39:39: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc] 12:39:40: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc] 12:39:40: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 12:39:40: 12:39:40:Kernel panic - not syncing: LBUG |
| Comments |
| Comment by Peter Jones [ 11/Mar/13 ] |
|
Niu Could you please look into this one? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 12/Mar/13 ] |
|
Looks the osp_precreate_reserve() should timedout: 12:39:36:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0 but the osp_precreate_reserve() didn't return proper error for such case: if (cfs_time_aftereq(cfs_time_current(), expire)) break; The mds debug log is missed, so I'm not sure the exact reason of precreation isn't done (even the seq hasn't been initialized), but at least, we should return proper error in such case. |
| Comment by Niu Yawei (Inactive) [ 12/Mar/13 ] |
|
fix the LASSERT problem: http://review.whamcloud.com/5683 |
| Comment by Alex Zhuravlev [ 12/Mar/13 ] |
|
why rc taken from is not enough ? |
| Comment by Niu Yawei (Inactive) [ 12/Mar/13 ] |
|
Alex, the opd_pre_status was intialized as -EAGAIN, and it could be cleared to 0 in osp_precreate_thread() -> osp_statfs_update(), and the following osp_precreate_cleanup_orphans() wasn't success somehow, then we'll end up in above state: uninitialized opd_pre_last_created_fid & opd_pre_used_fid, opd_pre_status == 0, and osp_precreate_reserve() is possbile timedout with rc == 0. |
| Comment by Alex Zhuravlev [ 13/Mar/13 ] |
|
inability to setup FLD client or cleanup orphans should block any activity on this OSP, I'd think. |
| Comment by Peter Jones [ 25/Mar/13 ] |
|
Landed for 2.4 |