Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.12.3
- 3
Description
The reproducer is replay-dual/25; it hits the problem at a low rate.
[ 324.426593] general protection fault: 0000 [#1] SMP
[ 324.431986] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) ...
[ 324.477340] CPU: 0 PID: 11422 Comm: tgt_recover_0 Tainted: G OE ------------ 3.10.0-693.21.1.x3.2.152.x86_64 #1
[ 324.482771] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 324.486161] task: ffff88009e422f70 ti: ffff88009e428000 task.ti: ffff88009e428000
[ 324.490071] RIP: 0010:[<ffffffffc08b77cc>] [<ffffffffc08b77cc>] keys_fill+0x5c/0x180 [obdclass]
[ 324.494522] RSP: 0018:ffff88009e42bad0 EFLAGS: 00010246
[ 324.497767] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: ffff88009e42bfd8
[ 324.501529] RDX: ffff88009e42baf8 RSI: 0000000000000002 RDI: ffffffffc091d100
[ 324.505262] RBP: ffff88009e42baf0 R08: 000000000001b940 R09: ffff88013b001b00
[ 324.508969] R10: ffffffffc0ff8a17 R11: ffff88013586f400 R12: ffffffffc091d1c0
[ 324.512676] R13: ffff88009e4bb250 R14: 0000000000000014 R15: ffff8800bbbcf4c8
[ 324.517164] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 324.524156] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 324.529233] CR2: 00007fd17b3053e4 CR3: 00000000bb84c000 CR4: 00000000000006f0
[ 324.532950] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 324.537805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 324.543260] Call Trace:
[ 324.546087] [<ffffffffc08bc351>] lu_context_refill+0x41/0x50 [obdclass]
[ 324.551161] [<ffffffffc08bc3e4>] lu_env_refill+0x24/0x30 [obdclass]
[ 324.554887] [<ffffffffc0ff8ab1>] ofd_lvbo_init+0x2b1/0x8ad [ofd]
[ 324.559344] [<ffffffffc0b1e6a0>] ldlm_server_completion_ast+0x600/0x990 [ptlrpc]
[ 324.565333] [<ffffffffc0b1e0a0>] ? ldlm_server_blocking_ast+0xa40/0xa40 [ptlrpc]
[ 324.569117] [<ffffffffc0aefd08>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[ 324.572619] [<ffffffffc0b38452>] ptlrpc_set_wait+0x72/0x790 [ptlrpc]
[ 324.576416] [<ffffffff811e4d1d>] ? kmem_cache_alloc_node_trace+0x11d/0x210
[ 324.580625] [<ffffffffc089a389>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 324.584984] [<ffffffffc0aefc60>] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
[ 324.589275] [<ffffffffc0b2ec92>] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc]
[ 324.593275] [<ffffffffc0af52a5>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ 324.597178] [<ffffffffc0af682e>] __ldlm_reprocess_all+0x10e/0x350 [ptlrpc]
[ 324.600564] [<ffffffffc0af6dd6>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 324.603862] [<ffffffffc07423f0>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 324.607537] [<ffffffffc0af6db0>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[ 324.612088] [<ffffffffc0af6db0>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[ 324.617688] [<ffffffffc0745785>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 324.622078] [<ffffffffc0af6e1c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 324.626371] [<ffffffffc0b0a3f1>] target_recovery_thread+0xd21/0x11d0 [ptlrpc]
[ 324.630650] [<ffffffffc0b096d0>] ? replay_request_or_update.isra.23+0x8c0/0x8c0 [ptlrpc]
[ 324.635262] [<ffffffff810b4031>] kthread+0xd1/0xe0
[ 324.638630] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 324.642747] [<ffffffff816c4577>] ret_from_fork+0x77/0xb0
[ 324.646459] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 324.650524] Code: 5b 59 06 00 0f 1f 00 31 db eb 15 0f 1f 40 00 48 83 c
I think the sequence of events was as follows.
handle_recovery_req() uses the environment in the following way:
req->rq_session.lc_thread = thread;
req->rq_svc_thread = thread;
req->rq_svc_thread->t_env->le_ses = &req->rq_session;
/* thread context */
lu_context_enter(&thread->t_env->le_ctx);
(void)handler(req);
lu_context_exit(&thread->t_env->le_ctx);
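After that, the request can be freed while env->le_ses still points at req->rq_session, so le_ses is left pointing to freed memory.
Later, ofd_lvbo_init() takes the env from the per-CPU cache and refills it. The refill finds that le_ses is non-NULL and calls keys_fill() on it, but that memory is no longer valid. RAX holds 5a5a5a5a5a5a5a5a in the oops, which looks like a poison pattern for freed memory and is consistent with this.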
ofd_lvbo_init() -> env = lu_env_find(); info = ofd_info(env); -> lu_env_refill((void *)env); -> lu_context_refill(env->le_ses) -> keys_fill(ctxt)
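The pattern described above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the Lustre code and not necessarily the fix that was landed: struct ctx, struct env, struct request, env_refill(), handle_req() and demo() are hypothetical, simplified stand-ins for lu_context, lu_env, the ptlrpc request, lu_env_refill() and handle_recovery_req(). The sketch shows one possible way to avoid the stale pointer: detach le_ses from the env before the request that owns the session can be freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the Lustre structures. */
struct ctx {
	int keys_filled;
};

struct env {
	struct ctx  le_ctx;	/* thread context, owned by the env */
	struct ctx *le_ses;	/* session context, borrowed from a request */
};

struct request {
	struct ctx rq_session;	/* lives and dies with the request */
};

/*
 * Mirrors the refill logic described in the ticket: the session is
 * touched only when le_ses is set. With a stale le_ses this write
 * would be a use-after-free.
 * Returns 1 if a session was refilled, 0 otherwise.
 */
static int env_refill(struct env *env)
{
	env->le_ctx.keys_filled = 1;
	if (env->le_ses == NULL)
		return 0;
	env->le_ses->keys_filled = 1;
	return 1;
}

/* Handle one request while temporarily borrowing its session. */
static void handle_req(struct env *env, struct request *req)
{
	env->le_ses = &req->rq_session;
	env_refill(env);	/* stands in for the handler's work */
	env->le_ses = NULL;	/* assumption: drop the borrowed pointer */
}

/*
 * Returns 1 if, after the request is freed, the env no longer
 * references its session and a later refill skips the session.
 */
static int demo(void)
{
	struct env env = { { 0 }, NULL };
	struct request *req = calloc(1, sizeof(*req));

	if (req == NULL)
		return 0;
	handle_req(&env, req);
	free(req);
	return env.le_ses == NULL && env_refill(&env) == 0;
}
```

Calling demo() from a main() exercises the sequence: the request is handled, freed, and the env is refilled again. Without the `env->le_ses = NULL` line in handle_req(), the second refill would write through a pointer into freed memory, which is the situation the trace above suggests.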