Lustre / LU-12853

general protection fault: 0000 RIP: keys_fill

    Description

      The reproducer is replay-dual/25; the failure reproduces at a low rate.

      [  324.426593] general protection fault: 0000 [#1] SMP 
      [  324.431986] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) ...
      [  324.477340] CPU: 0 PID: 11422 Comm: tgt_recover_0 Tainted: G           OE  ------------   3.10.0-693.21.1.x3.2.152.x86_64 #1
      [  324.482771] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  324.486161] task: ffff88009e422f70 ti: ffff88009e428000 task.ti: ffff88009e428000
      [  324.490071] RIP: 0010:[<ffffffffc08b77cc>]  [<ffffffffc08b77cc>] keys_fill+0x5c/0x180 [obdclass]
      [  324.494522] RSP: 0018:ffff88009e42bad0  EFLAGS: 00010246
      [  324.497767] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: ffff88009e42bfd8
      [  324.501529] RDX: ffff88009e42baf8 RSI: 0000000000000002 RDI: ffffffffc091d100
      [  324.505262] RBP: ffff88009e42baf0 R08: 000000000001b940 R09: ffff88013b001b00
      [  324.508969] R10: ffffffffc0ff8a17 R11: ffff88013586f400 R12: ffffffffc091d1c0
      [  324.512676] R13: ffff88009e4bb250 R14: 0000000000000014 R15: ffff8800bbbcf4c8
      [  324.517164] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
      [  324.524156] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  324.529233] CR2: 00007fd17b3053e4 CR3: 00000000bb84c000 CR4: 00000000000006f0
      [  324.532950] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  324.537805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  324.543260] Call Trace:
      [  324.546087]  [<ffffffffc08bc351>] lu_context_refill+0x41/0x50 [obdclass]
      [  324.551161]  [<ffffffffc08bc3e4>] lu_env_refill+0x24/0x30 [obdclass]
      [  324.554887]  [<ffffffffc0ff8ab1>] ofd_lvbo_init+0x2b1/0x8ad [ofd]
      [  324.559344]  [<ffffffffc0b1e6a0>] ldlm_server_completion_ast+0x600/0x990 [ptlrpc]
      [  324.565333]  [<ffffffffc0b1e0a0>] ? ldlm_server_blocking_ast+0xa40/0xa40 [ptlrpc]
      [  324.569117]  [<ffffffffc0aefd08>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
      [  324.572619]  [<ffffffffc0b38452>] ptlrpc_set_wait+0x72/0x790 [ptlrpc]
      [  324.576416]  [<ffffffff811e4d1d>] ? kmem_cache_alloc_node_trace+0x11d/0x210
      [  324.580625]  [<ffffffffc089a389>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [  324.584984]  [<ffffffffc0aefc60>] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
      [  324.589275]  [<ffffffffc0b2ec92>] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc]
      [  324.593275]  [<ffffffffc0af52a5>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
      [  324.597178]  [<ffffffffc0af682e>] __ldlm_reprocess_all+0x10e/0x350 [ptlrpc]
      [  324.600564]  [<ffffffffc0af6dd6>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
      [  324.603862]  [<ffffffffc07423f0>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
      [  324.607537]  [<ffffffffc0af6db0>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
      [  324.612088]  [<ffffffffc0af6db0>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
      [  324.617688]  [<ffffffffc0745785>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
      [  324.622078]  [<ffffffffc0af6e1c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
      [  324.626371]  [<ffffffffc0b0a3f1>] target_recovery_thread+0xd21/0x11d0 [ptlrpc]
      [  324.630650]  [<ffffffffc0b096d0>] ? replay_request_or_update.isra.23+0x8c0/0x8c0 [ptlrpc]
      [  324.635262]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [  324.638630]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [  324.642747]  [<ffffffff816c4577>] ret_from_fork+0x77/0xb0
      [  324.646459]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [  324.650524] Code: 5b 59 06 00 0f 1f 00 31 db eb 15 0f 1f 40 00 48 83 c
       

      I think the situation was as follows.
      handle_recovery_req() uses the environment in the following way:

             req->rq_session.lc_thread = thread;
             req->rq_svc_thread = thread;
             req->rq_svc_thread->t_env->le_ses = &req->rq_session;
             /* thread context */
             lu_context_enter(&thread->t_env->le_ctx);
             (void)handler(req);
             lu_context_exit(&thread->t_env->le_ctx);
      

      After that, the request can be freed, which leaves env->le_ses pointing to freed memory.
      Later, ofd_lvbo_init() took an env from the per-CPU cache and refilled it. The refill found le_ses non-NULL and called keys_fill() on it, but that memory was already gone or invalid.
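
      The lifetime problem described above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: every type and function name below is hypothetical and merely stands in for lu_env, lu_context and the request; freeing is simulated by poisoning a buffer so the stale pointer can be inspected safely; the 0x5a byte is chosen because it matches the pattern seen in RAX in the trace, and treating it as a poison value is an assumption. The "detach" variant is one possible mitigation, not necessarily the fix that was adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the Lustre structures. */
struct ctx_model { unsigned int magic; };        /* models a lu_context     */
struct env_model { struct ctx_model *ses; };     /* models lu_env.le_ses    */
struct req_model { struct ctx_model session; };  /* models req->rq_session  */

#define CTX_MAGIC_LIVE 0x600dc0deu
#define POISON_BYTE    0x5a  /* assumption: same byte pattern as RAX in the trace */

/* Models freeing the request. The memory is poisoned rather than passed to
 * free(), so the sketch can look at it afterwards without undefined behaviour. */
void req_release(struct req_model *req)
{
	assert(req != NULL);
	memset(req, POISON_BYTE, sizeof(*req));
}

/* Mirrors the sequence in the description: the env still points at the
 * request's session after the handler returns. */
void handle_req_stale(struct env_model *env, struct req_model *req)
{
	assert(env != NULL && req != NULL);
	req->session.magic = CTX_MAGIC_LIVE;
	env->ses = &req->session;
	/* handler(req) would run here */
}

/* Hypothetical mitigation: detach the session before the request can go away. */
void handle_req_detach(struct env_model *env, struct req_model *req)
{
	assert(env != NULL && req != NULL);
	req->session.magic = CTX_MAGIC_LIVE;
	env->ses = &req->session;
	/* handler(req) would run here */
	env->ses = NULL;
}

/* Returns 1 if a later user of env would follow ses into poisoned memory. */
int env_session_is_stale(const struct env_model *env)
{
	return env->ses != NULL && env->ses->magic != CTX_MAGIC_LIVE;
}
```

      In this model, calling handle_req_stale() followed by req_release() leaves env_session_is_stale() returning 1, while the handle_req_detach() path leaves it returning 0 because the pointer is cleared while the request is still valid.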

       ofd_lvbo_init() -> env = lu_env_find(); info = ofd_info(env) -> lu_env_refill((void *)env) -> lu_context_refill(env->le_ses) -> keys_fill(ctxt)
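
      The call chain above can be sketched as follows, assuming (as the description implies with "found le_ses is not zero") that the refill path only checks the session pointer for NULL before using it. All names with the _sketch suffix are hypothetical reductions, not the obdclass code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced models of lu_context and lu_env. */
struct ctx_sketch { int filled; };
struct env_sketch {
	struct ctx_sketch  ctx;  /* models le_ctx, embedded in the env        */
	struct ctx_sketch *ses;  /* models le_ses, borrowed from a request    */
};

/* Models keys_fill(): it dereferences ctx unconditionally. */
int keys_fill_sketch(struct ctx_sketch *ctx)
{
	ctx->filled++;
	return 0;
}

/* Models lu_context_refill(): a thin wrapper around the key fill. */
int context_refill_sketch(struct ctx_sketch *ctx)
{
	return keys_fill_sketch(ctx);
}

/* Models lu_env_refill(): the embedded context is always refilled, and the
 * session is refilled whenever the pointer is non-NULL. Nothing here can
 * tell a live session from one whose owner has already been freed. */
int env_refill_sketch(struct env_sketch *env)
{
	int rc;

	assert(env != NULL);
	rc = context_refill_sketch(&env->ctx);
	if (rc == 0 && env->ses != NULL)
		rc = context_refill_sketch(env->ses);
	return rc;
}
```

      With ses set to NULL only the embedded context is touched. With any non-NULL ses the session is dereferenced, which is where a stale pointer would fault.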
      

            People

              aboyko Alexander Boyko