Lustre / LU-14182

cancel layout lock on replay - deadlock

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • Severity: 3

    Description

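      Layout locks are not replayed; instead, they are cancelled as unused during replay, and that cancellation has to take the lov_conf semaphore for write through lov_conf_lock(). The semaphore may already be held for read by a concurrent cl_object_flush() (lov_object_flush() takes it through lov_conf_freeze()), which starts new IO and waits for it to complete. That IO cannot be sent to the MDS while the import is in recovery, and recovery cannot progress because ptlrpcd_rcv is itself blocked waiting for the semaphore, so the two threads deadlock.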
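To make the cycle explicit, here is a minimal wait-for model in Python. It is a sketch, not Lustre code: the node labels are hypothetical descriptive strings, and each edge reads "X waits for Y" as inferred from the description and the two backtraces in this report. Following the edges from ptlrpcd_rcv leads back to ptlrpcd_rcv, which is the deadlock.

```python
from __future__ import annotations

# Hypothetical wait-for model of this deadlock. Keys wait for their values.
# Node labels are invented for the sketch; function names in the comments
# are taken from the report's backtraces.
WAITS_FOR = {
    # ldlm_replay_locks() -> ... -> lov_conf_lock(): needs the semaphore for write
    "ptlrpcd_rcv (replay)": "lov_conf semaphore",
    # a writer must wait for the reader taken in lov_conf_freeze() to release it
    "lov_conf semaphore": "ldlm_bl thread (flush)",
    # osc_extent_wait(): the flush waits for its DoM writeback IO
    "ldlm_bl thread (flush)": "DoM IO to MDS",
    # the IO cannot be sent while the import is in recovery
    "DoM IO to MDS": "import recovery",
    # ptlrpc_import_recovery_state_machine() is running inside ptlrpcd_rcv
    "import recovery": "ptlrpcd_rcv (replay)",
}


def find_cycle(waits_for: dict[str, str], start: str) -> list[str] | None:
    """Follow the single wait-for edge out of each node; return the cycle if one closes."""
    path: list[str] = []
    index: dict[str, int] = {}
    node = start
    while node is not None:
        if node in index:
            # Slice from the first visit of the repeated node to close the loop.
            return path[index[node]:] + [node]
        index[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None  # reached a node that waits on nothing: no deadlock


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "ptlrpcd_rcv (replay)")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, deleting any single edge (for example, if the replay-time cancellation did not have to wait for the semaphore) makes find_cycle() return None. That only illustrates why breaking one wait is sufficient; it is not a description of how LU-14182 was actually fixed.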

      PID: 17992 TASK: ffff9a1790ca4100 CPU: 7 COMMAND: "ptlrpcd_rcv"
      #0 [ffff9a1776a436d8] __schedule at ffffffffb9167747
      #1 [ffff9a1776a43760] schedule at ffffffffb9167c49
      #2 [ffff9a1776a43770] rwsem_down_write_failed at ffffffffb9169535
      #3 [ffff9a1776a43808] call_rwsem_down_write_failed at ffffffffb8d86fb7
      #4 [ffff9a1776a43850] down_write at ffffffffb9166f4d
      #5 [ffff9a1776a43868] lov_conf_lock at ffffffffc0ea50ae [lov]
      #6 [ffff9a1776a43888] lov_conf_set at ffffffffc0ea9143 [lov]
      #7 [ffff9a1776a438f8] cl_conf_set at ffffffffc0ba33c3 [obdclass]
      #8 [ffff9a1776a43928] ll_layout_conf at ffffffffc0fd1761 [lustre]
      #9 [ffff9a1776a439a8] ll_lock_cancel_bits at ffffffffc0ff4975 [lustre]
      #10 [ffff9a1776a43a28] ll_md_blocking_ast at ffffffffc0ff501f [lustre]
      #11 [ffff9a1776a43a60] ldlm_cancel_callback at ffffffffc0d4757a [ptlrpc]
      #12 [ffff9a1776a43ae0] ldlm_lock_cancel at ffffffffc0d47876 [ptlrpc]
      #13 [ffff9a1776a43b00] ldlm_cli_cancel_list_local at ffffffffc0d579c5 [ptlrpc]
      #14 [ffff9a1776a43b68] ldlm_cancel_lru_local at ffffffffc0d5815b [ptlrpc]
      #15 [ffff9a1776a43b88] ldlm_replay_locks at ffffffffc0d5aa47 [ptlrpc]
      #16 [ffff9a1776a43c00] ptlrpc_import_recovery_state_machine at ffffffffc0d9a145 [ptlrpc]
      #17 [ffff9a1776a43c48] ptlrpc_replay_interpret at ffffffffc0d6d771 [ptlrpc]
      #18 [ffff9a1776a43c98] ptlrpc_check_set at ffffffffc0d713d1 [ptlrpc]

      #0 [ffff9a1642fd75f8] __schedule at ffffffffb9167747
      #1 [ffff9a1642fd7680] schedule at ffffffffb9167c49
      #2 [ffff9a1642fd7690] osc_extent_wait at ffffffffc0f04ced [osc]
      #3 [ffff9a1642fd77e0] osc_cache_wait_range at ffffffffc0f07097 [osc]
      #4 [ffff9a1642fd78d8] osc_cache_writeback_range at ffffffffc0f0805e [osc]
      #5 [ffff9a1642fd7a20] mdc_lock_flush at ffffffffc0f5dd8d [mdc]
      #6 [ffff9a1642fd7a80] mdc_dlm_blocking_ast0 at ffffffffc0f5e108 [mdc]
      #7 [ffff9a1642fd7ac0] mdc_object_flush at ffffffffc0f5e458 [mdc]
      #8 [ffff9a1642fd7ad0] cl_object_flush at ffffffffc0ba34e3 [obdclass]
      #9 [ffff9a1642fd7b00] lov_flush_composite at ffffffffc0ea57e3 [lov]
      #10 [ffff9a1642fd7b28] lov_object_flush at ffffffffc0ea55ee [lov] <— lov_conf_freeze
      #11 [ffff9a1642fd7b50] cl_object_flush at ffffffffc0ba34e3 [obdclass]
      #12 [ffff9a1642fd7b80] ll_dom_lock_cancel at ffffffffc0ff4365 [lustre]
      #13 [ffff9a1642fd7bb8] ll_lock_cancel_bits at ffffffffc0ff499b [lustre]
      #14 [ffff9a1642fd7c38] ll_md_blocking_ast at ffffffffc0ff501f [lustre]
      #15 [ffff9a1642fd7c70] ldlm_cancel_callback at ffffffffc0d4757a [ptlrpc]
      #16 [ffff9a1642fd7cf0] ldlm_cli_cancel_local at ffffffffc0d53461 [ptlrpc]
      #17 [ffff9a1642fd7d18] ldlm_cli_cancel at ffffffffc0d5938c [ptlrpc]
      #18 [ffff9a1642fd7da8] ll_md_blocking_ast at ffffffffc0ff516a [lustre]
      #19 [ffff9a1642fd7de0] ldlm_handle_bl_callback at ffffffffc0d5d7d8 [ptlrpc]
      #20 [ffff9a1642fd7e10] ldlm_bl_thread_main at ffffffffc0d5e18d [ptlrpc]
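The two backtraces show the two halves of the cycle: ptlrpcd_rcv sleeps in down_write() under lov_conf_lock(), while the blocking-AST thread holds the semaphore (taken in lov_object_flush() via lov_conf_freeze()) and sleeps in osc_extent_wait(). As a rough user-space analogy, not Lustre code, the sketch below replays that interleaving with a hand-written reader-writer lock. The names RWSem, simulate, recovery_done and give_up are hypothetical; the write timeout and the polling loop exist only so the demo terminates, whereas the kernel threads would block forever.

```python
from __future__ import annotations

import threading


class RWSem:
    """Minimal reader-writer lock standing in for the kernel rw_semaphore.
    Assumption for this sketch: no writer preference and no fairness."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def down_read(self) -> None:
        with self._cond:
            self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    def up_read(self) -> None:
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def down_write(self, timeout: float) -> bool:
        # The kernel's down_write() has no timeout; one is added here so the
        # sketch can report the deadlock instead of hanging.
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def up_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def simulate(timeout: float = 0.5) -> dict[str, bool]:
    conf = RWSem()                     # plays the role of the lov_conf semaphore
    reader_holds = threading.Event()
    recovery_done = threading.Event()  # the DoM IO can only be sent after this
    give_up = threading.Event()        # demo-only teardown signal
    result: dict[str, bool] = {}

    def bl_thread() -> None:
        conf.down_read()               # like lov_conf_freeze() in lov_object_flush()
        reader_holds.set()
        # like osc_extent_wait(): the flush IO needs recovery to finish first.
        # Poll so the demo can be torn down; the real thread sleeps forever.
        while not recovery_done.wait(0.05):
            if give_up.is_set():
                break
        result["flush_completed"] = recovery_done.is_set()
        conf.up_read()

    t = threading.Thread(target=bl_thread)
    t.start()
    reader_holds.wait()
    # like ldlm_replay_locks() -> ... -> lov_conf_lock() in ptlrpcd_rcv
    got = conf.down_write(timeout)
    result["replay_got_lock"] = got
    if got:
        conf.up_write()
        recovery_done.set()            # recovery could only finish after the cancel
    give_up.set()
    t.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

Because the reader keeps the semaphore until after the writer gives up, the writer never acquires it and recovery_done is never set, so simulate() reports that neither side made progress.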

          People

            Assignee: Vitaly Fertman
            Reporter: Vitaly Fertman