  1. Lustre
  2. LU-948

Client recovery hang

    Description

      When I rebooted two OSSs to apply a patch for LU-874 on the servers, quite a few of the clients appear to have deadlocked in recovery. Here's a backtrace of ptlrpcd-rcv on one client:

      crash> bt 5077
      PID: 5077   TASK: ffff88082da834c0  CPU: 8   COMMAND: "ptlrpcd-rcv"
       #0 [ffff88082da85430] schedule at ffffffff814ee3b2
       #1 [ffff88082da854f8] io_schedule at ffffffff814eeba3
       #2 [ffff88082da85518] sync_page at ffffffff81110fbd
       #3 [ffff88082da85528] __wait_on_bit_lock at ffffffff814ef40a
       #4 [ffff88082da85578] __lock_page at ffffffff81110f57
       #5 [ffff88082da855d8] vvp_page_own at ffffffffa093bf6a [lustre]
       #6 [ffff88082da855f8] cl_page_own0 at ffffffffa0601d3b [obdclass]
       #7 [ffff88082da85678] cl_page_own at ffffffffa0601fa0 [obdclass]
       #8 [ffff88082da85688] cl_page_gang_lookup at ffffffffa0603bb7 [obdclass]
       #9 [ffff88082da85758] cl_lock_page_out at ffffffffa06096fc [obdclass]
      #10 [ffff88082da85808] osc_lock_flush at ffffffffa0858e8f [osc]
      #11 [ffff88082da85858] osc_lock_cancel at ffffffffa0858f2a [osc]
      #12 [ffff88082da858d8] cl_lock_cancel0 at ffffffffa0604665 [obdclass]
      #13 [ffff88082da85928] cl_lock_cancel at ffffffffa06051ab [obdclass]
      #14 [ffff88082da85968] osc_ldlm_blocking_ast at ffffffffa0859cf8 [osc]
      #15 [ffff88082da859f8] ldlm_cancel_callback at ffffffffa06a1ba3 [ptlrpc]
      #16 [ffff88082da85a18] ldlm_lock_cancel at ffffffffa06a1c89 [ptlrpc]
      #17 [ffff88082da85a58] ldlm_cli_cancel_list_local at ffffffffa06bede8 [ptlrpc]
      #18 [ffff88082da85ae8] ldlm_cancel_lru_local at ffffffffa06bf255 [ptlrpc]
      #19 [ffff88082da85b08] ldlm_replay_locks at ffffffffa06bf385 [ptlrpc]
      #20 [ffff88082da85bb8] ptlrpc_import_recovery_state_machine at ffffffffa070ceea [ptlrpc]
      #21 [ffff88082da85c38] ptlrpc_connect_interpret at ffffffffa070db38 [ptlrpc]
      #22 [ffff88082da85d08] ptlrpc_check_set at ffffffffa06dd870 [ptlrpc]
      #23 [ffff88082da85de8] ptlrpcd_check at ffffffffa07113b8 [ptlrpc]
      #24 [ffff88082da85e48] ptlrpcd at ffffffffa071175b [ptlrpc]
      #25 [ffff88082da85f48] kernel_thread at ffffffff8100c14a
      

      I will need to do more investigation, but that's a start.
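
When comparing backtraces from several hung clients, it can help to reduce each one to its path through the kernel modules. The following is a minimal sketch, not part of the original report: it assumes the `bt` text has been saved from crash, the helper names `parse_frames` and `module_path` are hypothetical, and frames printed without a bracketed module name are labeled `kernel` as an assumption. The sample uses a few frames copied from the backtrace above.

```python
import re
from itertools import groupby

# One frame line of crash `bt` output, for example:
#   #5 [ffff88082da855d8] vvp_page_own at ffffffffa093bf6a [lustre]
FRAME_RE = re.compile(
    r"#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:\s+\[(?P<mod>[^\]]+)\])?"
)


def parse_frames(bt_text):
    """Return (index, function, module) tuples for every frame line found."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            # Assumption: no bracketed module means a core kernel symbol.
            frames.append((int(m.group("idx")), m.group("func"),
                           m.group("mod") or "kernel"))
    return frames


def module_path(frames):
    """Group consecutive frames by module, outermost caller first."""
    ordered = sorted(frames, key=lambda f: f[0], reverse=True)
    return [(mod, [f[1] for f in grp])
            for mod, grp in groupby(ordered, key=lambda f: f[2])]


SAMPLE = """\
 #4 [ffff88082da85578] __lock_page at ffffffff81110f57
 #5 [ffff88082da855d8] vvp_page_own at ffffffffa093bf6a [lustre]
 #6 [ffff88082da855f8] cl_page_own0 at ffffffffa0601d3b [obdclass]
#19 [ffff88082da85b08] ldlm_replay_locks at ffffffffa06bf385 [ptlrpc]
"""

if __name__ == "__main__":
    for mod, funcs in module_path(parse_frames(SAMPLE)):
        print(f"{mod}: {' -> '.join(funcs)}")
```

Run against the full backtrace, this makes the call path easier to read at a glance: the recovery state machine in ptlrpc calls into osc and obdclass to cancel a lock, and the thread ends up waiting in `__lock_page`.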

            People

              jay Jinshan Xiong (Inactive)
              morrone Christopher Morrone (Inactive)
