Lustre / LU-1395

MDS hangs after calltrace at ldlm_expired_completion_wait()

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 1.8.6
    • None
    • 3
    • 10343

    Description

      We saw the following call trace on the MDS, which hung afterwards.

      Apr 23 15:58:34 ALPL505 kernel: Call Trace:
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88953a00>] ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88955542>] ldlm_completion_ast+0x4c2/0x880 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8893a709>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8008e421>] default_wake_function+0x0/0xe
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88935b6a>] ldlm_lock_addref_internal_nolock+0x3a/0x90 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff889540bb>] ldlm_cli_enqueue_local+0x46b/0x520 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88caa157>] enqueue_ordered_locks+0x387/0x4d0 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff889519a0>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88955080>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88caa8e9>] mds_get_parent_child_locked+0x649/0x960 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88c9b652>] mds_getattr_lock+0x632/0xc90 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88c96dda>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88ca1d83>] mds_intent_policy+0x623/0xc20 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8893c270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88939eb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff889367fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8895e870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8895bb39>] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88ca0b30>] mds_handle+0x40e0/0x4d10 [mds]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff800774ed>] smp_send_reschedule+0x4e/0x53
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8897fd55>] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff889896d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88989e35>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8898adc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88989e60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
      

      This might be related to LU-59; please review.
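
For triage, a trace like the one above can be reduced to a list of (function, offset, module) frames. The following is a minimal sketch, not part of any Lustre tooling: the `parse_frames` helper, the regular expression, and the `"kernel"` label for frames without a module tag are assumptions made for illustration, and the sample lines are copied from the trace in this report.

```python
import re
from collections import Counter

# Matches frames such as:
#   [<ffffffff88953a00>] ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
# The trailing "[module]" is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in syslog-style trace text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # e.g. the "Call Trace:" header line
        frames.append({
            "func": m.group("func"),
            "offset": int(m.group("off"), 16),
            # Assumption: label frames without a module tag as "kernel".
            "module": m.group("module") or "kernel",
        })
    return frames


sample = """\
Apr 23 15:58:34 ALPL505 kernel: Call Trace:
Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88953a00>] ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff8008e421>] default_wake_function+0x0/0xe
Apr 23 15:58:34 ALPL505 kernel:  [<ffffffff88caa157>] enqueue_ordered_locks+0x387/0x4d0 [mds]
"""

frames = parse_frames(sample)
counts = Counter(frame["module"] for frame in frames)

for frame in frames:
    print(f'{frame["module"]:>8}  {frame["func"]}+{frame["offset"]:#x}')
```

Grouping frames by module (`counts`) gives a quick view of how much of the stack sits in `ptlrpc` versus `mds`, which can help when comparing this trace against the one in LU-59.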

            People

              green Oleg Drokin
              ihara Shuichi Ihara (Inactive)
              Votes: 0
              Watchers: 7