Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: Lustre 2.5.3
    • Fix Version/s: Lustre 2.7.0
    • Labels:
      None
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      On the MDT, an LNet service thread hung and dumped a stack trace:

      Jan 20 13:42:10 nbp8-mds1 kernel: LNet: Service thread pid 17862 was inactive for 424.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jan 20 13:42:10 nbp8-mds1 kernel: Pid: 17862, comm: mdt00_057
      Jan 20 13:42:14 nbp8-mds1 kernel: 
      Jan 20 13:42:14 nbp8-mds1 kernel: Call Trace:
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffff8155a9c2>] schedule_timeout+0x192/0x2e0
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffff81083300>] ? process_timeout+0x0/0x10
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078af70>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffff81061fe0>] ? default_wake_function+0x0/0x20
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078ef00>] ldlm_cli_enqueue_local+0x1f0/0x5e0 [ptlrpc]
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7cde4>] mdt_object_lock0+0x394/0xb30 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7d644>] mdt_object_lock+0x14/0x20 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7d801>] mdt_object_find_lock+0x61/0x170 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0ea982c>] mdt_reint_open+0x88c/0x21a0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa04fb5d6>] ? upcall_cache_get_entry+0x296/0x880 [libcfs]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0652ef0>] ? lu_ucred+0x20/0x30 [obdclass]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e71935>] ? mdt_ucred+0x15/0x20 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e8e51c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa07df766>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e92481>] mdt_reint_rec+0x41/0xe0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e77ed3>] mdt_reint_internal+0x4c3/0x780 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7845e>] mdt_intent_reint+0x1ee/0x410 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e75c3e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa076f2c5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0798ebb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e76106>] mdt_enqueue+0x46/0xe0 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7aada>] mdt_handle_common+0x52a/0x1470 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0eb74a5>] mds_regular_handle+0x15/0x20 [mdt]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa07c80c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa04f08d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa07c0a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa07ca89d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa07c9da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Jan 20 13:42:18 nbp8-mds1 kernel: Pid: 17780, comm: mdt00_044
      Jan 20 13:42:18 nbp8-mds1 kernel: 
      Jan 20 13:42:18 nbp8-mds1 kernel: Call Trace:
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff8155a9c2>] schedule_timeout+0x192/0x2e0
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff81083300>] ? process_timeout+0x0/0x10
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078af70>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff81061fe0>] ? default_wake_function+0x0/0x20
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078ef00>] ldlm_cli_enqueue_local+0x1f0/0x5e0 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e7cc06>] mdt_object_lock0+0x1b6/0xb30 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e7d644>] mdt_object_lock+0x14/0x20 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e85b8e>] mdt_getattr_name_lock+0x8fe/0x19d0 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa07df766>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa07ba7b4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e86ef9>] mdt_intent_getattr+0x299/0x480 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e75c3e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa076f2c5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0798ebb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e76106>] mdt_enqueue+0x46/0xe0 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0e7aada>] mdt_handle_common+0x52a/0x1470 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa0eb74a5>] mds_regular_handle+0x15/0x20 [mdt]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa07c80c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff81061ff2>] ? default_wake_function+0x12/0x20
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa07ca89d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa07c9da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Jan 20 13:42:19 nbp8-mds1 kernel: 
      Jan 20 13:42:19 nbp8-mds1 kernel: Pid: 16840, comm: mdt03_017
      Jan 20 13:42:19 nbp8-mds1 kernel: 
      Jan 20 13:42:19 nbp8-mds1 kernel: Call Trace:
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffffa076a885>] ? _ldlm_lock_debug+0x2d5/0x660 [ptlrpc]
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffff8155a9c2>] schedule_timeout+0x192/0x2e0
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffff81083300>] ? process_timeout+0x0/0x10
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffffa078af70>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
      Jan 20 13:42:19 nbp8-mds1 kernel: [<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffff81061fe0>] ? default_wake_function+0x0/0x20
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa078ef00>] ldlm_cli_enqueue_local+0x1f0/0x5e0 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e7cc06>] mdt_object_lock0+0x1b6/0xb30 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e72de0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa078f2f0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e7d644>] mdt_object_lock+0x14/0x20 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e85b8e>] mdt_getattr_name_lock+0x8fe/0x19d0 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07df766>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07ba7b4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e86ef9>] mdt_intent_getattr+0x299/0x480 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e75c3e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa076f2c5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0798ebb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e76106>] mdt_enqueue+0x46/0xe0 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0e7aada>] mdt_handle_common+0x52a/0x1470 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa0eb74a5>] mds_regular_handle+0x15/0x20 [mdt]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07c80c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa04f08d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07c0a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07ca89d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffffa07c9da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
      Jan 20 13:42:20 nbp8-mds1 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      This looks the same as LU-7232.

      I can upload debug logs if needed.
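
When comparing dumps like the ones above, it can help to reduce each thread to its reliable frames (those without a leading "?"). The following is only a minimal sketch, assuming syslog-style lines containing "kernel:" as in this report; `parse_traces` and `SAMPLE` are hypothetical names chosen for illustration and are not part of any Lustre tooling. `SAMPLE` is an abridged excerpt of the traces above.

```python
import re

# Matches "Pid: 17862, comm: mdt00_057".
PID_LINE = re.compile(r"Pid:\s*(\d+),\s*comm:\s*(\S+)")
# Matches a frame such as
# "[<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]".
# The optional "? " marks a frame the kernel unwinder is unsure about.
FRAME_LINE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Abridged excerpt of the log in this report (illustration only).
SAMPLE = """\
Jan 20 13:42:10 nbp8-mds1 kernel: Pid: 17862, comm: mdt00_057
Jan 20 13:42:14 nbp8-mds1 kernel: Call Trace:
Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffff8155a9c2>] schedule_timeout+0x192/0x2e0
Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffff81083300>] ? process_timeout+0x0/0x10
Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]
Jan 20 13:42:14 nbp8-mds1 kernel: [<ffffffffa078ef00>] ldlm_cli_enqueue_local+0x1f0/0x5e0 [ptlrpc]
Jan 20 13:42:17 nbp8-mds1 kernel: [<ffffffffa0e7cde4>] mdt_object_lock0+0x394/0xb30 [mdt]
Jan 20 13:42:18 nbp8-mds1 kernel: Pid: 17780, comm: mdt00_044
Jan 20 13:42:18 nbp8-mds1 kernel: Call Trace:
Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffff8155a9c2>] schedule_timeout+0x192/0x2e0
Jan 20 13:42:18 nbp8-mds1 kernel: [<ffffffffa078f7a1>] ldlm_completion_ast+0x4b1/0x920 [ptlrpc]
"""


def parse_traces(text):
    """Return {(pid, comm): [reliable frame names, innermost first]}."""
    traces = {}
    current = None
    for raw in text.splitlines():
        # Keep only kernel messages; ignore the syslog timestamp and host.
        _, sep, msg = raw.partition(" kernel:")
        if not sep:
            continue
        pid = PID_LINE.search(msg)
        if pid:
            # A "Pid:" line starts the trace of a new thread.
            current = (int(pid.group(1)), pid.group(2))
            traces[current] = []
            continue
        frame = FRAME_LINE.search(msg)
        # Skip frames marked "?" and frames seen before any "Pid:" line.
        if frame and current is not None and frame.group(1) is None:
            traces[current].append(frame.group(2))
    return traces


if __name__ == "__main__":
    for (pid, comm), frames in parse_traces(SAMPLE).items():
        print(pid, comm, " <- ".join(frames[:3]))
```

Run against the full log, a summary of this kind makes it easier to see which threads are waiting in the same lock-completion path.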

        Attachments

        1. lustre-log.1453326130.17862.gz
          0.3 kB
          Mahmoud Hanafi
        2. mds.dmesg
          237 kB
          Mahmoud Hanafi
        3. mds.processtrace
          2.01 MB
          Mahmoud Hanafi

              People

              • Assignee:
                bobijam Zhenyu Xu
                Reporter:
                mhanafi Mahmoud Hanafi
              • Votes:
                0
                Watchers:
                8