Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5886

DNE II Testing: Hung threads on MDS; evictions

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • None
    • None
    • 3
    • 16457

    Description

      During DNE II testing (same run as LU-5883), both of our MDSes reported hung threads here:
      Nov 6 20:22:21 perses-esf-mds001 kernel: Pid: 21117, comm: mdt02_008
      Nov 6 20:22:21 perses-esf-mds001 kernel:
      Nov 6 20:22:21 perses-esf-mds001 kernel: Call Trace:
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0aaa1a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e32230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e3619e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156240d>] mdt_object_local_lock+0x1bd/0xa80 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1562d35>] mdt_object_lock_internal+0x65/0x360 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa15630f4>] mdt_object_lock+0x14/0x20 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156dd6c>] mdt_getattr_name_lock+0xd9c/0x1a50 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e63774>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e65d40>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156ef42>] mdt_intent_getattr+0x292/0x470 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa155bc44>] mdt_intent_policy+0x494/0xcf0 [mdt]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e16549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e4235b>] ldlm_handle_enqueue0+0x51b/0x1340 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0a9a4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec3842>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec40ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e73964>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e72b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
      Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Nov 6 20:22:21 perses-esf-mds001 kernel:

      Attachments

        Activity

          [LU-5886] DNE II Testing: Hung threads on MDS; evictions
          di.wang Di Wang added a comment -

          Patrick, could you please try current master to see if you can still reproduce the problem? Thanks.

          di.wang Di Wang added a comment - Patrick, could you please try current master to see if you can still reproduce the problem? Thanks.
          di.wang Di Wang added a comment -

          According to the console log, both MDTs are waiting for the lock of a remote directory.

          mds001

          Nov  6 20:22:21 perses-esf-mds001 kernel: Call Trace:
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0aaa1a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e32230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e3619e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156240d>] mdt_object_local_lock+0x1bd/0xa80 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1562d35>] mdt_object_lock_internal+0x65/0x360 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa15630f4>] mdt_object_lock+0x14/0x20 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156dd6c>] mdt_getattr_name_lock+0xd9c/0x1a50 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e63774>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e65d40>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156ef42>] mdt_intent_getattr+0x292/0x470 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa155bc44>] mdt_intent_policy+0x494/0xcf0 [mdt]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e16549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e4235b>] ldlm_handle_enqueue0+0x51b/0x1340 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0a9a4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec3842>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec40ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e73964>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e72b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
          Nov  6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
          Nov  6 20:22:21 perses-esf-mds001 kernel: 
          

          mds002

          Nov  6 20:22:16 perses-esf-mds002 kernel:  req@ffff8803330e1c00 x1484073163441132/t0(0) o36->ca4f794c-56f5-709e-3a90-0c25c515ac58@30@gni:0/0 lens 552/568 e 18 to 0 dl 1415326941 ref 2 fl Interpret:/0/0 rc 0/0
          Nov  6 20:22:21 perses-esf-mds002 kernel: LNet: Service thread pid 10838 was inactive for 600.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
          Nov  6 20:22:21 perses-esf-mds002 kernel: Pid: 10838, comm: mdt02_000
          Nov  6 20:22:21 perses-esf-mds002 kernel: 
          Nov  6 20:22:21 perses-esf-mds002 kernel: Call Trace:
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e01230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e05df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0dffe76>] ldlm_cli_enqueue_fini+0x926/0xe60 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e00761>] ldlm_cli_enqueue+0x3b1/0x860 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e05790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1502b60>] ? mdt_remote_blocking_ast+0x0/0x210 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa16d94a8>] osp_md_object_lock+0x188/0x210 [osp]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15f70bf>] lod_object_lock+0x13f/0x730 [lod]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa166913b>] mdd_object_lock+0x3b/0xd0 [mdd]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150b07c>] mdt_remote_object_lock+0x14c/0x320 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150bdee>] mdt_object_lock_internal+0x11e/0x360 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150c0f4>] mdt_object_lock+0x14/0x20 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1526f80>] mdt_reint_create+0x700/0xd40 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0c216b0>] ? lu_ucred+0x20/0x30 [obdclass]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1501ed5>] ? mdt_ucred+0x15/0x20 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa151deec>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e57a56>] ? __req_capsule_get+0x166/0x6e0 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15220fd>] mdt_reint_rec+0x5d/0x200 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15061a3>] mdt_reint_internal+0x4d3/0x7b0 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1506a0b>] mdt_reint+0x6b/0x120 [mdt]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e930ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e42964>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e41b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
          Nov  6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
          

          Unfortunately, the debug log you posted is about 1 hour earlier, could you be able to find these debug log

          Nov  6 20:22:21 perses-esf-mds002 kernel: 
          Nov  6 20:22:21 perses-esf-mds002 kernel: LustreError: dumping log to /tmp/lustre-log.1415326941.10838
          
          Nov  6 20:22:21 perses-esf-mds001 kernel: LustreError: dumping log to /tmp/lustre-log.1415326941.21117
          Nov  6 20:24:27 perses-esf-mds001 kernel: Lustre: lock timed out (enqueued at 1415326767, 300s ago)
          

          Though I am not sure there are valuable information there, but worth try. Thanks.

          di.wang Di Wang added a comment - According to the console log, both MDTs are waiting for the lock of a remote directory. mds001 Nov 6 20:22:21 perses-esf-mds001 kernel: Call Trace: Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0aaa1a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e32230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e3619e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156240d>] mdt_object_local_lock+0x1bd/0xa80 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1559d70>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e36790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa1562d35>] mdt_object_lock_internal+0x65/0x360 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa15630f4>] mdt_object_lock+0x14/0x20 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156dd6c>] mdt_getattr_name_lock+0xd9c/0x1a50 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e63774>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e65d40>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa156ef42>] mdt_intent_getattr+0x292/0x470 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa155bc44>] mdt_intent_policy+0x494/0xcf0 [mdt] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e16549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e4235b>] ldlm_handle_enqueue0+0x51b/0x1340 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0a9a4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec3842>] tgt_enqueue+0x62/0x1d0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0ec40ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e73964>] ptlrpc_main+0xe64/0x1990 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffffa0e72b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc] Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0 Nov 6 20:22:21 perses-esf-mds001 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 Nov 6 20:22:21 perses-esf-mds001 kernel: mds002 Nov 6 20:22:16 perses-esf-mds002 kernel: req@ffff8803330e1c00 x1484073163441132/t0(0) o36->ca4f794c-56f5-709e-3a90-0c25c515ac58@30@gni:0/0 lens 552/568 e 18 to 0 dl 1415326941 ref 2 fl Interpret:/0/0 rc 0/0 Nov 6 20:22:21 perses-esf-mds002 kernel: LNet: Service thread pid 10838 was inactive for 600.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 6 20:22:21 perses-esf-mds002 kernel: Pid: 10838, comm: mdt02_000 Nov 6 20:22:21 perses-esf-mds002 kernel: Nov 6 20:22:21 perses-esf-mds002 kernel: Call Trace: Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e01230>] ? ldlm_expired_completion_wait+0x0/0x360 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e05df5>] ldlm_completion_ast+0x665/0x9a0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0dffe76>] ldlm_cli_enqueue_fini+0x926/0xe60 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e00761>] ldlm_cli_enqueue+0x3b1/0x860 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e05790>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1502b60>] ? mdt_remote_blocking_ast+0x0/0x210 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa16d94a8>] osp_md_object_lock+0x188/0x210 [osp] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15f70bf>] lod_object_lock+0x13f/0x730 [lod] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa166913b>] mdd_object_lock+0x3b/0xd0 [mdd] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150b07c>] mdt_remote_object_lock+0x14c/0x320 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150bdee>] mdt_object_lock_internal+0x11e/0x360 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa150c0f4>] mdt_object_lock+0x14/0x20 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1526f80>] mdt_reint_create+0x700/0xd40 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0c216b0>] ? lu_ucred+0x20/0x30 [obdclass] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1501ed5>] ? mdt_ucred+0x15/0x20 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa151deec>] ? mdt_root_squash+0x2c/0x3f0 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e57a56>] ? __req_capsule_get+0x166/0x6e0 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8128a0ea>] ? strlcpy+0x4a/0x60 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15220fd>] mdt_reint_rec+0x5d/0x200 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa15061a3>] mdt_reint_internal+0x4d3/0x7b0 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa1506a0b>] mdt_reint+0x6b/0x120 [mdt] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e930ce>] tgt_request_handle+0x71e/0xb10 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e42964>] ptlrpc_main+0xe64/0x1990 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffffa0e41b00>] ? ptlrpc_main+0x0/0x1990 [ptlrpc] Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0 Nov 6 20:22:21 perses-esf-mds002 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 Unfortunately, the debug log you posted is about 1 hour earlier, could you be able to find these debug log Nov 6 20:22:21 perses-esf-mds002 kernel: Nov 6 20:22:21 perses-esf-mds002 kernel: LustreError: dumping log to /tmp/lustre-log.1415326941.10838 Nov 6 20:22:21 perses-esf-mds001 kernel: LustreError: dumping log to /tmp/lustre-log.1415326941.21117 Nov 6 20:24:27 perses-esf-mds001 kernel: Lustre: lock timed out (enqueued at 1415326767, 300s ago) Though I am not sure there are valuable information there, but worth try. Thanks.

          Di,
          Would you be able to comment on this one?
          Thank you!

          jlevi Jodi Levi (Inactive) added a comment - Di, Would you be able to comment on this one? Thank you!

          Partial Lustre logs and messages logs from both MDSes are here:
          ftp.whamcloud.com/uploads/LU-5886/lustre_logs_and_messages.tar.gz

          paf Patrick Farrell (Inactive) added a comment - Partial Lustre logs and messages logs from both MDSes are here: ftp.whamcloud.com/uploads/ LU-5886 /lustre_logs_and_messages.tar.gz

          People

            di.wang Di Wang
            paf Patrick Farrell (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: