Lustre / LU-5255

MDS grinds to a halt - reboot required

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1
    • Labels: None
    • Severity: 3
    • 14663

    Description

      We have hit this at least three times in as many weeks on our MDS server, and each time a reboot is required to bring the filesystem back to life. Here is the relevant part of /var/log/messages:

      Jun 25 13:55:53 bmds1 kernel: LNet: Service thread pid 22737 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 13:55:53 bmds1 kernel: Pid: 22737, comm: mdt00_021
      Jun 25 13:55:53 bmds1 kernel: 
      Jun 25 13:55:53 bmds1 kernel: Call Trace:
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff8150f362>] schedule_timeout+0x192/0x2e0
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff810811e0>] ? process_timeout+0x0/0x10
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03e66d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06be01d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0df04f8>] mdt_object_open_lock+0x1c8/0x510 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dd4bfa>] ? mdt_attr_get_complex+0x38a/0x770 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0df56b3>] mdt_open_by_fid_lock+0x443/0x7d0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0df629b>] mdt_reint_open+0x56b/0x20c0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa040282e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06e6dcc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa057df50>] ? lu_ucred+0x20/0x30 [obdclass]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1015>] ? mdt_ucred+0x15/0x20 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0ddd15c>] ? mdt_root_squash+0x2c/0x410 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa057df50>] ? lu_ucred+0x20/0x30 [obdclass]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0de1911>] mdt_reint_rec+0x41/0xe0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc6ae3>] mdt_reint_internal+0x4c3/0x780 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc706d>] mdt_intent_reint+0x1ed/0x520 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 13:55:53 bmds1 kernel: 
      Jun 25 13:55:53 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403700953.22737
      Jun 25 13:55:53 bmds1 kernel: Lustre: lock timed out (enqueued at 1403700753, 200s ago)
      Jun 25 13:55:53 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403700953.6869
      Jun 25 13:55:53 bmds1 kernel: LNet: Service thread pid 6869 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 13:55:53 bmds1 kernel: Pid: 6869, comm: mdt01_003
      Jun 25 13:55:53 bmds1 kernel: 
      Jun 25 13:55:53 bmds1 kernel: Call Trace:
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03e676e>] ? cfs_waitq_del+0xe/0x10 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:55:53 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 13:55:53 bmds1 kernel: 
      Jun 25 13:56:43 bmds1 kernel: LNet: Service thread pid 22446 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 13:56:43 bmds1 kernel: Pid: 22446, comm: mdt02_016
      Jun 25 13:56:43 bmds1 kernel: 
      Jun 25 13:56:43 bmds1 kernel: Call Trace:
      Jun 25 13:56:43 bmds1 kernel: Lustre: lock timed out (enqueued at 1403700803, 200s ago)
      Jun 25 13:56:43 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06b7643>] ? ldlm_cli_cancel_local+0x2b3/0x470 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa03e66fe>] ? cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06be0aa>] ? ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06bd758>] ? ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc7c7b>] ? mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc84f4>] ? mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dd75a9>] ? mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dd83ad>] ? mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc4f1e>] ? mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa069e831>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06c51ef>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dc53a6>] ? mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0dcba97>] ? mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa0e053f5>] ? mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06f73c8>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06f875e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 13:56:43 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 13:56:43 bmds1 kernel: 
      Jun 25 13:56:43 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403701003.22446
      Jun 25 14:06:00 bmds1 kernel: Lustre: 22504:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
      Jun 25 14:06:00 bmds1 kernel:  req@ffff88217d305400 x1468581949625152/t0(0) o101->0bb08de7-0631-0e52-3ff1-946375c89d85@10.21.22.25@tcp:0/0 lens 576/3448 e 5 to 0 dl 1403701565 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:06:00 bmds1 kernel: Lustre: 22402:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
      Jun 25 14:06:00 bmds1 kernel:  req@ffff882d71613800 x1468571541943816/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/1152 e 5 to 0 dl 1403701565 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:06:50 bmds1 kernel: Lustre: 6920:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
      Jun 25 14:06:50 bmds1 kernel:  req@ffff8833cbe53c00 x1468568450342728/t0(0) o101->b69b8925-7476-e90c-d5d5-edb6b51853ec@10.21.22.24@tcp:0/0 lens 608/3448 e 5 to 0 dl 1403701615 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:07:56 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:07:56 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:07:56 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 14:07:56 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:08:21 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:08:21 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:08:46 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:08:46 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:08:46 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:08:46 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:09:11 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:09:11 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:09:11 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:09:11 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:09:36 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:09:36 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:09:36 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:09:36 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:09:44 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 113s: evicting client at 10.21.22.25@tcp  ns: mdt-bravo-MDT0000_UUID lock: ffff881d38f2f240/0x6ae950
      Jun 25 14:10:01 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:10:01 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:10:01 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:10:01 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:10:26 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:10:26 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:10:26 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:10:26 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:11:16 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:11:16 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:11:16 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:11:16 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:11:46 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 2200s: evicting client at 10.21.22.23@tcp  ns: mdt-bravo-MDT0000_UUID lock: ffff881656d94480/0x6ae90
      Jun 25 14:13:23 bmds1 kernel: Lustre: lock timed out (enqueued at 1403701803, 200s ago)
      Jun 25 14:15:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403701926, 200s ago)
      Jun 25 14:16:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403701986, 200s ago)
      Jun 25 14:16:52 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702012, 200s ago)
      Jun 25 14:18:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702106, 200s ago)
      Jun 25 14:19:26 bmds1 kernel: LNet: Service thread pid 6870 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:19:26 bmds1 kernel: Pid: 6870, comm: mdt00_003
      Jun 25 14:19:26 bmds1 kernel: 
      Jun 25 14:19:26 bmds1 kernel: Call Trace:
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06b7643>] ? ldlm_cli_cancel_local+0x2b3/0x470 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff8150f362>] schedule_timeout+0x192/0x2e0
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff810811e0>] ? process_timeout+0x0/0x10
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa03e66d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06be01d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:19:26 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:19:26 bmds1 kernel: 
      Jun 25 14:19:26 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403702366.6870
      Jun 25 14:19:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702166, 200s ago)
      Jun 25 14:21:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702286, 200s ago)
      Jun 25 14:22:27 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702347, 200s ago)
      Jun 25 14:22:28 bmds1 kernel: LNet: Service thread pid 22401 was inactive for 262.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:22:28 bmds1 kernel: Pid: 22401, comm: mdt00_014
      Jun 25 14:22:28 bmds1 kernel: 
      Jun 25 14:22:28 bmds1 kernel: Call Trace:
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:22:28 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:22:28 bmds1 kernel: 
      Jun 25 14:22:28 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403702548.22401
      Jun 25 14:22:33 bmds1 kernel: Lustre: 6944:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:22:33 bmds1 kernel:  req@ffff881dbaa01800 x1468581963167316/t0(0) o101->0bb08de7-0631-0e52-3ff1-946375c89d85@10.21.22.25@tcp:0/0 lens 608/3448 e 0 to 0 dl 1403702558 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:22:39 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 14:22:39 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:22:39 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:22:39 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:23:04 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 14:23:04 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:23:29 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 14:23:29 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:23:29 bmds1 kernel: LNet: Service thread pid 6908 was inactive for 262.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:23:29 bmds1 kernel: Pid: 6908, comm: mdt00_006
      Jun 25 14:23:29 bmds1 kernel: 
      Jun 25 14:23:29 bmds1 kernel: Call Trace:
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:23:29 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:23:29 bmds1 kernel: 
      Jun 25 14:23:29 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403702609.6908
      Jun 25 14:24:19 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 14:24:19 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:24:19 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:24:19 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:24:24 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 108s: evicting client at 10.21.22.25@tcp  ns: mdt-bravo-MDT0000_UUID lock: ffff882efc845000/0x6ae950
      Jun 25 14:24:24 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) Skipped 1 previous similar message
      Jun 25 14:24:26 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702466, 200s ago)
      Jun 25 14:24:36 bmds1 kernel: Lustre: 22403:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:24:36 bmds1 kernel:  req@ffff880d89de9000 x1468571551937816/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 0 to 0 dl 1403702681 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:25:32 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:25:32 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:25:32 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 8 active RPCs
      Jun 25 14:25:32 bmds1 kernel: Lustre: Skipped 2 previous similar messages
      Jun 25 14:25:36 bmds1 kernel: Lustre: 30120:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:25:36 bmds1 kernel:  req@ffff88204db56800 x1468571552939332/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 0 to 0 dl 1403702741 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:26:02 bmds1 kernel: Lustre: 22450:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:26:02 bmds1 kernel:  req@ffff88404d9c8400 x1468568453283516/t0(0) o101->b69b8925-7476-e90c-d5d5-edb6b51853ec@10.21.22.24@tcp:0/0 lens 608/3448 e 0 to 0 dl 1403702767 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:26:27 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 711s: evicting client at 10.21.22.23@tcp  ns: mdt-bravo-MDT0000_UUID lock: ffff88022e542900/0x6ae950
      Jun 25 14:27:36 bmds1 kernel: Lustre: 30242:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:27:36 bmds1 kernel:  req@ffff8815c622b400 x1468571555018544/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 0 to 0 dl 1403702861 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:27:48 bmds1 kernel: Lustre: bravo-MDT0000: Client b69b8925-7476-e90c-d5d5-edb6b51853ec (at 10.21.22.24@tcp) reconnecting
      Jun 25 14:27:48 bmds1 kernel: Lustre: Skipped 6 previous similar messages
      Jun 25 14:27:48 bmds1 kernel: Lustre: bravo-MDT0000: Client b69b8925-7476-e90c-d5d5-edb6b51853ec (at 10.21.22.24@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:27:48 bmds1 kernel: Lustre: Skipped 6 previous similar messages
      Jun 25 14:29:08 bmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 728s: evicting client at 10.21.22.24@tcp  ns: mdt-bravo-MDT0000_UUID lock: ffff883a8d51f000/0x6ae950
      Jun 25 14:29:40 bmds1 kernel: LNet: Service thread pid 22365 was inactive for 514.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:29:40 bmds1 kernel: Pid: 22365, comm: mdt00_013
      Jun 25 14:29:40 bmds1 kernel: 
      Jun 25 14:29:40 bmds1 kernel: Call Trace:
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:29:40 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:29:40 bmds1 kernel: 
      Jun 25 14:29:40 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403702980.22365
      Jun 25 14:30:03 bmds1 kernel: LNet: Service thread pid 6954 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:30:03 bmds1 kernel: Pid: 6954, comm: mdt01_011
      Jun 25 14:30:03 bmds1 kernel: 
      Jun 25 14:30:03 bmds1 kernel: Call Trace:
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:03 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:30:03 bmds1 kernel: 
      Jun 25 14:30:03 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703003.6954
      Jun 25 14:30:40 bmds1 kernel: LNet: Service thread pid 6933 was inactive for 514.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:30:40 bmds1 kernel: Pid: 6933, comm: mdt00_009
      Jun 25 14:30:40 bmds1 kernel: 
      Jun 25 14:30:40 bmds1 kernel: Call Trace:
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:30:40 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:30:40 bmds1 kernel: 
      Jun 25 14:30:40 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703040.6933
      Jun 25 14:32:06 bmds1 kernel: LNet: Service thread pid 6937 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:32:06 bmds1 kernel: Pid: 6937, comm: mdt00_010
      Jun 25 14:32:06 bmds1 kernel: 
      Jun 25 14:32:06 bmds1 kernel: Call Trace:
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:32:06 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:32:06 bmds1 kernel: 
      Jun 25 14:32:06 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703126.6937
      Jun 25 14:32:29 bmds1 kernel: Lustre: 22454:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-383), not sending early reply
      Jun 25 14:32:29 bmds1 kernel:  req@ffff881e34688000 x1468571555875772/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 4 to 0 dl 1403703154 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:32:52 bmds1 kernel: Lustre: lock timed out (enqueued at 1403702972, 200s ago)
      Jun 25 14:32:52 bmds1 kernel: Lustre: Skipped 1 previous similar message
      Jun 25 14:33:06 bmds1 kernel: LNet: Service thread pid 6910 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:33:06 bmds1 kernel: Pid: 6910, comm: mdt00_008
      Jun 25 14:33:06 bmds1 kernel: 
      Jun 25 14:33:06 bmds1 kernel: Call Trace:
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:33:06 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:33:06 bmds1 kernel: 
      Jun 25 14:33:06 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703186.6910
      Jun 25 14:33:32 bmds1 kernel: LNet: Service thread pid 6612 was inactive for 1200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Jun 25 14:33:32 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703212.6612
      Jun 25 14:34:56 bmds1 kernel: Lustre: 6957:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-410), not sending early reply
      Jun 25 14:34:56 bmds1 kernel:  req@ffff883b77b15400 x1468571557733836/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 2 to 0 dl 1403703301 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:35:06 bmds1 kernel: LNet: Service thread pid 6905 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:35:06 bmds1 kernel: Pid: 6905, comm: mdt00_005
      Jun 25 14:35:06 bmds1 kernel: 
      Jun 25 14:35:06 bmds1 kernel: Call Trace:
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:35:06 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:35:06 bmds1 kernel: 
      Jun 25 14:35:06 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403703306.6905
      Jun 25 14:35:57 bmds1 kernel: Lustre: 22588:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-410), not sending early reply
      Jun 25 14:35:57 bmds1 kernel:  req@ffff881fb5a31000 x1468571558663648/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 576/3448 e 2 to 0 dl 1403703362 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:42:02 bmds1 kernel: Lustre: 22454:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
      Jun 25 14:42:02 bmds1 kernel:  req@ffff88113611a800 x1468571563425972/t0(0) o101->1bfa42e7-94df-b8e8-2a7b-b32039aac0ce@10.21.22.23@tcp:0/0 lens 608/3448 e 0 to 0 dl 1403703727 ref 2 fl Interpret:/0/0 rc 0/0
      Jun 25 14:42:08 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:42:08 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:42:08 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:42:08 bmds1 kernel: Lustre: Skipped 3 previous similar messages
      Jun 25 14:49:32 bmds1 kernel: LNet: Service thread pid 6605 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Jun 25 14:49:32 bmds1 kernel: Pid: 6605, comm: mdt00_000
      Jun 25 14:49:32 bmds1 kernel: 
      Jun 25 14:49:32 bmds1 kernel: Call Trace:
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:32 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:49:32 bmds1 kernel: 
      Jun 25 14:49:32 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403704172.6605
      Jun 25 14:49:34 bmds1 kernel: Pid: 22449, comm: mdt02_019
      Jun 25 14:49:34 bmds1 kernel: 
      Jun 25 14:49:34 bmds1 kernel: Call Trace:
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:34 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:49:34 bmds1 kernel: 
      Jun 25 14:49:34 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403704174.22449
      Jun 25 14:49:36 bmds1 kernel: Pid: 22504, comm: mdt01_021
      Jun 25 14:49:36 bmds1 kernel: 
      Jun 25 14:49:36 bmds1 kernel: Call Trace:
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa03f62d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa03e66fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06be0aa>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06b9790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06bd758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc7c7b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc1a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06bdb30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc84f4>] mdt_object_lock+0x14/0x20 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dd75a9>] mdt_getattr_name_lock+0xe19/0x1980 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06e6135>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa070e646>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06e83c4>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dd83ad>] mdt_intent_getattr+0x29d/0x490 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc4f1e>] mdt_intent_policy+0x39e/0x720 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa069e831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06c51ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dc53a6>] mdt_enqueue+0x46/0xe0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0dcba97>] mdt_handle_common+0x647/0x16d0 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06e7bac>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa0e053f5>] mds_regular_handle+0x15/0x20 [mdt]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06f73c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa03e65de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa03f7d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06ee729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06f875e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffffa06f7c90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      Jun 25 14:49:36 bmds1 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Jun 25 14:49:36 bmds1 kernel: 
      Jun 25 14:49:36 bmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1403704176.22504
      Jun 25 14:50:53 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) reconnecting
      Jun 25 14:50:53 bmds1 kernel: Lustre: Skipped 62 previous similar messages
      Jun 25 14:50:53 bmds1 kernel: Lustre: bravo-MDT0000: Client 1bfa42e7-94df-b8e8-2a7b-b32039aac0ce (at 10.21.22.23@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 14:50:53 bmds1 kernel: Lustre: Skipped 62 previous similar messages
      Jun 25 15:00:55 bmds1 kernel: Lustre: bravo-MDT0000: Client b69b8925-7476-e90c-d5d5-edb6b51853ec (at 10.21.22.24@tcp) reconnecting
      Jun 25 15:00:55 bmds1 kernel: Lustre: Skipped 72 previous similar messages
      Jun 25 15:00:55 bmds1 kernel: Lustre: bravo-MDT0000: Client b69b8925-7476-e90c-d5d5-edb6b51853ec (at 10.21.22.24@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 15:00:55 bmds1 kernel: Lustre: Skipped 72 previous similar messages
      Jun 25 15:10:57 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) reconnecting
      Jun 25 15:10:57 bmds1 kernel: Lustre: Skipped 72 previous similar messages
      Jun 25 15:10:57 bmds1 kernel: Lustre: bravo-MDT0000: Client 0bb08de7-0631-0e52-3ff1-946375c89d85 (at 10.21.22.25@tcp) refused reconnection, still busy with 1 active RPCs
      Jun 25 15:10:57 bmds1 kernel: Lustre: Skipped 72 previous similar messages
      Jun 25 15:18:37 bmds1 init: tty (/dev/tty2) main process (8061) killed by TERM signal
      Jun 25 15:18:37 bmds1 init: tty (/dev/tty3) main process (8063) killed by TERM signal
      Jun 25 15:18:37 bmds1 init: tty (/dev/tty4) main process (8065) killed by TERM signal
      Jun 25 15:18:37 bmds1 init: tty (/dev/tty5) main process (8069) killed by TERM signal
      Jun 25 15:18:37 bmds1 init: tty (/dev/tty6) main process (8071) killed by TERM signal
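
      For triage it can help to pull every watchdog "Service thread pid N was inactive" event out of /var/log/messages and count how many distinct MDS service threads were stuck before the reboot. Below is a minimal sketch in Python that matches the message format quoted above; the regex and the default log path are illustrative assumptions, not an official Lustre tool:

          #!/usr/bin/env python
          # Summarize LNet watchdog "Service thread ... was inactive" events
          # from a syslog file. The pattern matches the messages quoted in
          # this report; other kernel/Lustre versions may format differently.
          import re
          import sys

          PATTERN = re.compile(
              r'(?P<ts>\w{3}\s+\d+\s[\d:]+)\s+\S+\s+kernel:\s+'
              r'LNet: Service thread pid (?P<pid>\d+) was inactive for '
              r'(?P<secs>[\d.]+)s'
          )

          def summarize(path):
              events = []
              with open(path) as f:
                  for line in f:
                      m = PATTERN.search(line)
                      if m:
                          events.append((m.group('ts'), int(m.group('pid')),
                                         float(m.group('secs'))))
              return events

          if __name__ == '__main__':
              path = sys.argv[1] if len(sys.argv) > 1 else '/var/log/messages'
              events = summarize(path)
              print('%d watchdog events' % len(events))
              for ts, pid, secs in events:
                  print('%s  pid %-6d  inactive %.0fs' % (ts, pid, secs))
              print('%d distinct pids' % len(set(p for _, p, _ in events)))

      The binary dumps the watchdog writes (e.g. /tmp/lustre-log.1403704176.22504) can be converted to readable text with "lctl debug_file <dump> <output>" for closer inspection.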
      

      Attachments

        1. cmds1-mem.png (62 kB)
        2. cmds1-ram.png (24 kB)
        3. meminfo.after (1 kB)
        4. meminfo.before (1 kB)
        5. messages (352 kB)
        6. messages.cmds1 (294 kB)
        7. vmcore-dmesg.txt (331 kB)
        8. vmcore-dmesg.txt.gz (38 kB)

          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: daire Daire Byrne (Inactive)
            Votes: 0
            Watchers: 6
