=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2012.06.18 09:49:31 =~=~=~=~=~=~=~=~=~=~=~=
cat /var/log/messages | grep "Jun 17"
Jun 17 04:02:02 ALPL505 syslogd 1.4.1: restart.
Jun 17 04:39:24 ALPL505 kernel: Lustre: Service thread pid 955 was inactive for 508.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 04:39:24 ALPL505 kernel: Pid: 955, comm: ll_mdt_60
Jun 17 04:39:24 ALPL505 kernel:
Jun 17 04:39:24 ALPL505 kernel: Call Trace:
Jun 17 04:39:24 ALPL505 kernel: [] lcw_cb+0x0/0x460 [libcfs]
Jun 17 04:39:24 ALPL505 kernel: [] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Jun 17 04:39:24 ALPL505 kernel: [] lcw_cb+0x33c/0x460 [libcfs]
Jun 17 04:39:24 ALPL505 kernel: [] run_timer_softirq+0x193/0x241
Jun 17 04:39:24 ALPL505 kernel: [] __do_softirq+0x89/0x133
Jun 17 04:39:24 ALPL505 kernel: [] call_softirq+0x1c/0x28
Jun 17 04:39:24 ALPL505 kernel: [] do_softirq+0x2c/0x7d
Jun 17 04:39:24 ALPL505 kernel: [] apic_timer_interrupt+0x66/0x6c
Jun 17 04:39:24 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 04:39:24 ALPL505 kernel: [] _spin_unlock_irqrestore+0x8/0x9
Jun 17 04:39:24 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 04:39:24 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 04:39:24 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 04:39:24 ALPL505 kernel: [] .text.lock.ldlm_resource+0x73/0x87 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_pools_shrink+0x147/0x2f0 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 04:39:24 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 04:39:24 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 04:39:24 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 04:39:24 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 04:39:24 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 04:39:24 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 04:39:24 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 04:39:24 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 04:39:24 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 04:39:24 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 04:39:24 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 04:39:24 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 04:39:24 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 04:39:24 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 04:39:24 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 04:39:24 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 04:39:24 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 04:39:24 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 04:39:24 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 04:39:24 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 04:39:24 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 04:39:24 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 04:39:24 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 04:39:24 ALPL505 kernel:
Jun 17 04:40:40 ALPL505 kernel: Lustre: Service thread pid 955 completed after 583.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21989538: cookie 0x1e6d8ca7fa6bf800 req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(mds_open.c:1645:mds_close()) Skipped 1 previous similar message
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-116) req@ffff810287df6400 x1401981983149299/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/2928 e 0 to 0 dl 1339883114 ref 1 fl Interpret:/0/0 rc -116/0
Jun 17 05:45:08 ALPL505 kernel: LustreError: 1843:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1 previous similar message
Jun 17 05:45:09 ALPL505 kernel: LustreError: 2131:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 21922157: cookie 0x1e6d8ca7fa68693c req@ffff810237e71c00 x1401981983149371/t0 o35->c9344f7b-1e2a-0615-0b51-cbf06bb316a5@NET_0x500000a030235_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339883115 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 06:02:54 ALPL505 kernel: Lustre: Service thread pid 981 was inactive for 710.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 06:02:54 ALPL505 kernel: Pid: 981, comm: ll_mdt_86
Jun 17 06:02:54 ALPL505 kernel:
Jun 17 06:02:54 ALPL505 kernel: Call Trace:
Jun 17 06:02:54 ALPL505 kernel: [] thread_return+0x62/0xfe
Jun 17 06:02:54 ALPL505 kernel: [] try_to_wake_up+0x472/0x484
Jun 17 06:02:54 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 06:02:54 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 06:02:54 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 06:02:54 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 06:02:54 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_pool_shrink+0x50/0xf0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] .text.lock.ldlm_resource+0x73/0x87 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_pools_shrink+0x1fd/0x2f0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 06:02:54 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 06:02:54 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 06:02:54 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 06:02:54 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 06:02:54 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 06:02:54 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 06:02:54 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 06:02:54 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 06:02:54 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 06:02:54 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 06:02:54 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 06:02:54 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 06:02:54 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 06:02:54 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 06:02:54 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 06:02:54 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 06:02:54 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 06:02:54 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 06:02:54 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 06:02:54 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 06:02:54 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 06:02:54 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 06:02:54 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 06:02:54 ALPL505 kernel:
Jun 17 06:06:15 ALPL505 kernel: Lustre: 612:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-311), not sending early reply
Jun 17 06:06:15 ALPL505 kernel: req@ffff81018a557450 x1381174265916714/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/2720 e 1 to 0 dl 1339884380 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 06:08:24 ALPL505 kernel: Lustre: 949:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:08:24 ALPL505 kernel: Lustre: 949:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 6 previous similar messages
Jun 17 06:08:24 ALPL505 kernel: Lustre: 949:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:08:24 ALPL505 kernel: Lustre: 949:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 6 previous similar messages
Jun 17 06:08:24 ALPL505 kernel: LustreError: 949:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101878d7800 x1381174265935143/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884604 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:08:24 ALPL505 kernel: LustreError: 949:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 5 previous similar messages
Jun 17 06:08:38 ALPL505 kernel: LustreError: 987:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810633064c00 x1381174265935272/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884618 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:08:38 ALPL505 kernel: LustreError: 987:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1 previous similar message
Jun 17 06:08:45 ALPL505 kernel: Lustre: 936:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:08:45 ALPL505 kernel: Lustre: 936:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 2 previous similar messages
Jun 17 06:08:45 ALPL505 kernel: Lustre: 936:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:08:45 ALPL505 kernel: Lustre: 936:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 2 previous similar messages
Jun 17 06:08:59 ALPL505 kernel: LustreError: 985:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102ca96ac00 x1381174265935707/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884639 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:08:59 ALPL505 kernel: LustreError: 985:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 2 previous similar messages
Jun 17 06:09:27 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:09:27 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 5 previous similar messages
Jun 17 06:09:27 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:09:27 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 5 previous similar messages
Jun 17 06:09:34 ALPL505 kernel: LustreError: 940:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810637195400 x1381174265936402/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884674 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:09:34 ALPL505 kernel: LustreError: 940:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 4 previous similar messages
Jun 17 06:10:44 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:10:44 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 06:10:44 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:10:44 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 06:10:44 ALPL505 kernel: LustreError: 1049:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81021119dc00 x1381174265937600/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884744 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:10:44 ALPL505 kernel: LustreError: 1049:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 9 previous similar messages
Jun 17 06:12:57 ALPL505 kernel: LustreError: 939:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101d9b32400 x1381174265940064/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339884877 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:12:57 ALPL505 kernel: LustreError: 939:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 18 previous similar messages
Jun 17 06:13:18 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:13:18 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 06:13:18 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:13:18 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 06:17:16 ALPL505 kernel: LustreError: 625:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8103d9565800 x1381174265944474/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339885136 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:17:16 ALPL505 kernel: LustreError: 625:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 36 previous similar messages
Jun 17 06:18:19 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:18:19 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 06:18:19 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:18:19 ALPL505 kernel: Lustre: 629:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 06:25:54 ALPL505 kernel: LustreError: 1001:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101b2a0f000 x1381174265953981/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339885654 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:25:54 ALPL505 kernel: LustreError: 1001:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 73 previous similar messages
Jun 17 06:28:21 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:28:21 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 06:28:21 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:28:21 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 06:35:56 ALPL505 kernel: LustreError: 957:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063cf26000 x1381174265966639/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339886256 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:35:56 ALPL505 kernel: LustreError: 957:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 06:38:23 ALPL505 kernel: Lustre: 1025:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:38:23 ALPL505 kernel: Lustre: 1025:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 06:38:23 ALPL505 kernel: Lustre: 1025:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:38:23 ALPL505 kernel: Lustre: 1025:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 06:45:58 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101a8bb8000 x1381174265977482/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339886858 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 06:45:58 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 06:48:25 ALPL505 kernel: Lustre: 940:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 06:48:25 ALPL505 kernel: Lustre: 940:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 06:48:25 ALPL505 kernel: Lustre: 940:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 06:48:25 ALPL505 kernel: Lustre: 940:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 06:53:38 ALPL505 kernel: Lustre: 981:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174265916714 took longer than estimated (916+2838s); client may timeout. req@ffff81018a557450 x1381174265916714/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/808 e 1 to 0 dl 1339884380 ref 1 fl Complete:/0/0 rc 0/0
Jun 17 06:53:38 ALPL505 kernel: Lustre: Service thread pid 981 completed after 3753.49s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 07:11:47 ALPL505 kernel: Lustre: 988:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Jun 17 07:11:47 ALPL505 kernel: req@ffff81025d0a9000 x1381174266525664/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/2720 e 0 to 0 dl 1339888312 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 07:11:53 ALPL505 kernel: Lustre: 988:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:11:53 ALPL505 kernel: Lustre: 988:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 45 previous similar messages
Jun 17 07:11:53 ALPL505 kernel: Lustre: 988:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:11:53 ALPL505 kernel: Lustre: 988:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 44 previous similar messages
Jun 17 07:11:53 ALPL505 kernel: LustreError: 988:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81062d96a000 x1381174266539508/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339888413 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:11:53 ALPL505 kernel: LustreError: 988:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 65 previous similar messages
Jun 17 07:13:10 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:13:10 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 07:13:10 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:13:10 ALPL505 kernel: Lustre: 966:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 07:13:10 ALPL505 kernel: LustreError: 966:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810195f42c00 x1381174266540863/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339888490 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:13:10 ALPL505 kernel: LustreError: 966:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 07:15:45 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:15:45 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 07:15:45 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:15:45 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 07:15:45 ALPL505 kernel: LustreError: 1020:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102dbd02800 x1381174266543513/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339888645 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:15:45 ALPL505 kernel: LustreError: 1020:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 07:19:17 ALPL505 kernel: Lustre: Service thread pid 1052 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 07:19:17 ALPL505 kernel: Pid: 1052, comm: ll_mdt_124
Jun 17 07:19:17 ALPL505 kernel:
Jun 17 07:19:17 ALPL505 kernel: Call Trace:
Jun 17 07:19:17 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 07:19:17 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_pool_shrink+0x50/0xf0 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] .text.lock.ldlm_resource+0x73/0x87 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_pools_shrink+0x15c/0x2f0 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 07:19:17 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 07:19:17 ALPL505 kernel: [] shrink_slab+0xd0/0x153
Jun 17 07:19:17 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 07:19:17 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 07:19:17 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 07:19:17 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 07:19:17 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 07:19:17 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 07:19:17 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 07:19:17 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 07:19:17 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 07:19:17 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 07:19:17 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 07:19:17 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 07:19:17 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 07:19:17 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 07:19:17 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 07:19:17 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 07:19:17 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 07:19:17 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 07:19:17 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 07:19:17 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 07:19:17 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 07:19:17 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 07:19:17 ALPL505 kernel:
Jun 17 07:20:46 ALPL505 kernel: Lustre: 631:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:20:46 ALPL505 kernel: Lustre: 631:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 07:20:46 ALPL505 kernel: Lustre: 631:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:20:46 ALPL505 kernel: Lustre: 631:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 07:20:46 ALPL505 kernel: LustreError: 631:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810263868400 x1381174266549283/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339888946 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:20:46 ALPL505 kernel: LustreError: 631:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 07:30:48 ALPL505 kernel: Lustre: 1035:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:30:48 ALPL505 kernel: Lustre: 1035:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 07:30:48 ALPL505 kernel: Lustre: 1035:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:30:48 ALPL505 kernel: Lustre: 1035:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 07:30:48 ALPL505 kernel: LustreError: 1035:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81014d2bac00 x1381174266562471/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339889548 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:30:48 ALPL505 kernel: LustreError: 1035:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 07:40:50 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:40:50 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 07:40:50 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:40:50 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 07:40:50 ALPL505 kernel: LustreError: 968:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102aa02ec00 x1381174266574360/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339890150 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:40:50 ALPL505 kernel: LustreError: 968:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 07:50:52 ALPL505 kernel: Lustre: 930:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 07:50:52 ALPL505 kernel: Lustre: 930:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 07:50:52 ALPL505 kernel: Lustre: 930:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 07:50:52 ALPL505 kernel: Lustre: 930:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 07:50:52 ALPL505 kernel: LustreError: 930:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81032a1b4400 x1381174266584605/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339890752 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 07:50:52 ALPL505 kernel: LustreError: 930:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 08:00:54 ALPL505 kernel: Lustre: 964:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 08:00:54 ALPL505 kernel: Lustre: 964:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 08:00:54 ALPL505 kernel: Lustre: 964:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 08:00:54 ALPL505 kernel: Lustre: 964:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 08:00:54 ALPL505 kernel: LustreError: 964:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81023b198800 x1381174266595100/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339891354 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 08:00:54 ALPL505 kernel: LustreError: 964:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 08:10:56 ALPL505 kernel: Lustre: 630:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 08:10:56 ALPL505 kernel: Lustre: 630:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 08:10:56 ALPL505 kernel: Lustre: 630:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 08:10:56 ALPL505 kernel: Lustre: 630:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 08:10:56 ALPL505 kernel: LustreError: 630:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81019ded1c00 x1381174266613246/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339891956 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 08:10:56 ALPL505 kernel: LustreError: 630:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 08:20:58 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 08:20:58 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 08:20:58 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 08:20:58 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 08:20:58 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102b48d7800 x1381174266623806/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339892558 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 08:20:58 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 08:31:00 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 08:31:00 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 08:31:00 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 08:31:00 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 08:31:00 ALPL505 kernel: LustreError: 938:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81023647fc00 x1381174266634112/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl
1339893160 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 08:31:00 ALPL505 kernel: LustreError: 938:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 08:41:02 ALPL505 kernel: Lustre: 990:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 08:41:02 ALPL505 kernel: Lustre: 990:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 08:41:02 ALPL505 kernel: Lustre: 990:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs Jun 17 08:41:02 ALPL505 kernel: Lustre: 990:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 08:41:02 ALPL505 kernel: LustreError: 990:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105f8b06000 x1381174266646265/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339893762 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 08:41:02 ALPL505 kernel: LustreError: 990:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 08:51:04 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 08:51:04 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 08:51:04 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs Jun 17 08:51:04 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 08:51:04 ALPL505 kernel: LustreError: 608:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) 
req@ffff810631655400 x1381174266656529/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339894364 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 08:51:04 ALPL505 kernel: LustreError: 608:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 08:54:37 ALPL505 kernel: Lustre: 1052:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174266525664 took longer than estimated (755+6165s); client may timeout. req@ffff81025d0a9000 x1381174266525664/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/808 e 0 to 0 dl 1339888312 ref 1 fl Complete:/0/0 rc 0/0 Jun 17 08:54:37 ALPL505 kernel: Lustre: Service thread pid 1052 completed after 6920.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 17 09:05:48 ALPL505 kernel: Lustre: Service thread pid 998 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 17 09:05:48 ALPL505 kernel: Pid: 998, comm: ll_mdt_101 Jun 17 09:05:48 ALPL505 kernel: Jun 17 09:05:48 ALPL505 kernel: Call Trace: Jun 17 09:05:48 ALPL505 kernel: [] lcw_cb+0x0/0x460 [libcfs] Jun 17 09:05:48 ALPL505 kernel: [] libcfs_debug_dumpstack+0x51/0x60 [libcfs] Jun 17 09:05:48 ALPL505 kernel: [] lcw_cb+0x33c/0x460 [libcfs] Jun 17 09:05:48 ALPL505 kernel: [] run_timer_softirq+0x193/0x241 Jun 17 09:05:48 ALPL505 kernel: [] __do_softirq+0x89/0x133 Jun 17 09:05:48 ALPL505 kernel: [] call_softirq+0x1c/0x28 Jun 17 09:05:48 ALPL505 kernel: [] do_softirq+0x2c/0x7d Jun 17 09:05:48 ALPL505 kernel: [] apic_timer_interrupt+0x66/0x6c Jun 17 09:05:48 ALPL505 kernel: [] blkdev_releasepage+0x0/0x40 Jun 17 09:05:48 ALPL505 kernel: [] cache_reap+0x0/0x217 Jun 17 09:05:48 ALPL505 kernel: [] __list_add+0x9/0x68 Jun 17 09:05:48 ALPL505 kernel: [] ldlm_pools_shrink+0x15c/0x2f0 [ptlrpc] Jun 17 09:05:48 ALPL505 kernel: [] __down_read+0x12/0x92 Jun 17 09:05:48 ALPL505 kernel: [] __up_read+0x19/0x7f Jun 17 09:05:48 ALPL505 kernel: [] shrink_slab+0xd0/0x153 Jun 17 09:05:48 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd Jun 17 09:05:48 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a Jun 17 09:05:48 ALPL505 kernel: [] __alloc_pages+0x78/0x308 Jun 17 09:05:48 ALPL505 kernel: [] cache_grow+0x133/0x3c1 Jun 17 09:05:48 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186 Jun 17 09:05:48 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76 Jun 17 09:05:48 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs] Jun 17 09:05:48 ALPL505 kernel: [] alloc_inode+0x17/0x192 Jun 17 09:05:48 ALPL505 kernel: [] iget_locked+0x6d/0x149 Jun 17 09:05:48 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs] Jun 17 09:05:48 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs] Jun 17 09:05:48 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f Jun 17 09:05:48 ALPL505 kernel: [] lookup_one_len+0x53/0x61 Jun 17 09:05:48 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds] 
Jun 17 09:05:48 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 09:05:48 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 09:05:48 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 09:05:48 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 09:05:48 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 09:05:48 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 09:05:48 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 09:05:48 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 09:05:48 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 09:05:48 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 09:05:48 ALPL505 kernel:
Jun 17 09:15:55 ALPL505 kernel: Lustre: 983:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
Jun 17 09:15:55 ALPL505 kernel: req@ffff8102670b6800 x1381174267575219/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/2720 e 5 to 0 dl 1339895760 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 09:17:51 ALPL505 kernel: Lustre: 625:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 09:17:51 ALPL505 kernel: Lustre: 625:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 31 previous similar messages
Jun 17 09:17:51 ALPL505 kernel: Lustre: 625:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 09:17:51 ALPL505 kernel: Lustre: 625:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 30 previous similar messages
Jun 17 09:17:51 ALPL505 kernel: LustreError: 625:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81025ac46000 x1381174267703321/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339895971 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 09:17:51 ALPL505 kernel: LustreError: 625:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 30 previous similar messages
Jun 17 09:19:08 ALPL505 kernel: Lustre: 983:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 09:19:08 ALPL505 kernel: Lustre: 983:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 09:19:08 ALPL505 kernel: Lustre: 983:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 09:19:08 ALPL505 kernel: Lustre: 983:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 09:19:08 ALPL505 kernel: LustreError: 983:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101cb63b000 x1381174267704594/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339896048 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 09:19:08 ALPL505 kernel: LustreError: 983:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 09:21:42 ALPL505 kernel: Lustre: 957:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 09:21:42 ALPL505 kernel: Lustre: 957:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 09:21:42 ALPL505 kernel: Lustre: 957:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 09:21:42 ALPL505 kernel: Lustre: 957:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 09:21:42 ALPL505 kernel: LustreError: 957:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81014efc4400 x1381174267707579/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339896202 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 09:21:42 ALPL505 kernel: LustreError: 957:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 09:26:43 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 09:26:43 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 09:26:43 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 09:26:43 ALPL505 kernel: Lustre: 1049:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 09:26:43 ALPL505 kernel: LustreError: 1049:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8106315d7400 x1381174267712600/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339896503 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 09:26:43 ALPL505 kernel: LustreError: 1049:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 09:28:07 ALPL505 kernel: Lustre: 998:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174267575219 took longer than estimated (812+727s); client may timeout. req@ffff8102670b6800 x1381174267575219/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/808 e 5 to 0 dl 1339895760 ref 1 fl Complete:/0/0 rc 0/0
Jun 17 09:28:07 ALPL505 kernel: Lustre: Service thread pid 998 completed after 1539.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 09:53:53 ALPL505 kernel: Lustre: Service thread pid 1057 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 09:53:53 ALPL505 kernel: Pid: 1057, comm: ll_mdt_127
Jun 17 09:53:53 ALPL505 kernel:
Jun 17 09:53:53 ALPL505 kernel: Call Trace:
Jun 17 09:53:53 ALPL505 kernel: [] __sched_text_start+0xf6/0xbce
Jun 17 09:53:53 ALPL505 kernel: [] try_to_wake_up+0x472/0x484
Jun 17 09:53:53 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 09:53:53 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 09:53:53 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 09:53:53 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_pool_shrink+0x50/0xf0 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_namespace_put+0x28/0x40 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_pools_shrink+0x15c/0x2f0 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 09:53:53 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 09:53:53 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 09:53:53 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 09:53:53 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 09:53:53 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 09:53:53 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 09:53:53 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 09:53:53 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 09:53:53 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 09:53:53 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 09:53:53 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 09:53:53 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 09:53:53 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 09:53:53 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 09:53:53 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 09:53:53 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 09:53:53 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 09:53:53 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 09:53:53 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 09:53:53 ALPL505 kernel: [] __activate_task+0x56/0x6d
Jun 17 09:53:53 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 09:53:53 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 09:53:53 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 09:53:53 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 09:53:53 ALPL505 kernel:
Jun 17 10:04:00 ALPL505 kernel: Lustre: 1044:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
Jun 17 10:04:00 ALPL505 kernel: req@ffff8102aa02e800 x1381174270187499/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/2720 e 5 to 0 dl 1339898645 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 10:05:56 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:05:56 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 13 previous similar messages
Jun 17 10:05:56 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:05:56 ALPL505 kernel: Lustre: 608:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 12 previous similar messages
Jun 17 10:05:56 ALPL505 kernel: LustreError: 608:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81014ed05800 x1381174270288039/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339898856 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:05:56 ALPL505 kernel: LustreError: 608:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 12 previous similar messages
Jun 17 10:07:13 ALPL505 kernel: Lustre: 1056:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:07:13 ALPL505 kernel: Lustre: 1056:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 10:07:13 ALPL505 kernel: Lustre: 1056:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:07:13 ALPL505 kernel: Lustre: 1056:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 10:07:13 ALPL505 kernel: LustreError: 1056:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810515ee2c00 x1381174270289259/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339898933 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:07:13 ALPL505 kernel: LustreError: 1056:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 10:09:47 ALPL505 kernel: Lustre: 943:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:09:47 ALPL505 kernel: Lustre: 943:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 10:09:47 ALPL505 kernel: Lustre: 943:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:09:47 ALPL505 kernel: Lustre: 943:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 10:09:47 ALPL505 kernel: LustreError: 943:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810595fb4400 x1381174270291803/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339899087 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:09:47 ALPL505 kernel: LustreError: 943:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 10:14:27 ALPL505 kernel: Lustre: 1057:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174270187499 took longer than estimated (812+622s); client may timeout. req@ffff8102aa02e800 x1381174270187499/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/808 e 5 to 0 dl 1339898645 ref 1 fl Complete:/0/0 rc 0/0
Jun 17 10:14:27 ALPL505 kernel: Lustre: Service thread pid 1057 completed after 1434.37s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 10:42:19 ALPL505 kernel: Lustre: 971:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-126), not sending early reply
Jun 17 10:42:19 ALPL505 kernel: req@ffff81062eaec800 x1381174271261071/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/2720 e 0 to 0 dl 1339900944 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 10:42:25 ALPL505 kernel: Lustre: 632:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:42:25 ALPL505 kernel: Lustre: 632:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 41 previous similar messages
Jun 17 10:42:25 ALPL505 kernel: Lustre: 632:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:42:25 ALPL505 kernel: Lustre: 632:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 40 previous similar messages
Jun 17 10:42:25 ALPL505 kernel: LustreError: 632:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063eaf3400 x1381174271357042/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339901045 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:42:25 ALPL505 kernel: LustreError: 632:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 40 previous similar messages
Jun 17 10:43:07 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:43:07 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 5 previous similar messages
Jun 17 10:43:07 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:43:07 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 5 previous similar messages
Jun 17 10:43:07 ALPL505 kernel: LustreError: 1022:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81035d356000 x1381174271373429/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339901087 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:43:07 ALPL505 kernel: LustreError: 1022:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 5 previous similar messages
Jun 17 10:44:24 ALPL505 kernel: Lustre: 948:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:44:24 ALPL505 kernel: Lustre: 948:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 10:44:24 ALPL505 kernel: Lustre: 948:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:44:24 ALPL505 kernel: Lustre: 948:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 10:44:24 ALPL505 kernel: LustreError: 948:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810513b10400 x1381174271374654/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339901164 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:44:24 ALPL505 kernel: LustreError: 948:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 10:46:58 ALPL505 kernel: Lustre: 1029:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:46:58 ALPL505 kernel: Lustre: 1029:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 10:46:58 ALPL505 kernel: Lustre: 1029:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:46:58 ALPL505 kernel: Lustre: 1029:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 10:46:58 ALPL505 kernel: LustreError: 1029:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810404f98000 x1381174271377313/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339901318 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:46:58 ALPL505 kernel: LustreError: 1029:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 10:49:35 ALPL505 kernel: Lustre: Service thread pid 998 was inactive for 1162.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 10:49:35 ALPL505 kernel: Pid: 998, comm: ll_mdt_101
Jun 17 10:49:35 ALPL505 kernel:
Jun 17 10:49:35 ALPL505 kernel: Call Trace:
Jun 17 10:49:35 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_pool_shrink+0x31/0xf0 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] .text.lock.ldlm_resource+0x7d/0x87 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_pools_shrink+0x147/0x2f0 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 10:49:35 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 10:49:35 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 10:49:35 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 10:49:35 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 10:49:35 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 10:49:35 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 10:49:35 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 10:49:35 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 10:49:35 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 10:49:35 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 10:49:35 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 10:49:35 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 10:49:35 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 10:49:35 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 10:49:35 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 10:49:35 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 10:49:35 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 10:49:35 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 10:49:35 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 10:49:35 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 10:49:35 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 10:49:35 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 10:49:35 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 10:49:35 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 10:49:35 ALPL505 kernel:
Jun 17 10:51:59 ALPL505 kernel: Lustre: 614:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 10:51:59 ALPL505 kernel: Lustre: 614:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 10:51:59 ALPL505 kernel: Lustre: 614:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 10:51:59 ALPL505 kernel: Lustre: 614:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 10:51:59 ALPL505 kernel: LustreError: 614:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810306b27000 x1381174271382445/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339901619 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 10:51:59 ALPL505 kernel: LustreError: 614:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 11:02:01 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 11:02:01 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 11:02:01 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 11:02:01 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 11:02:01 ALPL505 kernel: LustreError: 1026:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81021ba73000 x1381174271393096/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339902221 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 11:02:01 ALPL505 kernel: LustreError: 1026:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 11:12:03 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 11:12:03 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 11:12:03 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 11:12:03 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 11:12:03 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810635c04400 x1381174271403197/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339902823 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 11:12:03 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 11:22:05 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 11:22:05 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 11:22:05 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 11:22:05 ALPL505 kernel: Lustre: 1022:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 11:22:05 ALPL505 kernel: LustreError: 1022:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063bfdc400 x1381174271413974/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339903425 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 11:22:05 ALPL505 kernel: LustreError: 1022:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 11:32:07 ALPL505 kernel: Lustre: 973:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 11:32:07 ALPL505 kernel: Lustre: 973:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 11:32:07 ALPL505 kernel: Lustre: 973:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 11:32:07 ALPL505 kernel: Lustre: 973:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 11:32:07 ALPL505 kernel: LustreError: 973:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105f9eca800 x1381174271424154/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339904027 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 11:32:07 ALPL505 kernel: LustreError: 973:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 11:42:09 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 11:42:09 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 11:42:09 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 11:42:09 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 11:42:09 ALPL505
kernel: LustreError: 612:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810513b10000 x1381174271436070/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339904629 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 11:42:09 ALPL505 kernel: LustreError: 612:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 11:52:11 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 11:52:11 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 11:52:11 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs Jun 17 11:52:11 ALPL505 kernel: Lustre: 993:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 11:52:11 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81051c6c4400 x1381174271451873/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339905231 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 11:52:11 ALPL505 kernel: LustreError: 993:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 12:02:13 ALPL505 kernel: Lustre: 611:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 12:02:13 ALPL505 kernel: Lustre: 611:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 12:02:13 ALPL505 kernel: Lustre: 611:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs Jun 17 
12:02:13 ALPL505 kernel: Lustre: 611:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 12:02:13 ALPL505 kernel: LustreError: 611:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810278749400 x1381174271463299/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339905833 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 12:02:13 ALPL505 kernel: LustreError: 611:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 12:09:52 ALPL505 kernel: Lustre: 998:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174271261071 took longer than estimated (731+5248s); client may timeout. req@ffff81062eaec800 x1381174271261071/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/808 e 0 to 0 dl 1339900944 ref 1 fl Complete:/0/0 rc 0/0 Jun 17 12:09:52 ALPL505 kernel: Lustre: Service thread pid 998 completed after 5979.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 17 12:27:04 ALPL505 kernel: Lustre: 932:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Jun 17 12:27:04 ALPL505 kernel: req@ffff810224f76000 x1381174272008911/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/2720 e 0 to 0 dl 1339907229 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 12:27:10 ALPL505 kernel: Lustre: 627:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 12:27:10 ALPL505 kernel: Lustre: 627:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 66 previous similar messages
Jun 17 12:27:10 ALPL505 kernel: Lustre: 627:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 12:27:10 ALPL505 kernel: Lustre: 627:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 65 previous similar messages
Jun 17 12:27:10 ALPL505 kernel: LustreError: 627:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102866b4c00 x1381174272083222/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339907330 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 12:27:10 ALPL505 kernel: LustreError: 627:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 65 previous similar messages
Jun 17 12:28:27 ALPL505 kernel: Lustre: 946:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 12:28:27 ALPL505 kernel: Lustre: 946:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 12:28:27 ALPL505 kernel: Lustre: 946:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 12:28:27 ALPL505 kernel: Lustre: 946:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 12:28:27 ALPL505 kernel: LustreError: 946:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8103d82a6c00 x1381174272084510/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339907407 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 12:28:27 ALPL505 kernel: LustreError: 946:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 12:31:01 ALPL505 kernel: Lustre: 1019:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 12:31:01 ALPL505 kernel: Lustre: 1019:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 12:31:01 ALPL505 kernel: Lustre: 1019:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 12:31:01 ALPL505 kernel: Lustre: 1019:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 12:31:01 ALPL505 kernel: LustreError: 1019:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102866b4c00 x1381174272099536/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339907561 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 12:31:01 ALPL505 kernel: LustreError: 1019:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 12:34:34 ALPL505 kernel: Lustre: Service thread pid 611 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 12:34:34 ALPL505 kernel: Pid: 611, comm: ll_mdt_09
Jun 17 12:34:34 ALPL505 kernel:
Jun 17 12:34:34 ALPL505 kernel: Call Trace:
Jun 17 12:34:34 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_pool_shrink+0x64/0xf0 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] .text.lock.ldlm_resource+0x7d/0x87 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_pools_shrink+0x183/0x2f0 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 12:34:34 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 12:34:34 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 12:34:34 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 12:34:34 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 12:34:34 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 12:34:34 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 12:34:34 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 12:34:34 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 12:34:34 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 12:34:34 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 12:34:34 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 12:34:34 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 12:34:34 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 12:34:34 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 12:34:34 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 12:34:34 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 12:34:34 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 12:34:34 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 12:34:34 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 12:34:34 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 12:34:34 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 12:34:34 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 12:34:34 ALPL505 kernel:
Jun 17 12:36:02 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 12:36:02 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 12:36:02 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 12:36:02 ALPL505 kernel: Lustre: 938:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 12:36:02 ALPL505 kernel: LustreError: 938:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81062d96ac00 x1381174272105865/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339907862 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 12:36:02 ALPL505 kernel: LustreError: 938:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 12:45:07 ALPL505 kernel: Lustre: 611:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174272008911 took longer than estimated (755+1078s); client may timeout. req@ffff810224f76000 x1381174272008911/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 568/808 e 0 to 0 dl 1339907229 ref 1 fl Complete:/0/0 rc 0/0
Jun 17 12:45:07 ALPL505 kernel: Lustre: Service thread pid 611 completed after 1833.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 13:06:13 ALPL505 kernel: Lustre: Service thread pid 615 was inactive for 576.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 13:06:13 ALPL505 kernel: Pid: 615, comm: ll_mdt_13
Jun 17 13:06:13 ALPL505 kernel:
Jun 17 13:06:13 ALPL505 kernel: Call Trace:
Jun 17 13:06:13 ALPL505 kernel: [] __down_trylock+0x39/0x4e
Jun 17 13:06:13 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_pool_shrink+0x31/0xf0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] .text.lock.ldlm_resource+0x73/0x87 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_pools_shrink+0x29f/0x2f0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 13:06:13 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 13:06:13 ALPL505 kernel: [] shrink_slab+0xd0/0x153
Jun 17 13:06:13 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 13:06:13 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 13:06:13 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 13:06:13 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 13:06:13 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 13:06:13 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 13:06:13 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 13:06:13 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 13:06:13 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 13:06:13 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 13:06:13 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 13:06:13 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 13:06:13 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 13:06:13 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 13:06:13 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 13:06:13 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 13:06:13 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 13:06:13 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 13:06:13 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 13:06:13 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 13:06:13 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 13:06:13 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 13:06:13 ALPL505 kernel:
Jun 17 13:09:02 ALPL505 kernel: Lustre: 630:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-145), not sending early reply
Jun 17 13:09:02 ALPL505 kernel: req@ffff8104d6da4400 x1381174272536365/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/2720 e 1 to 0 dl 1339909747 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 13:10:50 ALPL505 kernel: Lustre: 617:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:10:50 ALPL505 kernel: Lustre: 617:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 78 previous similar messages
Jun 17 13:10:50 ALPL505 kernel: Lustre: 617:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:10:50 ALPL505 kernel: Lustre: 617:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 77 previous similar messages
Jun 17 13:10:50 ALPL505 kernel: LustreError: 617:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81017e91b400 x1381174272666612/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339909950 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:10:50 ALPL505 kernel: LustreError: 617:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 77 previous similar messages
Jun 17 13:12:07 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:12:07 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 13:12:07 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:12:07 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 13:12:07 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810235582400 x1381174272668692/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339910027 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:12:07 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 13:14:41 ALPL505 kernel: Lustre: 928:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:14:41 ALPL505 kernel: Lustre: 928:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 13:14:41 ALPL505 kernel: Lustre: 928:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:14:41 ALPL505 kernel: Lustre: 928:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 13:14:41 ALPL505 kernel: LustreError: 928:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81043add1400 x1381174272671999/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339910181 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:14:41 ALPL505 kernel: LustreError: 928:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 13:19:42 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:19:42 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 13:19:42 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:19:42 ALPL505 kernel: Lustre: 968:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 13:19:42 ALPL505 kernel: LustreError: 968:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101a7d91000 x1381174272677143/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339910482 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:19:42 ALPL505 kernel: LustreError: 968:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 13:24:22 ALPL505 kernel: Lustre: 615:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174272536365 took longer than estimated (750+915s); client may timeout. req@ffff8104d6da4400 x1381174272536365/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 560/808 e 1 to 0 dl 1339909747 ref 1 fl Complete:/0/0 rc 0/0
Jun 17 13:24:22 ALPL505 kernel: Lustre: Service thread pid 615 completed after 1665.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 13:41:23 ALPL505 kernel: Lustre: 989:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Jun 17 13:41:23 ALPL505 kernel: req@ffff8102595da050 x1381174273247657/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 552/2720 e 0 to 0 dl 1339911688 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 13:41:29 ALPL505 kernel: Lustre: 987:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:41:29 ALPL505 kernel: Lustre: 987:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 40 previous similar messages
Jun 17 13:41:29 ALPL505 kernel: Lustre: 987:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:41:29 ALPL505 kernel: Lustre: 987:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 39 previous similar messages
Jun 17 13:41:29 ALPL505 kernel: LustreError: 987:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101db967c00 x1381174273356520/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339911789 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:41:29 ALPL505 kernel: LustreError: 987:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 39 previous similar messages
Jun 17 13:42:46 ALPL505 kernel: Lustre: 1057:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:42:46 ALPL505 kernel: Lustre: 1057:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages
Jun 17 13:42:46 ALPL505 kernel: Lustre: 1057:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:42:46 ALPL505 kernel: Lustre: 1057:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 10 previous similar messages
Jun 17 13:42:46 ALPL505 kernel: LustreError: 1057:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81062e7e5800 x1381174273357841/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339911866 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:42:46 ALPL505 kernel: LustreError: 1057:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 10 previous similar messages
Jun 17 13:45:20 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:45:20 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 21 previous similar messages
Jun 17 13:45:20 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:45:20 ALPL505 kernel: Lustre: 628:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages
Jun 17 13:45:20 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810637fa5c00 x1381174273360570/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339912020 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:45:20 ALPL505 kernel: LustreError: 628:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages
Jun 17 13:48:53 ALPL505 kernel: Lustre: Service thread pid 1052 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 13:48:53 ALPL505 kernel: Pid: 1052, comm: ll_mdt_124
Jun 17 13:48:53 ALPL505 kernel:
Jun 17 13:48:53 ALPL505 kernel: Call Trace:
Jun 17 13:48:53 ALPL505 kernel: [] lcw_cb+0x0/0x460 [libcfs]
Jun 17 13:48:53 ALPL505 kernel: [] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Jun 17 13:48:53 ALPL505 kernel: [] lcw_cb+0x33c/0x460 [libcfs]
Jun 17 13:48:53 ALPL505 kernel: [] run_timer_softirq+0x193/0x241
Jun 17 13:48:53 ALPL505 kernel: [] __do_softirq+0x89/0x133
Jun 17 13:48:53 ALPL505 kernel: [] call_softirq+0x1c/0x28
Jun 17 13:48:53 ALPL505 kernel: [] do_softirq+0x2c/0x7d
Jun 17 13:48:53 ALPL505 kernel: [] apic_timer_interrupt+0x66/0x6c
Jun 17 13:48:53 ALPL505 kernel: [] _spin_unlock_irqrestore+0x8/0x9
Jun 17 13:48:53 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 13:48:53 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 13:48:53 ALPL505 kernel: [] .text.lock.ldlm_resource+0x73/0x87 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_pools_shrink+0x247/0x2f0 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 13:48:53 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 13:48:53 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 13:48:53 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 13:48:53 ALPL505 kernel: [] check_block_validity+0x45/0xa0 [ldiskfs]
Jun 17 13:48:53 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 13:48:53 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 13:48:53 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 13:48:53 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 13:48:53 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 13:48:53 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 13:48:53 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 13:48:53 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 13:48:53 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 13:48:53 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 13:48:53 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 13:48:53 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 13:48:53 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs]
Jun 17 13:48:53 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 13:48:53 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 13:48:53 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 13:48:53 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 13:48:53 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 13:48:53 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 13:48:53 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 13:48:53 ALPL505 kernel:
Jun 17 13:50:21 ALPL505 kernel: Lustre: 978:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 13:50:21 ALPL505 kernel: Lustre: 978:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 42 previous similar messages
Jun 17 13:50:21 ALPL505 kernel: Lustre: 978:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 13:50:21 ALPL505 kernel: Lustre: 978:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages
Jun 17 13:50:21 ALPL505 kernel: LustreError: 978:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81013166e800 x1381174273378475/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339912321 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 13:50:21 ALPL505 kernel: LustreError: 978:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages
Jun 17 14:00:23 ALPL505 kernel: Lustre: 941:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:00:23 ALPL505 kernel: Lustre: 941:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:00:23 ALPL505 kernel: Lustre: 941:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:00:23 ALPL505 kernel: Lustre: 941:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:00:23 ALPL505 kernel: LustreError: 941:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063b6eb800 x1381174273389115/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339912923 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:00:23 ALPL505 kernel: LustreError: 941:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 14:10:25 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:10:25 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:10:25 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:10:25 ALPL505 kernel: Lustre: 612:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:10:25 ALPL505 kernel: LustreError: 612:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81062cbf3c00 x1381174273400558/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339913525 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:10:25 ALPL505 kernel: LustreError: 612:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 14:20:27 ALPL505 kernel: Lustre: 620:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:20:27 ALPL505 kernel: Lustre: 620:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:20:27 ALPL505 kernel: Lustre: 620:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:20:27 ALPL505 kernel: Lustre: 620:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:20:27 ALPL505 kernel: LustreError: 620:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063481c800 x1381174273415181/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339914127 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:20:27 ALPL505 kernel: LustreError: 620:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 14:30:29 ALPL505 kernel: Lustre: 607:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:30:29 ALPL505 kernel: Lustre: 607:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:30:29 ALPL505 kernel: Lustre: 607:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:30:29 ALPL505 kernel: Lustre: 607:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:30:29 ALPL505 kernel: LustreError: 607:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810318d10c50 x1381174273427869/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339914729 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:30:29 ALPL505 kernel: LustreError: 607:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 14:40:31 ALPL505 kernel: Lustre: 1053:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:40:31 ALPL505 kernel: Lustre: 1053:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:40:31 ALPL505 kernel: Lustre: 1053:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:40:31 ALPL505 kernel: Lustre: 1053:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:40:31 ALPL505 kernel: LustreError: 1053:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81051a73dc00 x1381174273440169/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339915331 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:40:31 ALPL505 kernel: LustreError: 1053:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 14:50:33 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 14:50:33 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 14:50:33 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 14:50:33 ALPL505 kernel: Lustre: 1020:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages
Jun 17 14:50:33 ALPL505 kernel: LustreError: 1020:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101897c3800 x1381174273454912/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339915933 ref 1 fl Interpret:/0/0 rc -16/0
Jun 17 14:50:33 ALPL505 kernel: LustreError: 1020:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages
Jun 17 15:00:35 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting
Jun 17 15:00:35 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages
Jun 17 15:00:35 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs
Jun 17 15:00:35 ALPL505 kernel: Lustre: 1026:0:(ldlm_lib.c:874:target_handle_connect())
Skipped 85 previous similar messages Jun 17 15:00:35 ALPL505 kernel: LustreError: 1026:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063e254800 x1381174273468313/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339916535 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 15:00:35 ALPL505 kernel: LustreError: 1026:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 15:10:37 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 15:10:37 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 15:10:37 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib to 0xffff810310066200; still busy with 1 active RPCs Jun 17 15:10:37 ALPL505 kernel: Lustre: 961:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 15:10:37 ALPL505 kernel: LustreError: 961:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8103009d8400 x1381174273479937/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339917137 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 15:10:37 ALPL505 kernel: LustreError: 961:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 15:20:39 ALPL505 kernel: Lustre: 953:0:(ldlm_lib.c:574:target_handle_reconnect()) LFS05-MDT0000: 1d5ac532-1b11-3731-d5a7-a7567aee1188 reconnecting Jun 17 15:20:39 ALPL505 kernel: Lustre: 953:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 85 previous similar messages Jun 17 15:20:39 ALPL505 kernel: Lustre: 953:0:(ldlm_lib.c:874:target_handle_connect()) LFS05-MDT0000: refuse reconnection from 1d5ac532-1b11-3731-d5a7-a7567aee1188@10.3.5.66@o2ib 
to 0xffff810310066200; still busy with 1 active RPCs Jun 17 15:20:39 ALPL505 kernel: Lustre: 953:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 85 previous similar messages Jun 17 15:20:39 ALPL505 kernel: LustreError: 953:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81063f532c00 x1381174273490776/t0 o38->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 368/264 e 0 to 0 dl 1339917739 ref 1 fl Interpret:/0/0 rc -16/0 Jun 17 15:20:39 ALPL505 kernel: LustreError: 953:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Jun 17 15:27:04 ALPL505 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.3.5.66@o2ib ns: mds-LFS05-MDT0000_UUID lock: ffff8101593d7a00/0x1e6d8ca96468b232 lrc: 3/0,0 mode: CR/CR res: 2012841/2646490682 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x7d640213f4f67cd1 expref: 4812 pid: 630 timeout: 20554587017 Jun 17 15:30:42 ALPL505 kernel: Lustre: Service thread pid 993 was inactive for 218.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 17 15:30:42 ALPL505 kernel: Pid: 993, comm: ll_mdt_98 Jun 17 15:30:42 ALPL505 kernel: Jun 17 15:30:42 ALPL505 kernel: Call Trace: Jun 17 15:30:42 ALPL505 kernel: [] dput+0x2c/0x113 Jun 17 15:30:42 ALPL505 kernel: [] __mutex_lock_slowpath+0x60/0x9b Jun 17 15:30:42 ALPL505 kernel: [] .text.lock.mutex+0xf/0x14 Jun 17 15:30:42 ALPL505 kernel: [] mds_lookup+0x97/0x760 [mds] Jun 17 15:30:42 ALPL505 kernel: [] upcall_cache_get_entry+0x920/0xa50 [lvfs] Jun 17 15:30:42 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds] Jun 17 15:30:42 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds] Jun 17 15:30:42 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds] Jun 17 15:30:42 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds] Jun 17 15:30:42 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds] Jun 17 15:30:42 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53 Jun 17 15:30:42 ALPL505 kernel: [] enqueue_task+0x41/0x56 Jun 17 15:30:42 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] __wake_up_common+0x3e/0x68 Jun 17 15:30:42 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] child_rip+0xa/0x11 Jun 17 15:30:42 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Jun 17 15:30:42 ALPL505 kernel: [] child_rip+0x0/0x11 Jun 17 15:30:42 ALPL505 kernel: Jun 17 15:35:37 ALPL505 kernel: Lustre: 
Service thread pid 993 completed after 512.73s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 17 15:35:37 ALPL505 kernel: LustreError: 1052:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810310066200 ns: mds-LFS05-MDT0000_UUID lock: ffff810111b01200/0x1e6d8ca9719880da lrc: 3/0,0 mode: CR/CR res: 15334960/2640813075 bits 0x3 rrc: 1 type: IBT flags: 0x4000000 remote: 0x7d640213f4f660f4 expref: 3 pid: 1052 timeout: 0 Jun 17 15:35:37 ALPL505 kernel: LustreError: 1052:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107) req@ffff8102595da050 x1381174273247657/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 552/808 e 0 to 0 dl 1339911688 ref 1 fl Interpret:/0/0 rc -107/0 Jun 17 15:35:37 ALPL505 kernel: LustreError: 1052:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 54 previous similar messages Jun 17 15:35:37 ALPL505 kernel: Lustre: 1052:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1381174273247657 took longer than estimated (755+6849s); client may timeout. req@ffff8102595da050 x1381174273247657/t0 o101->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 552/808 e 0 to 0 dl 1339911688 ref 1 fl Complete:/0/0 rc -107/-107 Jun 17 15:35:37 ALPL505 kernel: Lustre: Service thread pid 1052 completed after 7605.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 17 15:36:05 ALPL505 kernel: LustreError: 2068:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 15314179: cookie 0x1e6d8ca9645c84ed req@ffff81062cac5c00 x1381174273536458/t0 o35->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339918571 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 15:36:05 ALPL505 kernel: LustreError: 2068:0:(mds_open.c:1645:mds_close()) Skipped 4 previous similar messages
Jun 17 15:36:59 ALPL505 kernel: LustreError: 2131:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 9351499: cookie 0x1e6d8ca9613db1f8 req@ffff81015c576400 x1381174273590801/t0 o35->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339918625 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 15:38:06 ALPL505 kernel: LustreError: 2060:0:(mds_open.c:1645:mds_close()) @@@ no handle for file close ino 140219826: cookie 0x1e6d8ca91c7bb1f9 req@ffff81020206bc00 x1381174273679367/t0 o35->1d5ac532-1b11-3731-d5a7-a7567aee1188@NET_0x500000a030542_UUID:0/0 lens 408/4896 e 0 to 0 dl 1339918692 ref 1 fl Interpret:/0/0 rc 0/0
Jun 17 18:11:14 ALPL505 kernel: Lustre: Service thread pid 926 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 18:11:14 ALPL505 kernel: Pid: 926, comm: ll_mdt_33
Jun 17 18:11:14 ALPL505 kernel:
Jun 17 18:11:14 ALPL505 kernel: Call Trace:
Jun 17 18:11:14 ALPL505 kernel: [] lcw_cb+0x0/0x460 [libcfs]
Jun 17 18:11:14 ALPL505 kernel: [] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Jun 17 18:11:14 ALPL505 kernel: [] lcw_cb+0x33c/0x460 [libcfs]
Jun 17 18:11:14 ALPL505 kernel: [] run_timer_softirq+0x193/0x241
Jun 17 18:11:14 ALPL505 kernel: [] __do_softirq+0x89/0x133
Jun 17 18:11:14 ALPL505 kernel: [] call_softirq+0x1c/0x28
Jun 17 18:11:14 ALPL505 kernel: [] do_softirq+0x2c/0x7d
Jun 17 18:11:14 ALPL505 kernel: [] apic_timer_interrupt+0x66/0x6c
Jun 17 18:11:14 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 18:11:14 ALPL505 kernel: [] _spin_unlock_irqrestore+0x8/0x9
Jun 17 18:11:14 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 18:11:14 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 18:11:14 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 18:11:14 ALPL505 kernel: [] .text.lock.ldlm_resource+0x7d/0x87 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_pools_shrink+0x15c/0x2f0 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] shrink_slab+0xd0/0x153
Jun 17 18:11:14 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 18:11:14 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 18:11:14 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 18:11:14 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 18:11:14 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 18:11:14 ALPL505 kernel: [] kmem_cache_alloc+0x6c/0x76
Jun 17 18:11:14 ALPL505 kernel: [] ldiskfs_alloc_inode+0x19/0x150 [ldiskfs]
Jun 17 18:11:14 ALPL505 kernel: [] alloc_inode+0x17/0x192
Jun 17 18:11:14 ALPL505 kernel: [] iget_locked+0x6d/0x149
Jun 17 18:11:14 ALPL505 kernel: [] ldiskfs_iget+0x38/0x6f0 [ldiskfs]
Jun 17 18:11:14 ALPL505 kernel: [] ldiskfs_lookup+0xbb/0x200 [ldiskfs]
Jun 17 18:11:14 ALPL505 kernel: [] __lookup_hash+0x10b/0x12f
Jun 17 18:11:14 ALPL505 kernel: [] lookup_one_len+0x53/0x61
Jun 17 18:11:14 ALPL505 kernel: [] mds_lookup+0xa4/0x760 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] mds_get_parent_child_locked+0x33f/0x960 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_lock_remove_from_lru+0x74/0xe0 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] mds_getattr_lock+0x632/0xc90 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] mds_intent_policy+0x623/0xc20 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] mds_handle+0x40e0/0x4d10 [mds]
Jun 17 18:11:14 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 18:11:14 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 18:11:14 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 18:11:14 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 18:11:14 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 18:11:14 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 18:11:14 ALPL505 kernel:
Jun 17 18:13:59 ALPL505 kernel: Lustre: Service thread pid 926 completed after 364.99s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Jun 17 18:17:21 ALPL505 kernel: Lustre: Service thread pid 2045 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 18:17:21 ALPL505 kernel: Pid: 2045, comm: ll_mdt_rdpg_36
Jun 17 18:17:21 ALPL505 kernel:
Jun 17 18:17:21 ALPL505 kernel: Call Trace:
Jun 17 18:17:21 ALPL505 kernel: [] lcw_cb+0x0/0x460 [libcfs]
Jun 17 18:17:21 ALPL505 kernel: [] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Jun 17 18:17:21 ALPL505 kernel: [] lcw_cb+0x33c/0x460 [libcfs]
Jun 17 18:17:21 ALPL505 kernel: [] run_timer_softirq+0x193/0x241
Jun 17 18:17:21 ALPL505 kernel: [] __do_softirq+0x89/0x133
Jun 17 18:17:21 ALPL505 kernel: [] call_softirq+0x1c/0x28
Jun 17 18:17:21 ALPL505 kernel: [] do_softirq+0x2c/0x7d
Jun 17 18:17:21 ALPL505 kernel: [] apic_timer_interrupt+0x66/0x6c
Jun 17 18:17:21 ALPL505 kernel: [] ext3_releasepage+0x0/0x73 [ext3]
Jun 17 18:17:21 ALPL505 kernel: [] cache_reap+0x0/0x217
Jun 17 18:17:21 ALPL505 kernel: [] ldlm_srv_pool_push_slv+0x65/0x80 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] ldlm_srv_pool_shrink+0xdf/0x110 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] __down_trylock+0x44/0x4e
Jun 17 18:17:21 ALPL505 kernel: [] __down_failed_trylock+0x35/0x3a
Jun 17 18:17:21 ALPL505 kernel: [] ext3_releasepage+0x0/0x73 [ext3]
Jun 17 18:17:21 ALPL505 kernel: [] ldlm_pool_shrink+0x31/0xf0 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] .text.lock.ldlm_resource+0x7d/0x87 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] ldlm_pools_shrink+0x29f/0x2f0 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] __down_read+0x12/0x92
Jun 17 18:17:21 ALPL505 kernel: [] __up_read+0x19/0x7f
Jun 17 18:17:21 ALPL505 kernel: [] shrink_slab+0xdc/0x153
Jun 17 18:17:21 ALPL505 kernel: [] zone_reclaim+0x235/0x2cd
Jun 17 18:17:21 ALPL505 kernel: [] get_page_from_freelist+0xbf/0x43a
Jun 17 18:17:21 ALPL505 kernel: [] __alloc_pages+0x78/0x308
Jun 17 18:17:21 ALPL505 kernel: [] cache_grow+0x133/0x3c1
Jun 17 18:17:21 ALPL505 kernel: [] cache_alloc_refill+0x136/0x186
Jun 17 18:17:21 ALPL505 kernel: [] __kmalloc+0x95/0x9f
Jun 17 18:17:21 ALPL505 kernel: [] cfs_alloc+0x68/0xc0 [libcfs]
Jun 17 18:17:21 ALPL505 kernel: [] lustre_pack_reply_flags+0x5f3/0x950 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Jun 17 18:17:21 ALPL505 kernel: [] lustre_pack_reply+0x29/0xb0 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] lustre_msg_set_limit+0x35/0xf0 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] mds_close+0x1cf/0x8d0 [mds]
Jun 17 18:17:21 ALPL505 kernel: [] __next_cpu+0x19/0x28
Jun 17 18:17:21 ALPL505 kernel: [] find_busiest_group+0x20d/0x621
Jun 17 18:17:21 ALPL505 kernel: [] mds_handle+0x254b/0x4d10 [mds]
Jun 17 18:17:21 ALPL505 kernel: [] smp_send_reschedule+0x4e/0x53
Jun 17 18:17:21 ALPL505 kernel: [] enqueue_task+0x41/0x56
Jun 17 18:17:21 ALPL505 kernel: [] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] __wake_up_common+0x3e/0x68
Jun 17 18:17:21 ALPL505 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] child_rip+0xa/0x11
Jun 17 18:17:21 ALPL505 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 17 18:17:21 ALPL505 kernel: [] child_rip+0x0/0x11
Jun 17 18:17:21 ALPL505 kernel:
Jun 17 18:24:27 ALPL505 kernel: Lustre: Service thread pid 2045 completed after 625.90s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[root@ALPL505 ~]#
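The watchdog reports in this capture end with "Service thread pid N completed after X s" messages (pids 993, 1052, 926 and 2045 here). A minimal Python sketch for pulling those durations out of a capture like this one follows. The function name `thread_completions` is hypothetical. The regular expression assumes the message wording exactly as it appears in this log; other Lustre versions may phrase it differently. Because it uses `re.findall` rather than splitting on newlines, it works whether or not the syslog entries have been run together on one line.

```python
import re

# Matches the Lustre watchdog completion message as worded in this log, e.g.
# "Lustre: Service thread pid 993 completed after 512.73s."
# (assumption: the wording is identical across the capture)
COMPLETED_RE = re.compile(r"Service thread pid (\d+) completed after ([\d.]+)s")


def thread_completions(log_text: str) -> list[tuple[int, float]]:
    """Return (pid, seconds) for every 'Service thread ... completed' message."""
    return [(int(pid), float(secs)) for pid, secs in COMPLETED_RE.findall(log_text)]


if __name__ == "__main__":
    # Two entries copied from the capture, deliberately run together on one line.
    sample = (
        "Jun 17 15:35:37 ALPL505 kernel: Lustre: Service thread pid 993 "
        "completed after 512.73s. "
        "Jun 17 15:35:37 ALPL505 kernel: Lustre: Service thread pid 1052 "
        "completed after 7605.04s. "
    )
    # Print the slowest threads first.
    for pid, secs in sorted(thread_completions(sample), key=lambda t: t[1], reverse=True):
        print(f"pid {pid}: {secs:.2f}s")
```

Sorting by duration puts pid 1052, the thread that held request x1381174273247657 for over two hours, at the top. The "was inactive for" messages are deliberately not matched, so each pid is counted once, when it finally completes.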