Mar 15 03:12:13 cmds1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="4651" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 15 08:50:20 cmds1 ntpd[5845]: synchronized to 10.21.20.11, stratum 2
Mar 15 13:49:10 cmds1 kernel: INFO: task mdt03_005:6827 blocked for more than 120 seconds.
Mar 15 13:49:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:49:10 cmds1 kernel: mdt03_005 D 000000000000002c 0 6827 2 0x00000000
Mar 15 13:49:10 cmds1 kernel: ffff882d921eb840 0000000000000046 0000000000000000 ffffffff81055f96
Mar 15 13:49:10 cmds1 kernel: ffff882d921eb7d0 ffff882fe6f7aae0 ffff882d921eb7d0 ffffffff8105231d
Mar 15 13:49:10 cmds1 kernel: ffff882d91acf098 ffff882d921ebfd8 000000000000fb88 ffff882d91acf098
Mar 15 13:49:10 cmds1 kernel: Call Trace:
Mar 15 13:49:10 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 13:49:10 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 13:49:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:49:10 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 13:49:10 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 13:49:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:49:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:49:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:49:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:49:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:49:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:49:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:49:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:49:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:49:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:49:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:49:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:49:31 cmds1 kernel: LNet: Service thread pid 6827 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 13:49:31 cmds1 kernel: Pid: 6827, comm: mdt03_005
Mar 15 13:49:31 cmds1 kernel:
Mar 15 13:49:31 cmds1 kernel: Call Trace:
Mar 15 13:49:31 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 13:49:31 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 13:49:31 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:49:31 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 13:49:31 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 13:49:31 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:49:31 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:49:31 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:49:31 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:49:31 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:49:31 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:49:31 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:49:31 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:49:31 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:49:31 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:49:31 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:49:31 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:49:31 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:49:31 cmds1 kernel:
Mar 15 13:49:31 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426427371.6827
Mar 15 13:51:10 cmds1 kernel: INFO: task mdt03_005:6827 blocked for more than 120 seconds.
Mar 15 13:51:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:51:10 cmds1 kernel: mdt03_005 D 000000000000002c 0 6827 2 0x00000000
Mar 15 13:51:10 cmds1 kernel: ffff882d921eb840 0000000000000046 0000000000000000 ffffffff81055f96
Mar 15 13:51:10 cmds1 kernel: ffff882d921eb7d0 ffff882fe6f7aae0 ffff882d921eb7d0 ffffffff8105231d
Mar 15 13:51:10 cmds1 kernel: ffff882d91acf098 ffff882d921ebfd8 000000000000fb88 ffff882d91acf098
Mar 15 13:51:10 cmds1 kernel: Call Trace:
Mar 15 13:51:10 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 13:51:10 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 13:51:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:51:10 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 13:51:10 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 13:51:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:51:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:51:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:51:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:51:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:51:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:51:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:51:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:51:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:51:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:51:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:51:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:51:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: INFO: task lc_watchdogd:5045 blocked for more than 120 seconds.
Mar 15 13:53:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:53:10 cmds1 kernel: lc_watchdogd D 000000000000001c 0 5045 2 0x00000000
Mar 15 13:53:10 cmds1 kernel: ffff882fe140bc40 0000000000000046 0000000000000000 0000000000000000
Mar 15 13:53:10 cmds1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar 15 13:53:10 cmds1 kernel: ffff882fe1165098 ffff882fe140bfd8 000000000000fb88 ffff882fe1165098
Mar 15 13:53:10 cmds1 kernel: Call Trace:
Mar 15 13:53:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:53:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:53:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:53:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:53:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:53:10 cmds1 kernel: [] libcfs_run_debug_log_upcall+0xa1/0x2b0 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? printk+0x41/0x48
Mar 15 13:53:10 cmds1 kernel: [] libcfs_debug_dumplog_internal+0xc1/0xd0 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] lc_watchdog_dumplog+0x11/0x20 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] lcw_dispatch_main+0x623/0x960 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Mar 15 13:53:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: INFO: task mdt00_001:5097 blocked for more than 120 seconds.
Mar 15 13:53:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:53:10 cmds1 kernel: mdt00_001 D 0000000000000018 0 5097 2 0x00000000
Mar 15 13:53:10 cmds1 kernel: ffff885fc15239b0 0000000000000046 0000000000000001 0000000000000282
Mar 15 13:53:10 cmds1 kernel: ffff885fc1523950 ffffffff81055ad3 ffff882fe3ea0000 ffff882d949233c0
Mar 15 13:53:10 cmds1 kernel: ffff885fe50e7058 ffff885fc1523fd8 000000000000fb88 ffff885fe50e7058
Mar 15 13:53:10 cmds1 kernel: Call Trace:
Mar 15 13:53:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:53:10 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 13:53:10 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 13:53:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:53:10 cmds1 kernel: [] ? ksocknal_launch_packet+0x183/0x410 [ksocklnd]
Mar 15 13:53:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:53:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:53:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:53:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: INFO: task mdt03_005:6827 blocked for more than 120 seconds.
Mar 15 13:53:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:53:10 cmds1 kernel: mdt03_005 D 000000000000002c 0 6827 2 0x00000000
Mar 15 13:53:10 cmds1 kernel: ffff882d921eb840 0000000000000046 0000000000000000 ffffffff81055f96
Mar 15 13:53:10 cmds1 kernel: ffff882d921eb7d0 ffff882fe6f7aae0 ffff882d921eb7d0 ffffffff8105231d
Mar 15 13:53:10 cmds1 kernel: ffff882d91acf098 ffff882d921ebfd8 000000000000fb88 ffff882d91acf098
Mar 15 13:53:10 cmds1 kernel: Call Trace:
Mar 15 13:53:10 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 13:53:10 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 13:53:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:53:10 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 13:53:10 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 13:53:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:53:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:53:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:53:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: INFO: task mdt00_009:6891 blocked for more than 120 seconds.
Mar 15 13:53:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:53:10 cmds1 kernel: mdt00_009 D 0000000000000000 0 6891 2 0x00000000
Mar 15 13:53:10 cmds1 kernel: ffff882d931b1840 0000000000000046 0000000000000000 ffff8814e0bbc7b0
Mar 15 13:53:10 cmds1 kernel: ffff8814e0bbc678 ffff880a850ccd60 ffff880a850cc338 ffff880a850cc268
Mar 15 13:53:10 cmds1 kernel: ffff882d93047af8 ffff882d931b1fd8 000000000000fb88 ffff882d93047af8
Mar 15 13:53:10 cmds1 kernel: Call Trace:
Mar 15 13:53:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:53:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:53:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:53:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:53:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:53:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:53:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: INFO: task lc_watchdogd:5045 blocked for more than 120 seconds.
Mar 15 13:55:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:55:10 cmds1 kernel: lc_watchdogd D 000000000000001c 0 5045 2 0x00000000
Mar 15 13:55:10 cmds1 kernel: ffff882fe140bc40 0000000000000046 0000000000000000 0000000000000000
Mar 15 13:55:10 cmds1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar 15 13:55:10 cmds1 kernel: ffff882fe1165098 ffff882fe140bfd8 000000000000fb88 ffff882fe1165098
Mar 15 13:55:10 cmds1 kernel: Call Trace:
Mar 15 13:55:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:55:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:55:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:55:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:55:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:55:10 cmds1 kernel: [] libcfs_run_debug_log_upcall+0xa1/0x2b0 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? printk+0x41/0x48
Mar 15 13:55:10 cmds1 kernel: [] libcfs_debug_dumplog_internal+0xc1/0xd0 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] lc_watchdog_dumplog+0x11/0x20 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] lcw_dispatch_main+0x623/0x960 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Mar 15 13:55:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lcw_dispatch_main+0x0/0x960 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: INFO: task mdt00_001:5097 blocked for more than 120 seconds.
Mar 15 13:55:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:55:10 cmds1 kernel: mdt00_001 D 0000000000000018 0 5097 2 0x00000000
Mar 15 13:55:10 cmds1 kernel: ffff885fc15239b0 0000000000000046 0000000000000001 0000000000000282
Mar 15 13:55:10 cmds1 kernel: ffff885fc1523950 ffffffff81055ad3 ffff882fe3ea0000 ffff882d949233c0
Mar 15 13:55:10 cmds1 kernel: ffff885fe50e7058 ffff885fc1523fd8 000000000000fb88 ffff885fe50e7058
Mar 15 13:55:10 cmds1 kernel: Call Trace:
Mar 15 13:55:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:55:10 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 13:55:10 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 13:55:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:55:10 cmds1 kernel: [] ? ksocknal_launch_packet+0x183/0x410 [ksocklnd]
Mar 15 13:55:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:55:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:55:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:55:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: INFO: task mdt03_003:5893 blocked for more than 120 seconds.
Mar 15 13:55:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:55:10 cmds1 kernel: mdt03_003 D 000000000000002a 0 5893 2 0x00000000
Mar 15 13:55:10 cmds1 kernel: ffff882d91adf840 0000000000000046 ffff8801050d67a8 ffff8801050d6740
Mar 15 13:55:10 cmds1 kernel: 000000000000002a 000000000000002c ffff882d91adf7e0 ffffffff81065d54
Mar 15 13:55:10 cmds1 kernel: ffff882d919785f8 ffff882d91adffd8 000000000000fb88 ffff882d919785f8
Mar 15 13:55:10 cmds1 kernel: Call Trace:
Mar 15 13:55:10 cmds1 kernel: [] ? enqueue_task_fair+0x64/0x100
Mar 15 13:55:10 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 13:55:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:55:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:55:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:55:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:55:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: INFO: task mdt03_005:6827 blocked for more than 120 seconds.
Mar 15 13:55:10 cmds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 13:55:10 cmds1 kernel: mdt03_005 D 000000000000002c 0 6827 2 0x00000000
Mar 15 13:55:10 cmds1 kernel: ffff882d921eb840 0000000000000046 0000000000000000 ffffffff81055f96
Mar 15 13:55:10 cmds1 kernel: ffff882d921eb7d0 ffff882fe6f7aae0 ffff882d921eb7d0 ffffffff8105231d
Mar 15 13:55:10 cmds1 kernel: ffff882d91acf098 ffff882d921ebfd8 000000000000fb88 ffff882d91acf098
Mar 15 13:55:10 cmds1 kernel: Call Trace:
Mar 15 13:55:10 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 13:55:10 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 13:55:10 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:55:10 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 13:55:10 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 13:55:10 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:55:10 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:55:10 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:55:10 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:55:10 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:55:10 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:56:58 cmds1 kernel: LNet: Service thread pid 7280 was inactive for 305.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 13:56:58 cmds1 kernel: Pid: 7280, comm: mdt07_014
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Call Trace:
Mar 15 13:56:58 cmds1 kernel: [] ? __find_get_block+0x97/0xe0
Mar 15 13:56:58 cmds1 kernel: [] ? __getblk+0x2c/0x2a0
Mar 15 13:56:58 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:56:58 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:56:58 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:56:58 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Pid: 6891, comm: mdt00_009
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Call Trace:
Mar 15 13:56:58 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:56:58 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 13:56:58 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 13:56:58 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Pid: 7329, comm: mdt03_012
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Call Trace:
Mar 15 13:56:58 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 13:56:58 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 13:56:58 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:56:58 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 13:56:58 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Pid: 5097, comm: mdt00_001
Mar 15 13:56:58 cmds1 kernel:
Mar 15 13:56:58 cmds1 kernel: Call Trace:
Mar 15 13:56:58 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 13:56:58 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 13:56:58 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 13:56:58 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 13:56:58 cmds1 kernel: [] ?
ksocknal_launch_packet+0x183/0x410 [ksocklnd] Mar 15 13:56:58 cmds1 kernel: [] wait_for_common+0x123/0x180 Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 13:56:58 cmds1 kernel: [] ? __queue_work+0x41/0x50 Mar 15 13:56:58 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 13:56:58 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 13:56:58 cmds1 kernel: Mar 15 13:56:58 cmds1 kernel: Pid: 4571, comm: mdt03_035 Mar 15 13:56:58 cmds1 kernel: Mar 15 13:56:58 cmds1 kernel: Call Trace: Mar 15 13:56:58 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 13:56:58 cmds1 kernel: [] ? 
ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 13:56:58 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 13:56:58 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 13:56:58 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 13:56:58 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 13:56:58 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 13:56:58 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 13:56:58 cmds1 kernel: Mar 15 13:56:59 cmds1 kernel: LNet: Service thread pid 7329 completed after 467.31s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 13:56:59 cmds1 kernel: LNet: Service thread pid 6891 completed after 382.46s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 13:56:59 cmds1 kernel: LNet: Skipped 2 previous similar messages Mar 15 14:03:00 cmds1 kernel: LNet: Service thread pid 6843 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 14:03:00 cmds1 kernel: LNet: Skipped 4 previous similar messages Mar 15 14:03:00 cmds1 kernel: Pid: 6843, comm: mdt_rdpg05_002 Mar 15 14:03:00 cmds1 kernel: Mar 15 14:03:00 cmds1 kernel: Call Trace: Mar 15 14:03:00 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0 Mar 15 14:03:00 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2] Mar 15 14:03:00 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs] Mar 15 14:03:00 cmds1 kernel: [] ? __wake_up_bit+0x31/0x40 Mar 15 14:03:00 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650 Mar 15 14:03:00 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 14:03:00 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170 Mar 15 14:03:00 cmds1 kernel: [] ? __pagevec_release+0x26/0x40 Mar 15 14:03:00 cmds1 kernel: [] ? shrink_inactive_list+0x4f5/0x830 Mar 15 14:03:00 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610 Mar 15 14:03:00 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 14:03:00 cmds1 kernel: [] ? shrink_zone+0x63/0xb0 Mar 15 14:03:00 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 14:03:00 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30 Mar 15 14:03:00 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120 Mar 15 14:03:00 cmds1 kernel: [] ? 
next_zone+0x30/0x40 Mar 15 14:03:00 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0 Mar 15 14:03:00 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 14:03:00 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 14:03:00 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 14:03:00 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 14:03:00 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 14:03:00 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 14:03:00 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd] Mar 15 14:03:00 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt] Mar 15 14:03:00 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt] Mar 15 14:03:00 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:03:00 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:03:00 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:03:00 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 14:03:00 cmds1 kernel: Mar 15 14:03:00 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426428180.6843 Mar 15 14:16:04 cmds1 kernel: Lustre: 42075:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-383), not sending early reply Mar 15 14:16:04 cmds1 kernel: req@ffff885ce5b48800 x1495261833981160/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 4 to 0 dl 1426428968 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 14:16:15 cmds1 kernel: Lustre: 41497:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Mar 15 14:16:15 cmds1 kernel: req@ffff881ababad800 x1495260055489144/t0(0) o101->9320a00a-8d9a-c89c-3c69-a54bac6d4be5@10.21.22.27@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426428980 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 14:16:21 cmds1 kernel: Lustre: charlie-MDT0000: Client 9320a00a-8d9a-c89c-3c69-a54bac6d4be5 (at 10.21.22.27@tcp) reconnecting Mar 15 14:16:21 cmds1 kernel: Lustre: charlie-MDT0000: Client 9320a00a-8d9a-c89c-3c69-a54bac6d4be5 (at 10.21.22.27@tcp) refused reconnection, still busy with 1 active RPCs Mar 15 14:16:46 cmds1 kernel: Lustre: charlie-MDT0000: Client 9320a00a-8d9a-c89c-3c69-a54bac6d4be5 (at 10.21.22.27@tcp) reconnecting Mar 15 14:16:46 cmds1 kernel: Lustre: charlie-MDT0000: Client 9320a00a-8d9a-c89c-3c69-a54bac6d4be5 (at 10.21.22.27@tcp) refused reconnection, still busy with 1 active RPCs Mar 15 14:16:54 cmds1 kernel: Lustre: 9392:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-189), not sending early reply Mar 15 14:16:54 cmds1 kernel: req@ffff884e0fecf800 x1495256916706128/t0(0) o101->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 576/3448 e 1 to 0 dl 1426429019 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 14:17:09 cmds1 kernel: LustreError: 6843:0:(ldlm_lib.c:2702:target_bulk_io()) @@@ timeout on bulk PUT after -61+61s req@ffff885ce5b48800 x1495261833981160/t0(0) 
o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 4 to 0 dl 1426428968 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 14:17:09 cmds1 kernel: Lustre: 6843:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (988:61s); client may timeout. req@ffff885ce5b48800 x1495261833981160/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 4 to 0 dl 1426428968 ref 1 fl Complete:/0/0 rc -110/-110 Mar 15 14:17:09 cmds1 kernel: LNet: Service thread pid 6843 completed after 1048.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 14:17:09 cmds1 kernel: LNet: Skipped 2 previous similar messages Mar 15 14:17:09 cmds1 kernel: LNet: Service thread pid 17051 was inactive for 803.94s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 14:17:09 cmds1 kernel: Pid: 17051, comm: mdt07_066 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Call Trace: Mar 15 14:17:09 cmds1 kernel: [] ? enqueue_task_fair+0x64/0x100 Mar 15 14:17:09 cmds1 kernel: [] ? put_dec+0x10c/0x110 Mar 15 14:17:09 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:17:09 cmds1 kernel: [] wait_for_common+0x123/0x180 Mar 15 14:17:09 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? __queue_work+0x41/0x50 Mar 15 14:17:09 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 14:17:09 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? 
__req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Pid: 7349, comm: mdt03_016 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Call Trace: Mar 15 14:17:09 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 14:17:09 cmds1 kernel: [] ? 
ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 14:17:09 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:17:09 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Pid: 24531, comm: mdt05_042 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Call Trace: Mar 15 14:17:09 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 14:17:09 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:17:09 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? 
ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Pid: 7333, comm: mdt05_012 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Call Trace: Mar 15 14:17:09 cmds1 kernel: [] ? __alloc_pages_nodemask+0x113/0x8d0 Mar 15 14:17:09 cmds1 kernel: [] ? put_dec+0x10c/0x110 Mar 15 14:17:09 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:17:09 cmds1 kernel: [] wait_for_common+0x123/0x180 Mar 15 14:17:09 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? __queue_work+0x41/0x50 Mar 15 14:17:09 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 14:17:09 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? 
lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Pid: 4558, comm: mdt03_029 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: Call Trace: Mar 15 14:17:09 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 14:17:09 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 14:17:09 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:17:09 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? 
mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:17:09 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:17:09 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:17:09 cmds1 kernel: Mar 15 14:17:09 cmds1 kernel: LNet: Service thread pid 5106 was inactive for 658.76s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 15 14:17:09 cmds1 kernel: LNet: Service thread pid 24531 completed after 592.56s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 14:17:09 cmds1 kernel: Lustre: 6831:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:49s); client may timeout. 
req@ffff881ababad800 x1495260055489144/t0(0) o101->9320a00a-8d9a-c89c-3c69-a54bac6d4be5@10.21.22.27@tcp:0/0 lens 576/536 e 0 to 0 dl 1426428980 ref 1 fl Complete:/0/0 rc 0/0 Mar 15 14:17:11 cmds1 kernel: Lustre: charlie-MDT0000: Client 9320a00a-8d9a-c89c-3c69-a54bac6d4be5 (at 10.21.22.27@tcp) reconnecting Mar 15 14:26:48 cmds1 kernel: LNet: Service thread pid 42216 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 14:26:48 cmds1 kernel: LNet: Skipped 4 previous similar messages Mar 15 14:26:48 cmds1 kernel: Pid: 42216, comm: mdt_rdpg07_016 Mar 15 14:26:48 cmds1 kernel: Mar 15 14:26:48 cmds1 kernel: Call Trace: Mar 15 14:26:48 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 14:26:48 cmds1 kernel: [] ? shrink_zone+0x63/0xb0 Mar 15 14:26:48 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 14:26:48 cmds1 kernel: [] ? ktime_get_ts+0xb1/0xf0 Mar 15 14:26:48 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120 Mar 15 14:26:48 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0 Mar 15 14:26:48 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 14:26:48 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 14:26:48 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 14:26:48 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 14:26:48 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 14:26:48 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 14:26:48 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd] Mar 15 14:26:48 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt] Mar 15 14:26:48 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt] Mar 15 14:26:48 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:26:48 cmds1 kernel: [] ? 
lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:26:48 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:26:48 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:26:48 cmds1 kernel: Mar 15 14:26:48 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426429607.42216 Mar 15 14:37:42 cmds1 kernel: LNet: Service thread pid 42073 completed after 646.52s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 14:37:42 cmds1 kernel: LNet: Skipped 7 previous similar messages Mar 15 14:37:42 cmds1 kernel: LNet: Service thread pid 6738 completed after 646.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 14:37:42 cmds1 kernel: LNet: Skipped 3 previous similar messages Mar 15 14:47:03 cmds1 kernel: LNet: Service thread pid 7267 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 14:47:03 cmds1 kernel: Pid: 7267, comm: mdt03_011 Mar 15 14:47:03 cmds1 kernel: Mar 15 14:47:03 cmds1 kernel: Call Trace: Mar 15 14:47:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:47:03 cmds1 kernel: [] ? 
__ldiskfs_get_inode_loc+0xf5/0x3b0 [ldiskfs] Mar 15 14:47:03 cmds1 kernel: [] wait_for_common+0x123/0x180 Mar 15 14:47:03 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:47:03 cmds1 kernel: [] ? __queue_work+0x41/0x50 Mar 15 14:47:03 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 14:47:03 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 14:47:03 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 14:47:03 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 14:47:03 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:47:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:47:03 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 14:47:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:47:03 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:47:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:47:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:47:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:47:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:47:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:47:03 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:47:03 cmds1 kernel: Mar 15 14:47:03 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426430823.7267 Mar 15 14:55:04 cmds1 kernel: LNet: Service thread pid 7312 completed after 262.43s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 14:55:04 cmds1 kernel: LNet: Skipped 1 previous similar message Mar 15 14:55:12 cmds1 kernel: LNet: Service thread pid 4566 was inactive for 448.90s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 14:55:12 cmds1 kernel: Pid: 4566, comm: mdt03_034 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Call Trace: Mar 15 14:55:12 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:55:12 cmds1 kernel: [] ? null_alloc_rs+0x1ab/0x3a0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Pid: 21242, comm: mdt05_022 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Call Trace: Mar 15 14:55:12 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:55:12 cmds1 kernel: [] ? null_alloc_rs+0x1ab/0x3a0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Pid: 48292, comm: mdt03_040 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Call Trace: Mar 15 14:55:12 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? 
ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 14:55:12 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 14:55:12 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:55:12 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Pid: 7313, comm: mdt00_011 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: Call Trace: Mar 15 14:55:12 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 14:55:12 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 14:55:12 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 14:55:12 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 14:55:12 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? 
ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 14:55:12 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 14:55:12 cmds1 kernel: Mar 15 14:55:12 cmds1 kernel: LNet: Service thread pid 4566 completed after 449.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 15:03:33 cmds1 kernel: LNet: Service thread pid 42080 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 15:03:33 cmds1 kernel: LNet: Skipped 3 previous similar messages Mar 15 15:03:33 cmds1 kernel: Pid: 42080, comm: mdt_rdpg05_022 Mar 15 15:03:33 cmds1 kernel: Mar 15 15:03:33 cmds1 kernel: Call Trace: Mar 15 15:03:33 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0 Mar 15 15:03:33 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2] Mar 15 15:03:33 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20 Mar 15 15:03:33 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs] Mar 15 15:03:33 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650 Mar 15 15:03:33 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 15:03:33 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20 Mar 15 15:03:33 cmds1 kernel: [] ? __pagevec_release+0x26/0x40 Mar 15 15:03:33 cmds1 kernel: [] ? shrink_inactive_list+0xdf/0x830 Mar 15 15:03:33 cmds1 kernel: [] ? shrink_active_list+0x297/0x370 Mar 15 15:03:33 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610 Mar 15 15:03:33 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 15:03:33 cmds1 kernel: [] ? shrink_zone+0x63/0xb0 Mar 15 15:03:33 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 15:03:33 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30 Mar 15 15:03:33 cmds1 kernel: [] ? 
try_to_free_pages+0x92/0x120 Mar 15 15:03:33 cmds1 kernel: [] ? next_zone+0x30/0x40 Mar 15 15:03:33 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0 Mar 15 15:03:33 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 15:03:33 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 15:03:33 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 15:03:33 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 15:03:33 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 15:03:33 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 15:03:33 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd] Mar 15 15:03:33 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt] Mar 15 15:03:33 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt] Mar 15 15:03:33 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:03:33 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:03:33 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:03:33 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 15:03:33 cmds1 kernel: Mar 15 15:03:33 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426431812.42080 Mar 15 15:06:06 cmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff88546e85e900/0xe1fcacc60ba56e43 lrc: 3/0,0 mode: PR/PR res: [0x2000148f1:0x1dc06:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d1816a6b8 expref: 12 pid: 6852 timeout: 6249524054 lvb_type: 0 Mar 15 15:10:07 cmds1 kernel: LustreError: 42216:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff8809fb846000/0xe1fcacc60fefc46e lrc: 3/0,0 mode: PR/PR res: [0x200014306:0x28d4:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d184183af expref: 12 pid: 7330 timeout: 6249765433 lvb_type: 0 Mar 15 15:12:42 cmds1 kernel: Lustre: 42242:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Mar 15 15:12:42 cmds1 kernel: req@ffff885fc1159850 x1495256935745404/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426432367 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:13:54 cmds1 kernel: Lustre: 5118:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Mar 15 15:13:54 cmds1 kernel: req@ffff8855f6a71800 x1495256936997796/t0(0) o101->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426432439 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:14:49 cmds1 kernel: Lustre: 4570:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-23), not sending early reply Mar 15 15:14:49 cmds1 kernel: req@ffff8802ff3ff400 x1495261918611336/t0(0) 
o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 376/472 e 3 to 0 dl 1426432494 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:15:26 cmds1 kernel: Lustre: 7350:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-23), not sending early reply Mar 15 15:15:26 cmds1 kernel: req@ffff884251d7f400 x1495261918705084/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 576/3448 e 3 to 0 dl 1426432531 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:16:22 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 15:16:22 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 20 active RPCs Mar 15 15:16:35 cmds1 kernel: Lustre: 42084:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-383), not sending early reply Mar 15 15:16:35 cmds1 kernel: req@ffff885f8cf1d400 x1495261912892008/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 4 to 0 dl 1426432600 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:16:47 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 15:16:47 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 20 active RPCs Mar 15 15:16:55 cmds1 kernel: Lustre: 42215:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Mar 15 15:16:55 cmds1 kernel: req@ffff88321b881400 x1495256940840920/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426432620 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 15:16:55 cmds1 kernel: Lustre: 42215:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 15 15:17:12 cmds1 kernel: Lustre: charlie-MDT0000: Client 
f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 15:17:12 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 20 active RPCs Mar 15 15:17:37 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 15:17:37 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 20 active RPCs Mar 15 15:18:02 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 15:18:02 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 20 active RPCs Mar 15 15:18:07 cmds1 kernel: LNet: Service thread pid 7330 was inactive for 479.98s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 15:18:07 cmds1 kernel: Pid: 7330, comm: mdt07_029 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Call Trace: Mar 15 15:18:07 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 15:18:07 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 15:18:07 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 15:18:07 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? 
default_wake_function+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Pid: 21268, comm: mdt05_031 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Call Trace: Mar 15 15:18:07 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 15:18:07 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? 
lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Pid: 21576, comm: mdt05_021 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Call Trace: Mar 15 15:18:07 cmds1 kernel: [] ? mark_page_accessed+0x41/0x50 Mar 15 15:18:07 cmds1 kernel: [] ? put_dec+0x10c/0x110 Mar 15 15:18:07 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 15:18:07 cmds1 kernel: [] ? 
__iget+0x66/0x70 Mar 15 15:18:07 cmds1 kernel: [] wait_for_common+0x123/0x180 Mar 15 15:18:07 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? __queue_work+0x41/0x50 Mar 15 15:18:07 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 15:18:07 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 15:18:07 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Pid: 21242, comm: mdt05_022 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Call Trace: Mar 15 15:18:07 cmds1 kernel: [] ? 
transfer_objects+0x5c/0x80 Mar 15 15:18:07 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 15:18:07 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:18:07 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Pid: 42076, comm: mdt_rdpg05_018 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: Call Trace: Mar 15 15:18:07 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0 Mar 15 15:18:07 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2] Mar 15 15:18:07 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs] Mar 15 15:18:07 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650 Mar 15 15:18:07 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 15:18:07 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170 Mar 15 15:18:07 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? __pagevec_release+0x26/0x40 Mar 15 15:18:07 cmds1 kernel: [] ? shrink_inactive_list+0x726/0x830 Mar 15 15:18:07 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 15:18:07 cmds1 kernel: [] ? mem_cgroup_lru_del+0x39/0x40 Mar 15 15:18:07 cmds1 kernel: [] ? shrink_active_list+0x1e1/0x370 Mar 15 15:18:07 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3f5/0x610 Mar 15 15:18:07 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 15:18:07 cmds1 kernel: [] ? shrink_zone+0x63/0xb0 Mar 15 15:18:07 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 15:18:07 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30 Mar 15 15:18:07 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120 Mar 15 15:18:07 cmds1 kernel: [] ? next_zone+0x30/0x40 Mar 15 15:18:07 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0 Mar 15 15:18:07 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 15:18:07 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 15:18:07 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 15:18:07 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 15:18:07 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 15:18:07 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? 
ptlrpc_new_bulk+0x48/0x280 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:18:07 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:18:07 cmds1 kernel: Mar 15 15:18:07 cmds1 kernel: LNet: Service thread pid 42075 was inactive for 821.91s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 15 15:18:07 cmds1 kernel: LNet: Skipped 2 previous similar messages Mar 15 15:18:07 cmds1 kernel: LustreError: 42241:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fc1159850 x1495256935745404/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426432367 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 15:18:07 cmds1 kernel: LustreError: 7821:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff883d9de2a000 x1495256940840708/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426432620 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 15:18:07 cmds1 kernel: Lustre: 42241:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:320s); client may timeout. req@ffff885fc1159850 x1495256935745404/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/408 e 0 to 0 dl 1426432367 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 15:18:07 cmds1 kernel: Lustre: 42241:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 15 15:18:07 cmds1 kernel: LNet: Service thread pid 42076 completed after 821.99s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 15:18:07 cmds1 kernel: LNet: Skipped 4 previous similar messages Mar 15 15:18:08 cmds1 kernel: LNet: Service thread pid 6843 completed after 822.11s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 15:18:08 cmds1 kernel: LNet: Skipped 3 previous similar messages Mar 15 15:18:08 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (988:87s); client may timeout. 
req@ffff885fc1024050 x1495261912892744/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 4 to 0 dl 1426432601 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 15:18:08 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 15 15:18:08 cmds1 kernel: LustreError: 6849:0:(ldlm_lockd.c:1376:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8855f6a51000 ns: mdt-charlie-MDT0000_UUID lock: ffff883f578796c0/0xe1fcacc618f67662 lrc: 3/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x13 rrc: 16 type: IBT flags: 0x200000000000 nid: 10.21.22.26@tcp remote: 0x2d9b859d1816a832 expref: 2 pid: 6849 timeout: 0 lvb_type: 0 Mar 15 15:18:08 cmds1 kernel: Lustre: 6849:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (282:439s); client may timeout. req@ffff88321b9b5400 x1495256940855048/t0(0) o101->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 576/536 e 1 to 0 dl 1426432249 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 15:18:08 cmds1 kernel: Lustre: 6849:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Mar 15 15:18:08 cmds1 kernel: LNet: Service thread pid 6849 completed after 721.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 15 15:18:08 cmds1 kernel: LNet: Skipped 9 previous similar messages
Mar 15 15:18:08 cmds1 kernel: LustreError: 7327:0:(ldlm_lockd.c:1376:ldlm_handle_enqueue0()) ### lock on destroyed export ffff885fdf944c00 ns: mdt-charlie-MDT0000_UUID lock: ffff885fb63496c0/0xe1fcacc618f67aec lrc: 3/0,0 mode: PR/PR res: [0x20000d04d:0xbb27:0x0].0 bits 0x13 rrc: 1 type: IBT flags: 0x200000000000 nid: 10.21.22.26@tcp remote: 0x2d9b859d17e245d6 expref: 2 pid: 7327 timeout: 0 lvb_type: 0
Mar 15 15:18:08 cmds1 rshd[36788]: connect second port 1016: Connection refused
Mar 15 15:18:08 cmds1 rshd[36791]: connect second port 1014: Connection refused
Mar 15 15:18:27 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:19:59 cmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff883dec9dc240/0xe1fcacc619339496 lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x7415:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18b3918f expref: 8 pid: 7324 timeout: 6250357786 lvb_type: 0
Mar 15 15:21:53 cmds1 kernel: LustreError: 7821:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff881b2dc16900/0xe1fcacc61af74fe5 lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x84c6:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18b76dae expref: 8 pid: 7322 timeout: 6250471851 lvb_type: 0
Mar 15 15:26:02 cmds1 kernel: LNet: Service thread pid 7347 was inactive for 302.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 15:26:02 cmds1 kernel: LNet: Skipped 4 previous similar messages
Mar 15 15:26:02 cmds1 kernel: Pid: 7347, comm: mdt03_014
Mar 15 15:26:02 cmds1 kernel:
Mar 15 15:26:02 cmds1 kernel: Call Trace:
Mar 15 15:26:02 cmds1 kernel: [] ? wake_up_bit+0x2f/0x40
Mar 15 15:26:02 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 15:26:02 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:26:02 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 15:26:02 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:26:02 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 15:26:02 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 15:26:02 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 15:26:02 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 15:26:02 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:26:02 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:26:02 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:26:02 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:26:02 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:26:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:26:02 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:26:02 cmds1 kernel:
Mar 15 15:26:02 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426433162.7347
Mar 15 15:29:03 cmds1 kernel: Lustre: 42078:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-36), not sending early reply
Mar 15 15:29:03 cmds1 kernel: req@ffff885fc1027850 x1495261918820204/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433348 ref 2 fl Interpret:/2/0 rc 0/0
Mar 15 15:29:03 cmds1 kernel: Lustre: 42078:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 15 15:29:09 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:29:09 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:29:34 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:29:34 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:29:59 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:29:59 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:30:24 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:30:24 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:30:34 cmds1 kernel: Lustre: 6890:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-126), not sending early reply
Mar 15 15:30:34 cmds1 kernel: req@ffff882de9e6dc00 x1489201209811540/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433439 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:30:34 cmds1 kernel: Lustre: 6890:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 15 15:30:40 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 15:30:40 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 3 active RPCs
Mar 15 15:30:49 cmds1 kernel: Lustre: 42241:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:30:49 cmds1 kernel: req@ffff880feb083400 x1495256952596800/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433454 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:30:49 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:30:57 cmds1 kernel: Lustre: 38218:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:30:57 cmds1 kernel: req@ffff88445ebabc00 x1495261918819532/t0(0) o34->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 456/3456 e 0 to 0 dl 1426433462 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:30:57 cmds1 kernel: Lustre: 38218:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 15 15:31:05 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 15:31:05 cmds1 kernel: Lustre: Skipped 1 previous similar message
Mar 15 15:31:14 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:31:14 cmds1 kernel: Lustre: Skipped 1 previous similar message
Mar 15 15:31:39 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:31:39 cmds1 kernel: Lustre: Skipped 2 previous similar messages
Mar 15 15:31:55 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 3 active RPCs
Mar 15 15:31:55 cmds1 kernel: Lustre: Skipped 2 previous similar messages
Mar 15 15:31:57 cmds1 kernel: Lustre: 7333:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:31:57 cmds1 kernel: req@ffff885fb111f000 x1495261918972564/t0(0) o34->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 456/3456 e 0 to 0 dl 1426433522 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:31:57 cmds1 kernel: Lustre: 7333:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages
Mar 15 15:32:43 cmds1 kernel: Lustre: 42241:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:32:43 cmds1 kernel: req@ffff885fc10d5050 x1495256952898384/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433568 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:32:45 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 15:32:45 cmds1 kernel: Lustre: Skipped 4 previous similar messages
Mar 15 15:33:10 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 3 active RPCs
Mar 15 15:33:10 cmds1 kernel: Lustre: Skipped 5 previous similar messages
Mar 15 15:33:46 cmds1 kernel: Lustre: 29752:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply
Mar 15 15:33:46 cmds1 kernel: req@ffff885453185000 x1495256952965240/t0(0) o101->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426433631 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:33:46 cmds1 kernel: Lustre: 29752:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 15 15:34:59 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 15:34:59 cmds1 kernel: Lustre: Skipped 13 previous similar messages
Mar 15 15:35:24 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 12 active RPCs
Mar 15 15:35:24 cmds1 kernel: Lustre: Skipped 14 previous similar messages
Mar 15 15:35:47 cmds1 kernel: Lustre: 8784:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-227), not sending early reply
Mar 15 15:35:47 cmds1 kernel: req@ffff882fe4db2850 x1495260204302788/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/3448 e 2 to 0 dl 1426433752 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:35:47 cmds1 kernel: Lustre: 8784:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages
Mar 15 15:37:24 cmds1 kernel: LustreError: 42075:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fc179a050 x1495261918854356/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433463 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 15:37:24 cmds1 kernel: LustreError: 42075:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 12 previous similar messages
Mar 15 15:37:24 cmds1 kernel: Lustre: 42075:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:381s); client may timeout. req@ffff885fc179a050 x1495261918854356/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426433463 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 15:37:24 cmds1 kernel: Lustre: 42075:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 2 previous similar messages
Mar 15 15:37:24 cmds1 kernel: LustreError: 7355:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fc1027850 x1495261918820204/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426433348 ref 1 fl Interpret:/2/0 rc 0/0
Mar 15 15:37:24 cmds1 kernel: LustreError: 7355:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 3 previous similar messages
Mar 15 15:37:25 cmds1 kernel: Lustre: 41550:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:391s); client may timeout. req@ffff8812cfb72400 x1495256952596488/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/408 e 0 to 0 dl 1426433454 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 15:37:25 cmds1 kernel: Lustre: 41550:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 8 previous similar messages
Mar 15 15:37:25 cmds1 kernel: LNet: Service thread pid 7358 was inactive for 624.80s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 15:37:25 cmds1 kernel: Pid: 7358, comm: mdt03_018
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Call Trace:
Mar 15 15:37:25 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:37:25 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Pid: 6829, comm: mdt03_007
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Call Trace:
Mar 15 15:37:25 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:37:25 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Pid: 4558, comm: mdt03_029
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Call Trace:
Mar 15 15:37:25 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:37:25 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Pid: 5106, comm: mdt03_001
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Call Trace:
Mar 15 15:37:25 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:37:25 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Pid: 47511, comm: mdt03_038
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: Call Trace:
Mar 15 15:37:25 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 15:37:25 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:37:25 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:37:25 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:37:25 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:37:25 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:37:25 cmds1 kernel:
Mar 15 15:37:25 cmds1 kernel: LNet: Service thread pid 4553 was inactive for 804.80s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Mar 15 15:37:25 cmds1 kernel: LNet: Skipped 14 previous similar messages
Mar 15 15:37:25 cmds1 kernel: LNet: Service thread pid 4556 completed after 805.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 15:37:25 cmds1 kernel: LNet: Skipped 6 previous similar messages
Mar 15 15:37:25 cmds1 kernel: Lustre: 17051:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (606:214s); client may timeout. req@ffff885453185000 x1495256952965240/t0(0) o101->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 576/536 e 0 to 0 dl 1426433631 ref 1 fl Complete:/0/0 rc 0/0
Mar 15 15:37:25 cmds1 kernel: Lustre: 17051:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 13 previous similar messages
Mar 15 15:37:25 cmds1 kernel: LNet: Service thread pid 48292 completed after 820.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 15:37:25 cmds1 kernel: LNet: Skipped 9 previous similar messages
Mar 15 15:50:33 cmds1 kernel: LNet: Service thread pid 6756 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 15:50:33 cmds1 kernel: LNet: Skipped 4 previous similar messages
Mar 15 15:50:33 cmds1 kernel: Pid: 6756, comm: mdt00_003
Mar 15 15:50:33 cmds1 kernel:
Mar 15 15:50:33 cmds1 kernel: Call Trace:
Mar 15 15:50:33 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 15:50:33 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 15:50:33 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 15:50:33 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 15:50:33 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 15:50:33 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 15:50:33 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 15:50:33 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 15:50:33 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 15:50:33 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 15:50:33 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 15:50:33 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 15:50:33 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 15:50:33 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 15:50:33 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 15:50:33 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 15:50:33 cmds1 kernel: [] osd_key_init+0x76/0x670 [osd_ldiskfs]
Mar 15 15:50:33 cmds1 kernel: [] keys_fill+0x6f/0x190 [obdclass]
Mar 15 15:50:33 cmds1 kernel: [] lu_context_init+0xab/0x260 [obdclass]
Mar 15 15:50:33 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] lu_env_init+0x1e/0x30 [obdclass]
Mar 15 15:50:33 cmds1 kernel: [] mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:50:33 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:50:33 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:50:33 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:50:33 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:50:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:50:33 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:50:33 cmds1 kernel:
Mar 15 15:50:33 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426434633.6756
Mar 15 15:51:12 cmds1 kernel: Lustre: 24530:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:51:12 cmds1 kernel: req@ffff884704ed3000 x1495261920295744/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 376/472 e 0 to 0 dl 1426434677 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:51:12 cmds1 kernel: Lustre: 24530:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages
Mar 15 15:51:18 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) reconnecting
Mar 15 15:51:18 cmds1 kernel: Lustre: Skipped 23 previous similar messages
Mar 15 15:51:18 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) refused reconnection, still busy with 2 active RPCs
Mar 15 15:51:18 cmds1 kernel: Lustre: Skipped 16 previous similar messages
Mar 15 15:52:08 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) reconnecting
Mar 15 15:52:08 cmds1 kernel: Lustre: Skipped 5 previous similar messages
Mar 15 15:52:08 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) refused reconnection, still busy with 2 active RPCs
Mar 15 15:52:08 cmds1 kernel: Lustre: Skipped 5 previous similar messages
Mar 15 15:53:23 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) reconnecting
Mar 15 15:53:23 cmds1 kernel: Lustre: Skipped 8 previous similar messages
Mar 15 15:53:23 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) refused reconnection, still busy with 2 active RPCs
Mar 15 15:53:23 cmds1 kernel: Lustre: Skipped 8 previous similar messages
Mar 15 15:54:46 cmds1 kernel: Lustre: 6901:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:54:46 cmds1 kernel: req@ffff885f18fdf400 x1495261920308760/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426434891 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:54:46 cmds1 kernel: Lustre: 6901:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages
Mar 15 15:55:46 cmds1 kernel: Lustre: 42317:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 15:55:46 cmds1 kernel: req@ffff885bb32fd000 x1495261920311884/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426434951 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 15:55:53 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) reconnecting
Mar 15 15:55:53 cmds1 kernel: Lustre: Skipped 17 previous similar messages
Mar 15 15:55:53 cmds1 kernel: Lustre: charlie-MDT0000: Client 513ddad8-648c-4e1d-4def-6b4a17dbd93c (at 10.21.22.26@tcp) refused reconnection, still busy with 2 active RPCs
Mar 15 15:55:53 cmds1 kernel: Lustre: Skipped 17 previous similar messages
Mar 15 15:57:28 cmds1 kernel: LustreError: 42073:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff884649e5d400 x1495261920295756/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426434677 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 15:57:28 cmds1 kernel: LustreError: 42073:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 7 previous similar messages
Mar 15 15:57:28 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:371s); client may timeout. req@ffff884649e5d400 x1495261920295756/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426434677 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 15:57:28 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 5 previous similar messages
Mar 15 15:57:28 cmds1 kernel: LustreError: 42075:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff8846e6ffc400 x1495261920295292/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426434677 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 15:57:28 cmds1 kernel: LustreError: 42075:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 2 previous similar messages
Mar 15 15:57:28 cmds1 kernel: LNet: Service thread pid 6756 completed after 615.50s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 15:57:28 cmds1 kernel: LNet: Skipped 1 previous similar message
Mar 15 15:57:28 cmds1 kernel: LNet: Service thread pid 38218 was inactive for 492.32s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 15:57:28 cmds1 kernel: Pid: 38218, comm: mdt05_020
Mar 15 15:57:28 cmds1 kernel:
Mar 15 15:57:28 cmds1 kernel: Call Trace:
Mar 15 15:57:28 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 15:57:28 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:57:28 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:57:28 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:57:28 cmds1 kernel:
Mar 15 15:57:28 cmds1 kernel: Pid: 4559, comm: mdt03_030
Mar 15 15:57:28 cmds1 kernel:
Mar 15 15:57:28 cmds1 kernel: Call Trace:
Mar 15 15:57:28 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:57:28 cmds1 kernel: [] ? null_alloc_rs+0x1ab/0x3a0 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:57:28 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 15:57:28 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 15:57:28 cmds1 kernel:
Mar 15 15:57:28 cmds1 kernel: Pid: 4556, comm: mdt03_027
Mar 15 15:57:28 cmds1 kernel:
Mar 15 15:57:28 cmds1 kernel: Call Trace:
Mar 15 15:57:28 cmds1 kernel: [] ? mark_page_accessed+0x41/0x50
Mar 15 15:57:28 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 15:57:28 cmds1 kernel: [] ? __iget+0x66/0x70
Mar 15 15:57:28 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 15:57:28 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 15:57:28 cmds1 kernel: [] ?
__queue_work+0x41/0x50 Mar 15 15:57:28 cmds1 kernel: [] wait_for_completion+0x1d/0x20 Mar 15 15:57:28 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120 Mar 15 15:57:28 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:57:28 cmds1 kernel: Mar 15 15:57:28 cmds1 kernel: Pid: 5113, comm: mdt05_002 Mar 15 15:57:28 cmds1 kernel: Mar 15 15:57:28 cmds1 kernel: Call Trace: Mar 15 15:57:28 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 15:57:28 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 15:57:28 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? 
lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 15:57:28 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 15:57:28 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 15:57:28 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 15:57:28 cmds1 kernel: Mar 15 15:57:29 cmds1 kernel: Lustre: 42242:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:372s); client may timeout. 
req@ffff8832dc7db000 x1495256953476392/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/408 e 0 to 0 dl 1426434677 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 15:57:29 cmds1 kernel: Lustre: 42242:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 15 15:57:29 cmds1 kernel: LNet: Service thread pid 38218 completed after 492.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 15:57:38 cmds1 kernel: LustreError: 42080:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fbfb0a800 x1495261920296284/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426434677 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 15:57:38 cmds1 kernel: LustreError: 42080:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 3 previous similar messages Mar 15 15:57:38 cmds1 kernel: Lustre: 42080:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:381s); client may timeout. 
req@ffff885fbfb0a800 x1495261920296284/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426434677 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 15:57:38 cmds1 kernel: Lustre: 42080:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Mar 15 16:00:50 cmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff881895cd7000/0xe1fcacc6317b178b lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x8b11:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18c283b7 expref: 143 pid: 7319 timeout: 6252808065 lvb_type: 0 Mar 15 16:02:44 cmds1 kernel: LustreError: 42242:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff882dfaef56c0/0xe1fcacc6326bc679 lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x8860:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18c6456c expref: 8 pid: 6848 timeout: 6252922368 lvb_type: 0 Mar 15 16:04:02 cmds1 kernel: LustreError: 7370:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff883d71f6a000 x1495256953854112/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426435904 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 16:05:42 cmds1 kernel: LustreError: 7821:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff882dec8ff000/0xe1fcacc63378366e lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x8bee:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18c72470 expref: 7 pid: 7311 timeout: 6253100819 lvb_type: 0 Mar 15 16:08:20 cmds1 kernel: LustreError: 42216:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ 
bulk PUT failed: rc -107 req@ffff8855aa0cbc00 x1495256954137944/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426436019 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 16:08:56 cmds1 kernel: LustreError: 7821:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff8853cf7b3400 x1495256954216616/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426436197 ref 1 fl Interpret:/0/0 rc 0/0 Mar 15 16:08:56 cmds1 kernel: LustreError: 7821:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 1 previous similar message Mar 15 16:12:44 cmds1 kernel: LNet: Service thread pid 6891 was inactive for 246.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 16:12:44 cmds1 kernel: LNet: Skipped 3 previous similar messages Mar 15 16:12:44 cmds1 kernel: Pid: 6891, comm: mdt00_009 Mar 15 16:12:44 cmds1 kernel: Mar 15 16:12:44 cmds1 kernel: Call Trace: Mar 15 16:12:44 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0 Mar 15 16:12:44 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2] Mar 15 16:12:44 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs] Mar 15 16:12:44 cmds1 kernel: [] ? __wake_up_bit+0x31/0x40 Mar 15 16:12:44 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50 Mar 15 16:12:44 cmds1 kernel: [] ? shrink_page_list.clone.3+0xc0/0x650 Mar 15 16:12:44 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 16:12:44 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170 Mar 15 16:12:44 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20 Mar 15 16:12:44 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830 Mar 15 16:12:44 cmds1 kernel: [] ? shrink_active_list+0x297/0x370 Mar 15 16:12:44 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610 Mar 15 16:12:44 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 16:12:44 cmds1 kernel: [] ? 
shrink_zone+0x63/0xb0 Mar 15 16:12:44 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 16:12:44 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30 Mar 15 16:12:44 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120 Mar 15 16:12:44 cmds1 kernel: [] ? next_zone+0x30/0x40 Mar 15 16:12:44 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0 Mar 15 16:12:44 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 16:12:44 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 16:12:44 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 16:12:44 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 16:12:44 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 16:12:44 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 16:12:44 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 16:12:44 cmds1 kernel: [] ? osd_key_init+0x1e/0x670 [osd_ldiskfs] Mar 15 16:12:44 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass] Mar 15 16:12:44 cmds1 kernel: [] ? lu_context_init+0xab/0x260 [obdclass] Mar 15 16:12:44 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? lu_env_init+0x1e/0x30 [obdclass] Mar 15 16:12:44 cmds1 kernel: [] ? mdt_lvbo_fill+0x1ab/0x840 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? mdt_enqueue+0x46/0xe0 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? mds_regular_handle+0x15/0x20 [mdt] Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 16:12:44 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? 
__wake_up+0x53/0x70 Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:12:44 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 16:12:44 cmds1 kernel: Mar 15 16:12:44 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426435964.6891 Mar 15 16:15:56 cmds1 kernel: LNet: Service thread pid 14596 was inactive for 454.56s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 16:15:56 cmds1 kernel: Pid: 14596, comm: mdt_rdpg00_007 Mar 15 16:15:56 cmds1 kernel: Mar 15 16:15:56 cmds1 kernel: Call Trace: Mar 15 16:15:56 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0 Mar 15 16:15:56 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2] Mar 15 16:15:56 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs] Mar 15 16:15:56 cmds1 kernel: [] ? __remove_mapping+0xb8/0x160 Mar 15 16:15:56 cmds1 kernel: [] ? shrink_page_list.clone.3+0x17d/0x650 Mar 15 16:15:56 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0 Mar 15 16:15:56 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170 Mar 15 16:15:56 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830 Mar 15 16:15:56 cmds1 kernel: [] ? shrink_active_list+0x297/0x370 Mar 15 16:15:56 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610 Mar 15 16:15:56 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280 Mar 15 16:15:56 cmds1 kernel: [] ? shrink_zone+0x63/0xb0 Mar 15 16:15:56 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610 Mar 15 16:15:56 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30 Mar 15 16:15:56 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120 Mar 15 16:15:56 cmds1 kernel: [] ? next_zone+0x30/0x40 Mar 15 16:15:56 cmds1 kernel: [] ? 
__alloc_pages_nodemask+0x478/0x8d0 Mar 15 16:15:56 cmds1 kernel: [] ? kmem_getpages+0x62/0x170 Mar 15 16:15:56 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270 Mar 15 16:15:56 cmds1 kernel: [] ? cache_grow+0x2cf/0x320 Mar 15 16:15:56 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160 Mar 15 16:15:56 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 16:15:56 cmds1 kernel: [] ? __kmalloc+0x189/0x220 Mar 15 16:15:56 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd] Mar 15 16:15:56 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt] Mar 15 16:15:56 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt] Mar 15 16:15:56 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 16:15:56 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 16:15:56 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 16:15:56 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 16:15:56 cmds1 kernel: Mar 15 16:15:56 cmds1 kernel: LNet: Service thread pid 14596 completed after 454.72s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 15 16:15:56 cmds1 kernel: LNet: Skipped 3 previous similar messages
Mar 15 16:17:12 cmds1 kernel: LustreError: 42075:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.29@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff880641effb40/0xe1fcacc6396e18e6 lrc: 3/0,0 mode: PR/PR res: [0x200014909:0x6ab4:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.29@tcp remote: 0x6d6d652d7312aa36 expref: 999 pid: 4560 timeout: 6253790837 lvb_type: 0
Mar 15 16:17:39 cmds1 kernel: LustreError: 7101:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff882266c08480/0xe1fcacc63977d931 lrc: 3/0,0 mode: PR/PR res: [0x2000148e5:0x9649:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18f64cd0 expref: 9 pid: 6849 timeout: 6253817034 lvb_type: 0
Mar 15 16:19:44 cmds1 kernel: LustreError: 42076:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff8852287d9c00 x1495261937616144/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426436623 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:19:44 cmds1 kernel: Lustre: 42071:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (491:61s); client may timeout. req@ffff885fc179c850 x1495261926069356/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426436323 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:20:03 cmds1 kernel: LustreError: 11599:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885c7c1ef000 x1495261937617036/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426436623 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:20:03 cmds1 kernel: LustreError: 11599:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 4 previous similar messages
Mar 15 16:20:03 cmds1 kernel: LNet: Service thread pid 6891 completed after 685.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:23:24 cmds1 kernel: LNet: Service thread pid 7383 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:23:24 cmds1 kernel: Pid: 7383, comm: mdt_rdpg03_005
Mar 15 16:23:24 cmds1 kernel:
Mar 15 16:23:24 cmds1 kernel: Call Trace:
Mar 15 16:23:24 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 16:23:24 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 16:23:24 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:23:24 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:23:24 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 16:23:24 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 16:23:24 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:23:24 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 16:23:24 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:23:24 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:23:24 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 16:23:24 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 16:23:24 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:23:24 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 16:23:24 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:23:24 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 16:23:24 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:23:24 cmds1 kernel: [] ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:23:24 cmds1 kernel: [] mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:23:24 cmds1 kernel: [] mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:23:24 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:23:24 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:23:24 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:23:24 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:23:24 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:23:24 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 16:23:24 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:23:24 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:23:24 cmds1 kernel:
Mar 15 16:23:24 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426436603.7383
Mar 15 16:28:37 cmds1 kernel: LNet: Service thread pid 7350 completed after 513.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:28:38 cmds1 kernel: LustreError: 42078:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885f2f271400 x1495261926068580/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426436323 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:28:38 cmds1 kernel: LustreError: 42078:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 2 previous similar messages
Mar 15 16:28:38 cmds1 kernel: Lustre: 42078:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (491:594s); client may timeout. req@ffff885f2f271400 x1495261926068580/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426436323 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:28:38 cmds1 kernel: LNet: Service thread pid 7327 was inactive for 229.90s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:28:38 cmds1 kernel: Pid: 7327, comm: mdt07_027
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Call Trace:
Mar 15 16:28:38 cmds1 kernel: [] ? __alloc_pages_nodemask+0x113/0x8d0
Mar 15 16:28:38 cmds1 kernel: [] ? put_dec+0x10c/0x110
Mar 15 16:28:38 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 16:28:38 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 16:28:38 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 16:28:38 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 16:28:38 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 16:28:38 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Pid: 6852, comm: mdt07_007
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Call Trace:
Mar 15 16:28:38 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 16:28:38 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 16:28:38 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Pid: 21250, comm: mdt05_027
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Call Trace:
Mar 15 16:28:38 cmds1 kernel: [] ? enqueue_task+0x66/0x80
Mar 15 16:28:38 cmds1 kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 15 16:28:38 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 16:28:38 cmds1 kernel: [] ? autoremove_wake_function+0x16/0x40
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up_common+0x59/0x90
Mar 15 16:28:38 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 16:28:38 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 16:28:38 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 16:28:38 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 16:28:38 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_getattr_name+0x98/0x280 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:28:38 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Pid: 42216, comm: mdt_rdpg07_016
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Call Trace:
Mar 15 16:28:38 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0
Mar 15 16:28:38 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 16:28:38 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up_bit+0x31/0x40
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650
Mar 15 16:28:38 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 16:28:38 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 16:28:38 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_inactive_list+0x3b0/0x830
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:28:38 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:28:38 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:28:38 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:28:38 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:28:38 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:28:38 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:28:38 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:28:38 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:28:38 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:28:38 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Pid: 6897, comm: mdt_rdpg03_003
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: Call Trace:
Mar 15 16:28:38 cmds1 kernel: [] ? move_active_pages_to_lru+0x13b/0x1c0
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_active_list+0x282/0x370
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3f5/0x610
Mar 15 16:28:38 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:28:38 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:28:38 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:28:38 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:28:38 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:28:38 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:28:38 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:28:38 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:28:38 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:28:38 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:28:38 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:28:38 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:28:38 cmds1 kernel:
Mar 15 16:28:38 cmds1 kernel: LNet: Service thread pid 7327 completed after 230.89s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:28:38 cmds1 kernel: LNet: Skipped 3 previous similar messages
Mar 15 16:30:19 cmds1 kernel: LustreError: 4663:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff882e6c27f240/0xe1fcacc63ebb221f lrc: 3/0,0 mode: PR/PR res: [0x200014914:0x1e5d2:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d18f9cfe7 expref: 9 pid: 29750 timeout: 6254577036 lvb_type: 0
Mar 15 16:31:34 cmds1 kernel: Lustre: 42073:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-90), not sending early reply
Mar 15 16:31:34 cmds1 kernel: req@ffff88597b203800 x1495261937744412/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437099 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:31:40 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 16:31:40 cmds1 kernel: Lustre: Skipped 15 previous similar messages
Mar 15 16:31:40 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 8 active RPCs
Mar 15 16:31:40 cmds1 kernel: Lustre: Skipped 12 previous similar messages
Mar 15 16:32:30 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 16:32:30 cmds1 kernel: Lustre: Skipped 1 previous similar message
Mar 15 16:32:30 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 8 active RPCs
Mar 15 16:32:30 cmds1 kernel: Lustre: Skipped 1 previous similar message
Mar 15 16:33:30 cmds1 kernel: Lustre: 12226:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
Mar 15 16:33:30 cmds1 kernel: req@ffff88094e6c9800 x1495260270642704/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 5 to 0 dl 1426437215 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:33:45 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 16:33:45 cmds1 kernel: Lustre: Skipped 2 previous similar messages
Mar 15 16:33:45 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 8 active RPCs
Mar 15 16:33:45 cmds1 kernel: Lustre: Skipped 2 previous similar messages
Mar 15 16:36:15 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 16:36:15 cmds1 kernel: Lustre: Skipped 7 previous similar messages
Mar 15 16:36:15 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 8 active RPCs
Mar 15 16:36:15 cmds1 kernel: Lustre: Skipped 7 previous similar messages
Mar 15 16:38:28 cmds1 kernel: LNet: Service thread pid 42075 was inactive for 1104.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:38:28 cmds1 kernel: LNet: Skipped 4 previous similar messages
Mar 15 16:38:28 cmds1 kernel: Pid: 42075, comm: mdt_rdpg05_017
Mar 15 16:38:28 cmds1 kernel:
Mar 15 16:38:28 cmds1 kernel: Call Trace:
Mar 15 16:38:28 cmds1 kernel: [] ? shrink_inactive_list+0x4f5/0x830
Mar 15 16:38:28 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:38:28 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:38:28 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 16:38:28 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 16:38:28 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:38:28 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 16:38:28 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:38:28 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:38:28 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 16:38:28 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 16:38:28 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:38:28 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 16:38:28 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:38:28 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 16:38:28 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:38:28 cmds1 kernel: [] ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:38:28 cmds1 kernel: [] mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:38:28 cmds1 kernel: [] mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:38:28 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:38:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:38:28 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:38:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:38:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:38:28 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 16:38:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:38:28 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:38:28 cmds1 kernel:
Mar 15 16:38:28 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426437508.42075
Mar 15 16:39:22 cmds1 kernel: Lustre: 7796:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-43), not sending early reply
Mar 15 16:39:22 cmds1 kernel: req@ffff882fd5cda050 x1489201248660808/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437567 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:39:28 cmds1 kernel: LustreError: 7312:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882dccbe5c00 x1489201251927792/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426438323 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:40:30 cmds1 kernel: Lustre: 12228:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-42), not sending early reply
Mar 15 16:40:30 cmds1 kernel: req@ffff8806a841b000 x1495260285736524/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437635 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:40:30 cmds1 kernel: Lustre: 12228:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages
Mar 15 16:40:34 cmds1 kernel: LustreError: 7344:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff881962e8fc00 x1489201248987780/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437640 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:40:34 cmds1 kernel: Lustre: 42075:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (695:535s); client may timeout. req@ffff88597b203800 x1495261937744412/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426437099 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:40:34 cmds1 kernel: LNet: Service thread pid 42075 completed after 1230.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:40:34 cmds1 kernel: LNet: Skipped 2 previous similar messages
Mar 15 16:40:35 cmds1 kernel: Lustre: 6890:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (648:68s); client may timeout. req@ffff882fd5cda050 x1489201248660808/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/408 e 0 to 0 dl 1426437567 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:40:35 cmds1 kernel: LustreError: 42078:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fc17a9850 x1495261937889840/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437673 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:40:35 cmds1 kernel: LustreError: 42078:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 5 previous similar messages
Mar 15 16:40:35 cmds1 kernel: Lustre: 7383:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (812:420s); client may timeout. req@ffff88094e6c9800 x1495260270642704/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/408 e 5 to 0 dl 1426437215 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:40:35 cmds1 kernel: LNet: Service thread pid 7383 completed after 1231.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:40:43 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 16:40:43 cmds1 kernel: Lustre: Skipped 24 previous similar messages
Mar 15 16:40:43 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 1 active RPCs
Mar 15 16:40:43 cmds1 kernel: Lustre: Skipped 24 previous similar messages
Mar 15 16:41:31 cmds1 kernel: LustreError: 42077:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885f83c0f000 x1495261937909128/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 0 to 0 dl 1426437674 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:41:31 cmds1 kernel: LustreError: 42077:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 5 previous similar messages
Mar 15 16:41:31 cmds1 kernel: Lustre: 42077:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:17s); client may timeout. req@ffff885f83c0f000 x1495261937909128/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 0 to 0 dl 1426437674 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:41:31 cmds1 kernel: Lustre: 42077:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message
Mar 15 16:42:21 cmds1 kernel: LustreError: 34830:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff8805dc292900/0xe1fcacc64566cbef lrc: 3/0,0 mode: PR/PR res: [0x2000148ec:0x159dc:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d19b68453 expref: 8 pid: 29748 timeout: 6255299444 lvb_type: 0
Mar 15 16:42:33 cmds1 kernel: LustreError: 41550:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885fc1159050 x1495256972525932/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426438396 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:42:33 cmds1 kernel: LustreError: 41550:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 1 previous similar message
Mar 15 16:42:34 cmds1 kernel: Lustre: 26607:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (648:167s); client may timeout. req@ffff882fd5cdb050 x1489201248869300/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/408 e 0 to 0 dl 1426437587 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:42:34 cmds1 kernel: Lustre: 26607:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message
Mar 15 16:46:08 cmds1 kernel: LNet: Service thread pid 6889 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:46:08 cmds1 kernel: Pid: 6889, comm: mdt00_008
Mar 15 16:46:08 cmds1 kernel:
Mar 15 16:46:08 cmds1 kernel: Call Trace:
Mar 15 16:46:08 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 16:46:08 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 16:46:08 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 16:46:08 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 16:46:08 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50
Mar 15 16:46:08 cmds1 kernel: [] ? try_to_release_page+0x30/0x60
Mar 15 16:46:08 cmds1 kernel: [] ? shrink_page_list.clone.3+0x517/0x650
Mar 15 16:46:08 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 16:46:08 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 16:46:08 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 16:46:08 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 16:46:08 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:46:08 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:46:08 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:46:08 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:46:08 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:46:08 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:46:08 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:46:08 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:46:08 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:46:08 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:46:08 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:46:08 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:46:08 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:46:08 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:46:08 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:46:08 cmds1 kernel: [] ? osd_key_init+0x1e/0x670 [osd_ldiskfs]
Mar 15 16:46:08 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass]
Mar 15 16:46:08 cmds1 kernel: [] ? lu_context_init+0xab/0x260 [obdclass]
Mar 15 16:46:08 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? lu_env_init+0x1e/0x30 [obdclass]
Mar 15 16:46:08 cmds1 kernel: [] ? mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? mds_regular_handle+0x15/0x20 [mdt]
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:46:08 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:46:08 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:46:08 cmds1 kernel:
Mar 15 16:46:08 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426437968.6889
Mar 15 16:48:02 cmds1 kernel: LNet: Service thread pid 7320 was inactive for 206.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:48:02 cmds1 kernel: Pid: 7320, comm: mdt07_020
Mar 15 16:48:02 cmds1 kernel:
Mar 15 16:48:02 cmds1 kernel: Call Trace:
Mar 15 16:48:02 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 16:48:02 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 16:48:02 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 16:48:02 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50
Mar 15 16:48:02 cmds1 kernel: [] ? try_to_release_page+0x30/0x60
Mar 15 16:48:02 cmds1 kernel: [] ? shrink_page_list.clone.3+0x517/0x650
Mar 15 16:48:02 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 16:48:02 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 16:48:02 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 16:48:02 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 16:48:02 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:48:02 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:48:02 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:48:02 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:48:02 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:48:02 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:48:02 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:48:02 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:48:02 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:48:02 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:48:02 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:48:02 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:48:02 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:48:02 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:48:02 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:48:02 cmds1 kernel: [] ? osd_key_init+0x76/0x670 [osd_ldiskfs]
Mar 15 16:48:02 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass]
Mar 15 16:48:02 cmds1 kernel: [] ? lu_context_init+0xab/0x260 [obdclass]
Mar 15 16:48:02 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? lu_env_init+0x1e/0x30 [obdclass]
Mar 15 16:48:02 cmds1 kernel: [] ? mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? mds_regular_handle+0x15/0x20 [mdt]
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:48:02 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:48:02 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:48:02 cmds1 kernel:
Mar 15 16:48:02 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426438082.7320
Mar 15 16:48:59 cmds1 kernel: LNet: Service thread pid 7320 completed after 263.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:52:22 cmds1 kernel: Lustre: 7343:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-42), not sending early reply
Mar 15 16:52:22 cmds1 kernel: req@ffff885fc0df9800 x1495261937997404/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 376/472 e 0 to 0 dl 1426438347 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:52:22 cmds1 kernel: Lustre: 7343:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
Mar 15 16:52:28 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 16:52:28 cmds1 kernel: Lustre: Skipped 9 previous similar messages
Mar 15 16:52:28 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 1 active RPCs
Mar 15 16:52:28 cmds1 kernel: Lustre: Skipped 6 previous similar messages
Mar 15 16:52:53 cmds1 kernel: LustreError: 37743:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.26@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff8812ba5146c0/0xe1fcacc646b62ed1 lrc: 3/0,0 mode: PR/PR res: [0x200014934:0xb1d7:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.26@tcp remote: 0x2d9b859d19b6fb53 expref: 8 pid: 5119 timeout: 6255931566 lvb_type: 0
Mar 15 16:52:53 cmds1 kernel: LustreError: 37743:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.28@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff8815842076c0/0xe1fcacc646b653d0 lrc: 3/0,0 mode: PR/PR res: [0x2000148d7:0x174a9:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.28@tcp remote: 0x478842ac3fb8e906 expref: 15 pid: 6888 timeout: 6255931597 lvb_type: 0
Mar 15 16:54:33 cmds1 kernel: LNet: Service thread pid 7312 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:54:33 cmds1 kernel: Pid: 7312, comm: mdt_rdpg00_003
Mar 15 16:54:33 cmds1 kernel:
Mar 15 16:54:33 cmds1 kernel: Call Trace:
Mar 15 16:54:33 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 16:54:33 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 16:54:33 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 16:54:33 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50
Mar 15 16:54:33 cmds1 kernel: [] ? try_to_release_page+0x30/0x60
Mar 15 16:54:33 cmds1 kernel: [] ? shrink_page_list.clone.3+0x517/0x650
Mar 15 16:54:33 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 16:54:33 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 16:54:33 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 16:54:33 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 16:54:33 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:54:33 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:54:33 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:54:33 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:54:33 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:54:33 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:54:33 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:54:33 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:54:33 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:54:33 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:54:33 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:54:33 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:54:33 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:54:33 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:54:33 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:54:33 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:54:33 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:54:33 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:54:33 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:54:33 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:54:33 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:54:33 cmds1 kernel:
Mar 15 16:54:33 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426438473.7312
Mar 15 16:54:33 cmds1 kernel: Lustre: lock timed out (enqueued at 1426438273, 200s ago)
Mar 15 16:54:41 cmds1 kernel: LustreError: 7312:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d9330b400 x1489201251992480/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 1 to 0 dl 1426438455 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:54:41 cmds1 kernel: LustreError: 7312:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 1 previous similar message
Mar 15 16:54:41 cmds1 kernel: Lustre: 7312:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (182:26s); client may timeout. req@ffff882d9330b400 x1489201251992480/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/408 e 1 to 0 dl 1426438455 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:54:41 cmds1 kernel: LNet: Service thread pid 7312 completed after 207.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 16:54:41 cmds1 kernel: LNet: Skipped 1 previous similar message
Mar 15 16:54:41 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426438481.7321
Mar 15 16:54:52 cmds1 kernel: Lustre: 41550:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 16:54:52 cmds1 kernel: req@ffff88418dc89c00 x1495256972539400/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426438497 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 16:55:08 cmds1 kernel: LNet: Service thread pid 26607 was inactive for 234.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 16:55:08 cmds1 kernel: Pid: 26607, comm: mdt_rdpg00_008
Mar 15 16:55:08 cmds1 kernel:
Mar 15 16:55:08 cmds1 kernel: Call Trace:
Mar 15 16:55:08 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 16:55:08 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 16:55:08 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 16:55:08 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 16:55:08 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650
Mar 15 16:55:08 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 16:55:08 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 16:55:08 cmds1 kernel: [] ? shrink_inactive_list+0x3dc/0x830
Mar 15 16:55:08 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 16:55:08 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 16:55:08 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 16:55:08 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 16:55:08 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 16:55:08 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 16:55:08 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 16:55:08 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 16:55:08 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 16:55:08 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 16:55:08 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 16:55:08 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 16:55:08 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:55:08 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 16:55:08 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 16:55:08 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 16:55:08 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt]
Mar 15 16:55:08 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 16:55:08 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 16:55:08 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 16:55:08 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 16:55:08 cmds1 kernel:
Mar 15 16:55:36 cmds1 kernel: LustreError: 42216:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff885b3da44c00 x1495256972588872/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/440 e 0 to 0 dl 1426438801 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:55:36 cmds1 kernel: Lustre: 7821:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (755:39s); client may timeout. req@ffff88418dc89c00 x1495256972539400/t0(0) o37->513ddad8-648c-4e1d-4def-6b4a17dbd93c@10.21.22.26@tcp:0/0 lens 448/408 e 0 to 0 dl 1426438497 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 16:55:42 cmds1 kernel: Lustre: 38206:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (647:195s); client may timeout. req@ffff885fc0df9800 x1495261937997404/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 376/368 e 0 to 0 dl 1426438347 ref 1 fl Complete:/0/0 rc 0/0
Mar 15 16:55:42 cmds1 kernel: LustreError: 26607:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d6c3d7850 x1489201251993416/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 1 to 0 dl 1426438455 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 16:55:42 cmds1 kernel: LustreError: 26607:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 1 previous similar message
Mar 15 16:55:42 cmds1 kernel: LNet: Service thread pid 26607 completed after 268.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:04:41 cmds1 kernel: LNet: Service thread pid 14596 was inactive for 538.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 17:04:41 cmds1 kernel: Pid: 14596, comm: mdt_rdpg00_007
Mar 15 17:04:41 cmds1 kernel:
Mar 15 17:04:41 cmds1 kernel: Call Trace:
Mar 15 17:04:41 cmds1 kernel: [] ? shrink_inactive_list+0xdf/0x830
Mar 15 17:04:41 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 17:04:41 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 17:04:41 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 17:04:41 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 17:04:41 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 17:04:41 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 17:04:41 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 17:04:41 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 17:04:41 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 17:04:41 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 17:04:41 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 17:04:41 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 17:04:41 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 17:04:41 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:04:41 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 17:04:41 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:04:41 cmds1 kernel: [] ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 17:04:41 cmds1 kernel: [] ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 17:04:41 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 17:04:41 cmds1 kernel: [] mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 17:04:41 cmds1 kernel: [] mdt_readpage+0x497/0x960 [mdt]
Mar 15 17:04:41 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 17:04:41 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 17:04:41 cmds1 kernel: [] mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 17:04:41 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 17:04:41 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 17:04:41 cmds1 kernel: [] ? 
lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:04:41 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:04:41 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 17:04:41 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:04:41 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:04:41 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:04:41 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:04:41 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:04:41 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 17:04:41 cmds1 kernel: Mar 15 17:04:41 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426439081.14596 Mar 15 17:07:20 cmds1 kernel: Lustre: 26607:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-97), not sending early reply Mar 15 17:07:20 cmds1 kernel: req@ffff882d6c3d8850 x1489201252303208/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 1 to 0 dl 1426439245 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 17:09:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 17:09:02 cmds1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 17:09:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 4 active RPCs Mar 15 17:09:02 cmds1 kernel: Lustre: Skipped 9 previous similar messages Mar 15 17:09:11 cmds1 kernel: Lustre: charlie-MDT0000: haven't heard from client b48a7480-2a75-0a37-3f10-9f2a79708c74 (at 10.21.22.26@tcp) in 227 seconds. I think it's dead, and I am evicting it. exp ffff885fe46bb400, cur 1426439351 expire 1426439201 last 1426439124 Mar 15 17:09:11 cmds1 kernel: Lustre: Skipped 2 previous similar messages Mar 15 17:09:12 cmds1 kernel: Lustre: MGS: haven't heard from client fa61e35a-5d4e-8dc8-f180-a39224dc6ff2 (at 10.21.22.26@tcp) in 228 seconds. I think it's dead, and I am evicting it. 
exp ffff882d9331f800, cur 1426439352 expire 1426439202 last 1426439124 Mar 15 17:09:12 cmds1 kernel: Lustre: Skipped 1 previous similar message Mar 15 17:13:55 cmds1 kernel: Lustre: 42075:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-143), not sending early reply Mar 15 17:13:55 cmds1 kernel: req@ffff88469c78c400 x1495261958607344/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 3 to 0 dl 1426439640 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 17:14:02 cmds1 kernel: Lustre: 24532:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Mar 15 17:14:02 cmds1 kernel: req@ffff88545f4a7000 x1495261958612128/t0(0) o101->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 376/472 e 0 to 0 dl 1426439647 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 17:14:02 cmds1 kernel: Lustre: 24532:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 15 17:17:57 cmds1 kernel: LustreError: 35031:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.27@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff881fe8e99b40/0xe1fcacc6574966f2 lrc: 3/0,0 mode: PR/PR res: [0x2000148ed:0xe9e7:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.27@tcp remote: 0x1a02bfd7dc506a63 expref: 65 pid: 5106 timeout: 6257435894 lvb_type: 0 Mar 15 17:18:30 cmds1 kernel: Lustre: 4566:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-349), not sending early reply Mar 15 17:18:30 cmds1 kernel: req@ffff882d9300a800 x1495260306460856/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/3448 e 1 to 0 dl 1426439915 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 17:18:30 cmds1 kernel: Lustre: 4566:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 15 17:19:03 cmds1 kernel: LNet: Service thread pid 4559 was inactive for 741.58s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 15 17:19:03 cmds1 kernel: Pid: 4559, comm: mdt03_030 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Call Trace: Mar 15 17:19:03 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 17:19:03 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? 
default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Pid: 8784, comm: mdt03_021 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Call Trace: Mar 15 17:19:03 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 17:19:03 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? 
lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Pid: 24535, comm: mdt05_046 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Call Trace: Mar 15 17:19:03 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 17:19:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 17:19:03 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? 
mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Pid: 6901, comm: mdt05_010 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Call Trace: Mar 15 17:19:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 17:19:03 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? 
default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? child_rip+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Pid: 7347, comm: mdt03_014 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: Call Trace: Mar 15 17:19:03 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] ? 
ksocknal_find_conn_locked+0x159/0x290 [ksocklnd] Mar 15 17:19:03 cmds1 kernel: [] schedule_timeout+0x215/0x2e0 Mar 15 17:19:03 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt] Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? __wake_up+0x53/0x70 Mar 15 17:19:03 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] child_rip+0xa/0x20 Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 17:19:03 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 17:19:03 cmds1 kernel: Mar 15 17:19:03 cmds1 kernel: LNet: Service thread pid 7329 was inactive for 801.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 15 17:19:03 cmds1 kernel: LNet: Skipped 5 previous similar messages Mar 15 17:19:03 cmds1 kernel: LustreError: 4563:0:(ldlm_lockd.c:1376:ldlm_handle_enqueue0()) ### lock on destroyed export ffff882ec3db9000 ns: mdt-charlie-MDT0000_UUID lock: ffff882ea0383d80/0xe1fcacc6574a5b9e lrc: 3/0,0 mode: PR/PR res: [0x2000157b1:0x1c6aa:0x0].0 bits 0x13 rrc: 7 type: IBT flags: 0x200000000000 nid: 10.21.22.27@tcp remote: 0x1a02bfd7dc506a94 expref: 56 pid: 4563 timeout: 0 lvb_type: 0 Mar 15 17:19:03 cmds1 kernel: LNet: Service thread pid 8784 completed after 741.96s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 15 17:19:03 cmds1 kernel: Lustre: 4561:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (954:28s); client may timeout. req@ffff882e739db000 x1495260306462128/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/536 e 1 to 0 dl 1426439915 ref 1 fl Complete:/0/0 rc -107/-107 Mar 15 17:19:03 cmds1 kernel: Lustre: 4561:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 15 17:19:08 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting Mar 15 17:19:08 cmds1 kernel: Lustre: Skipped 36 previous similar messages Mar 15 17:19:08 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 5 active RPCs Mar 15 17:19:08 cmds1 kernel: Lustre: Skipped 36 previous similar messages Mar 15 17:19:37 cmds1 kernel: LNet: Service thread pid 9260 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 15 17:19:37 cmds1 kernel: LNet: Skipped 17 previous similar messages
Mar 15 17:19:37 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426439977.9260
Mar 15 17:19:40 cmds1 kernel: Lustre: 7312:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-452), not sending early reply
Mar 15 17:19:40 cmds1 kernel: req@ffff882d976e0c00 x1489201256459036/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 1 to 0 dl 1426439985 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 17:19:40 cmds1 kernel: Lustre: 7312:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 15 17:19:41 cmds1 kernel: LustreError: 42073:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff884d83a46c00 x1495261958611796/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 3 to 0 dl 1426439640 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:19:41 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (748:341s); client may timeout. req@ffff884d83a46c00 x1495261958611796/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 3 to 0 dl 1426439640 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 17:19:41 cmds1 kernel: Lustre: 42073:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message
Mar 15 17:19:41 cmds1 kernel: LNet: Service thread pid 42073 completed after 1088.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:19:41 cmds1 kernel: LNet: Skipped 15 previous similar messages
Mar 15 17:19:41 cmds1 kernel: LustreError: 9260:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882e49f8cc00 x1495260331356396/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 3 to 0 dl 1426439965 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:19:41 cmds1 kernel: LustreError: 9260:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 5 previous similar messages
Mar 15 17:19:41 cmds1 kernel: Lustre: 9260:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (188:16s); client may timeout. req@ffff882e49f8cc00 x1495260331356396/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/408 e 3 to 0 dl 1426439965 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 17:19:41 cmds1 kernel: Lustre: 9260:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 5 previous similar messages
Mar 15 17:19:42 cmds1 kernel: LNet: Service thread pid 14596 completed after 1439.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:19:42 cmds1 kernel: LNet: Skipped 5 previous similar messages
Mar 15 17:19:44 cmds1 kernel: LustreError: 7344:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d976e0c00 x1489201256459036/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 1 to 0 dl 1426439985 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:19:44 cmds1 kernel: LustreError: 7344:0:(ldlm_lib.c:2730:target_bulk_io()) Skipped 1 previous similar message
Mar 15 17:19:44 cmds1 kernel: LNet: Service thread pid 7344 completed after 1056.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:21:32 cmds1 kernel: LustreError: 7312:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.28@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff881db25b1b40/0xe1fcacc65760fe0f lrc: 3/0,0 mode: PR/PR res: [0x200013f4b:0x188c0:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.28@tcp remote: 0x478842ac403c3f79 expref: 13 pid: 9841 timeout: 6257650746 lvb_type: 0
Mar 15 17:21:36 cmds1 kernel: LNet: Service thread pid 12227 was inactive for 218.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Mar 15 17:21:36 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426440096.12227
Mar 15 17:24:32 cmds1 kernel: Lustre: 5184:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
Mar 15 17:24:32 cmds1 kernel: req@ffff885fc4024850 x1495730120294424/t0(0) o503->054af007-8639-e36f-4c95-f5817becbd4b@10.21.22.26@tcp:0/0 lens 272/0 e 5 to 0 dl 1426440277 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Mar 15 17:24:49 cmds1 kernel: LNet: Service thread pid 17548 was inactive for 411.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 17:24:49 cmds1 kernel: LNet: Skipped 4 previous similar messages
Mar 15 17:24:49 cmds1 kernel: Pid: 17548, comm: mdt_rdpg03_015
Mar 15 17:24:49 cmds1 kernel:
Mar 15 17:24:49 cmds1 kernel: Call Trace:
Mar 15 17:24:49 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 17:24:49 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 17:24:49 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 17:24:49 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 17:24:49 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50
Mar 15 17:24:49 cmds1 kernel: [] ? try_to_release_page+0x30/0x60
Mar 15 17:24:49 cmds1 kernel: [] ? shrink_page_list.clone.3+0x517/0x650
Mar 15 17:24:49 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 17:24:49 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 17:24:49 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 17:24:49 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 17:24:49 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 17:24:49 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 17:24:49 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 17:24:49 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 17:24:49 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 17:24:49 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 17:24:49 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 17:24:49 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 17:24:49 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 17:24:49 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 17:24:49 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 17:24:49 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 17:24:49 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:24:49 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 17:24:49 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 17:24:49 cmds1 kernel: [] ? mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 17:24:49 cmds1 kernel: [] ? mdt_readpage+0x497/0x960 [mdt]
Mar 15 17:24:49 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 17:24:49 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 17:24:49 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:24:49 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 17:24:49 cmds1 kernel:
Mar 15 17:25:19 cmds1 kernel: LustreError: 7312:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d6c3cb050 x1489201261572176/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426440747 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:25:19 cmds1 kernel: LNet: Service thread pid 12227 completed after 441.30s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:26:02 cmds1 kernel: LNet: Service thread pid 17548 completed after 484.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:29:33 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) reconnecting
Mar 15 17:29:33 cmds1 kernel: Lustre: Skipped 27 previous similar messages
Mar 15 17:29:33 cmds1 kernel: Lustre: charlie-MDT0000: Client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) refused reconnection, still busy with 1 active RPCs
Mar 15 17:29:33 cmds1 kernel: Lustre: Skipped 26 previous similar messages
Mar 15 17:30:17 cmds1 kernel: Lustre: MGS: haven't heard from client 054af007-8639-e36f-4c95-f5817becbd4b (at 10.21.22.26@tcp) in 229 seconds. I think it's dead, and I am evicting it. exp ffff885fe77bec00, cur 1426440617 expire 1426440467 last 1426440388
Mar 15 17:34:13 cmds1 kernel: LNet: Service thread pid 7348 was inactive for 396.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 17:34:13 cmds1 kernel: Pid: 7348, comm: mdt03_015
Mar 15 17:34:13 cmds1 kernel:
Mar 15 17:34:13 cmds1 kernel: Call Trace:
Mar 15 17:34:13 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0
Mar 15 17:34:13 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 17:34:13 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 17:34:13 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 17:34:13 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650
Mar 15 17:34:13 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 17:34:13 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 17:34:13 cmds1 kernel: [] ? reschedule_interrupt+0xe/0x20
Mar 15 17:34:13 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 17:34:13 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 17:34:13 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 17:34:13 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 17:34:13 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 17:34:13 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 17:34:13 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 17:34:13 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 17:34:13 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 17:34:13 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 17:34:13 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 17:34:13 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 17:34:13 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 17:34:13 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 17:34:13 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:34:13 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 17:34:13 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:34:13 cmds1 kernel: [] ? osd_key_init+0x76/0x670 [osd_ldiskfs]
Mar 15 17:34:13 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass]
Mar 15 17:34:13 cmds1 kernel: [] ? lu_context_init+0xab/0x260 [obdclass]
Mar 15 17:34:13 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? lu_env_init+0x1e/0x30 [obdclass]
Mar 15 17:34:13 cmds1 kernel: [] ? mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? mds_regular_handle+0x15/0x20 [mdt]
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 17:34:13 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:34:13 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 17:34:13 cmds1 kernel:
Mar 15 17:34:13 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426440853.7348
Mar 15 17:35:16 cmds1 kernel: LNet: Service thread pid 7348 completed after 458.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:36:03 cmds1 kernel: LustreError: 6843:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff88469c78c400 x1495261958607344/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/440 e 3 to 0 dl 1426439640 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:36:03 cmds1 kernel: Lustre: 6843:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (748:1323s); client may timeout. req@ffff88469c78c400 x1495261958607344/t0(0) o37->f8554ae8-7eca-d50c-9612-c35702c5035e@10.21.22.29@tcp:0/0 lens 448/408 e 3 to 0 dl 1426439640 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 17:36:03 cmds1 kernel: Lustre: 6843:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 1 previous similar message
Mar 15 17:36:03 cmds1 kernel: LNet: Service thread pid 6843 completed after 2070.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 17:38:58 cmds1 kernel: Lustre: 5090:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 17:38:58 cmds1 kernel: req@ffff885fc43cc050 x1495730120294596/t0(0) o503->c6a67b58-2743-34f0-dc4b-4de5f2d99cb6@10.21.22.26@tcp:0/0 lens 272/0 e 0 to 0 dl 1426441143 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Mar 15 17:42:56 cmds1 kernel: Lustre: MGS: haven't heard from client c6a67b58-2743-34f0-dc4b-4de5f2d99cb6 (at 10.21.22.26@tcp) in 232 seconds. I think it's dead, and I am evicting it. exp ffff885190c83800, cur 1426441376 expire 1426441226 last 1426441144
Mar 15 17:44:09 cmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.28@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff881b2aeeb240/0xe1fcacc659733f82 lrc: 3/0,0 mode: PR/PR res: [0x200013f6a:0x1b9cc:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.28@tcp remote: 0x478842ac406c25ab expref: 12 pid: 5097 timeout: 6259007597 lvb_type: 0
Mar 15 17:45:27 cmds1 kernel: Lustre: charlie-MDT0000: haven't heard from client f8554ae8-7eca-d50c-9612-c35702c5035e (at 10.21.22.29@tcp) in 229 seconds. I think it's dead, and I am evicting it. exp ffff8833cc3e4800, cur 1426441527 expire 1426441377 last 1426441298
Mar 15 17:45:39 cmds1 kernel: LustreError: 6890:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff8807e13a9400 x1489201264516656/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426441811 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:45:40 cmds1 kernel: Lustre: MGS: haven't heard from client b594affa-91e5-f967-587a-c806502b0e4f (at 10.21.22.29@tcp) in 242 seconds. I think it's dead, and I am evicting it. exp ffff882d76a16c00, cur 1426441540 expire 1426441390 last 1426441298
Mar 15 17:46:28 cmds1 kernel: LNet: Service thread pid 5184 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 17:46:28 cmds1 kernel: Pid: 5184, comm: ll_mgs_0003
Mar 15 17:46:28 cmds1 kernel:
Mar 15 17:46:28 cmds1 kernel: Call Trace:
Mar 15 17:46:28 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0
Mar 15 17:46:28 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 17:46:28 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 17:46:28 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 17:46:28 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650
Mar 15 17:46:28 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 17:46:28 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 17:46:28 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 17:46:28 cmds1 kernel: [] ? shrink_inactive_list+0xdf/0x830
Mar 15 17:46:28 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 17:46:28 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 17:46:28 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 17:46:28 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 17:46:28 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 17:46:28 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 17:46:28 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 17:46:28 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 17:46:28 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 17:46:28 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 17:46:28 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 17:46:28 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 17:46:28 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:46:28 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 17:46:28 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:46:28 cmds1 kernel: [] ? null_alloc_rs+0x16f/0x3a0 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? sptlrpc_svc_alloc_rs+0x74/0x2a0 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? lustre_pack_reply_v2+0x93/0x280 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? req_capsule_server_pack+0x53/0x100 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? llog_origin_handle_read_header+0x35c/0x5e0 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? mgs_handle+0xad4/0x11c0 [mgs]
Mar 15 17:46:28 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass]
Mar 15 17:46:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 17:46:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:46:28 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 17:46:28 cmds1 kernel:
Mar 15 17:46:28 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426441588.5184
Mar 15 17:46:42 cmds1 kernel: LustreError: 26607:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff8822b24b7c00 x1489201265130424/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 0 to 0 dl 1426442104 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 17:53:28 cmds1 kernel: LNet: Service thread pid 7293 was inactive for 286.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 17:53:28 cmds1 kernel: Pid: 7293, comm: mdt00_010
Mar 15 17:53:28 cmds1 kernel:
Mar 15 17:53:28 cmds1 kernel: Call Trace:
Mar 15 17:53:28 cmds1 kernel: [] ? try_to_free_buffers+0x45/0xc0
Mar 15 17:53:28 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 17:53:28 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 17:53:28 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 17:53:28 cmds1 kernel: [] ? blkdev_releasepage+0x36/0x50
Mar 15 17:53:28 cmds1 kernel: [] ? try_to_release_page+0x30/0x60
Mar 15 17:53:28 cmds1 kernel: [] ? shrink_page_list.clone.3+0x517/0x650
Mar 15 17:53:28 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 17:53:28 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 17:53:28 cmds1 kernel: [] ? shrink_inactive_list+0x3dc/0x830
Mar 15 17:53:28 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 17:53:28 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 17:53:28 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 17:53:28 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 17:53:28 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 17:53:28 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 17:53:28 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 17:53:28 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 17:53:28 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 17:53:28 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 17:53:28 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 17:53:28 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 17:53:28 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:53:28 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 17:53:28 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 17:53:28 cmds1 kernel: [] ? osd_key_init+0x1e/0x670 [osd_ldiskfs]
Mar 15 17:53:28 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass]
Mar 15 17:53:28 cmds1 kernel: [] ? lu_context_init+0xab/0x260 [obdclass]
Mar 15 17:53:28 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? lu_env_init+0x1e/0x30 [obdclass]
Mar 15 17:53:28 cmds1 kernel: [] ? mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? mds_regular_handle+0x15/0x20 [mdt]
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 17:53:28 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? child_rip+0xa/0x20
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 17:53:28 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 17:53:28 cmds1 kernel:
Mar 15 17:53:28 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426442008.7293
Mar 15 17:59:22 cmds1 kernel: LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.27@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff882ddd1a76c0/0xe1fcacc65e04edfd lrc: 3/0,0 mode: PR/PR res: [0x2000148ed:0x1decd:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.27@tcp remote: 0x1a02bfd7ddbd5966 expref: 193 pid: 6831 timeout: 6259920594 lvb_type: 0
Mar 15 18:01:49 cmds1 kernel: Lustre: 7354:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-187), not sending early reply
Mar 15 18:01:49 cmds1 kernel: req@ffff8808e2262000 x1489201265815448/t0(0) o101->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 376/472 e 2 to 0 dl 1426442514 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 18:03:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 18:03:42 cmds1 kernel: Lustre: Skipped 17 previous similar messages
Mar 15 18:03:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 4 active RPCs
Mar 15 18:03:42 cmds1 kernel: Lustre: Skipped 16 previous similar messages
Mar 15 18:04:14 cmds1 kernel: Lustre: 7354:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-66), not sending early reply
Mar 15 18:04:14 cmds1 kernel: req@ffff882e40dab800 x1489201265826100/t0(0) o34->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 456/3456 e 1 to 0 dl 1426442659 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 18:05:20 cmds1 kernel: Lustre: 7354:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-66), not sending early reply
Mar 15 18:05:20 cmds1 kernel: req@ffff8814311eb000 x1489201265829024/t0(0) o101->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 576/3448 e 1 to 0 dl 1426442725 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 18:05:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 18:05:22 cmds1 kernel: Lustre: Skipped 3 previous similar messages
Mar 15 18:05:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 4 active RPCs
Mar 15 18:05:22 cmds1 kernel: Lustre: Skipped 3 previous similar messages
Mar 15 18:06:53 cmds1 kernel: Lustre: 5186:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-207), not sending early reply
Mar 15 18:06:53 cmds1 kernel: req@ffff885fc43da850 x1495732850786328/t0(0) o503->fba7a744-b245-20e7-f471-95e5b2351285@10.21.22.29@tcp:0/0 lens 272/0 e 5 to 0 dl 1426442818 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Mar 15 18:07:57 cmds1 kernel: LNet: Service thread pid 4563 was inactive for 437.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 18:07:57 cmds1 kernel: Pid: 4563, comm: mdt03_032
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Call Trace:
Mar 15 18:07:57 cmds1 kernel: [] ? enqueue_task_fair+0x64/0x100
Mar 15 18:07:57 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 18:07:57 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 18:07:57 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 18:07:57 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 18:07:57 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Pid: 6826, comm: mdt03_004
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Call Trace:
Mar 15 18:07:57 cmds1 kernel: [] ? enqueue_task_fair+0x64/0x100
Mar 15 18:07:57 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 18:07:57 cmds1 kernel: [] wait_for_common+0x123/0x180
Mar 15 18:07:57 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? __queue_work+0x41/0x50
Mar 15 18:07:57 cmds1 kernel: [] wait_for_completion+0x1d/0x20
Mar 15 18:07:57 cmds1 kernel: [] call_usermodehelper_exec+0x10c/0x120
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_do_upcall+0x13d/0x4c0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] upcall_cache_get_entry+0x1b4/0x860 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __req_capsule_get+0x166/0x700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Pid: 37997, comm: mdt03_042
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Call Trace:
Mar 15 18:07:57 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 18:07:57 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 18:07:57 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 18:07:57 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Pid: 7383, comm: mdt_rdpg03_005
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Call Trace:
Mar 15 18:07:57 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 18:07:57 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 18:07:57 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 18:07:57 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 18:07:57 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 18:07:57 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 18:07:57 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 18:07:57 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 18:07:57 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 18:07:57 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 18:07:57 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 18:07:57 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 18:07:57 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 18:07:57 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 18:07:57 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_prep_bulk_exp+0x5b/0x180 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? mdd_dir_page_build+0x0/0x210 [mdd]
Mar 15 18:07:57 cmds1 kernel: [] mdt_sendpage+0x6b/0x240 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_readpage+0x497/0x960 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mds_readpage_handle+0x15/0x20 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Pid: 5105, comm: mdt03_000
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: Call Trace:
Mar 15 18:07:57 cmds1 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Mar 15 18:07:57 cmds1 kernel: [] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Mar 15 18:07:57 cmds1 kernel: [] schedule_timeout+0x215/0x2e0
Mar 15 18:07:57 cmds1 kernel: [] cfs_waitq_timedwait+0x11/0x20 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] upcall_cache_get_entry+0x253/0x860 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? default_wake_function+0x0/0x20
Mar 15 18:07:57 cmds1 kernel: [] mdt_identity_get+0x17/0x40 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_init_ucred+0x15b/0x3a0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_getattr+0x1e1/0x490 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? mdt_unpack_req_pack_rep+0x230/0x4d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_intent_policy+0x39e/0x720 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:07:57 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:07:57 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:07:57 cmds1 kernel:
Mar 15 18:07:57 cmds1 kernel: LNet: Service thread pid 5098 was inactive for 823.20s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Mar 15 18:07:57 cmds1 kernel: LNet: Service thread pid 6826 completed after 449.81s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 18:07:58 cmds1 kernel: Lustre: 5098:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (671:152s); client may timeout. req@ffff8814311eb000 x1489201265829024/t0(0) o101->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 576/536 e 1 to 0 dl 1426442725 ref 1 fl Complete:/0/0 rc 0/0
Mar 15 18:07:58 cmds1 kernel: LustreError: 7346:0:(ldlm_lockd.c:1376:ldlm_handle_enqueue0()) ### lock on destroyed export ffff882ec35c7800 ns: mdt-charlie-MDT0000_UUID lock: ffff88157d816b40/0xe1fcacc660913988 lrc: 3/0,0 mode: PR/PR res: [0x2000013a0:0x3:0x0].0 bits 0x13 rrc: 3 type: IBT flags: 0x200000000000 nid: 10.21.22.27@tcp remote: 0x1a02bfd7dd735a7f expref: 4 pid: 7346 timeout: 0 lvb_type: 0
Mar 15 18:07:58 cmds1 kernel: LustreError: 7346:0:(ldlm_lockd.c:1376:ldlm_handle_enqueue0()) Skipped 23 previous similar messages
Mar 15 18:08:06 cmds1 kernel: Lustre: 17548:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-423), not sending early reply
Mar 15 18:08:06 cmds1 kernel: req@ffff882d9baedc00 x1495260342600992/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 2 to 0 dl 1426442891 ref 2 fl Interpret:/0/0 rc 0/0
Mar 15 18:08:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 15 18:08:17 cmds1 kernel: Lustre: Skipped 6 previous similar messages
Mar 15 18:08:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 1 active RPCs
Mar 15 18:08:17 cmds1 kernel: Lustre: Skipped 6 previous similar messages
Mar 15 18:09:11 cmds1 kernel: LustreError: 12227:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d9baedc00 x1495260342600992/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 2 to 0 dl 1426442891 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 18:09:11 cmds1 kernel: Lustre: 12227:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (1028:60s); client may timeout. req@ffff882d9baedc00 x1495260342600992/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/408 e 2 to 0 dl 1426442891 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 18:09:11 cmds1 kernel: Lustre: 12227:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 2 previous similar messages
Mar 15 18:09:11 cmds1 kernel: LNet: Service thread pid 12227 completed after 1087.96s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 18:09:11 cmds1 kernel: LNet: Skipped 6 previous similar messages
Mar 15 18:09:17 cmds1 kernel: LustreError: 7383:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882d92088000 x1495260356445244/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/440 e 0 to 0 dl 1426442600 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 18:09:17 cmds1 kernel: Lustre: 7383:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (338:357s); client may timeout. req@ffff882d92088000 x1495260356445244/t0(0) o37->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 448/408 e 0 to 0 dl 1426442600 ref 1 fl Complete:/0/0 rc -107/-107
Mar 15 18:10:44 cmds1 kernel: Lustre: 7293:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (792:530s); client may timeout. req@ffff8808e2262000 x1489201265815448/t0(0) o101->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 376/368 e 2 to 0 dl 1426442514 ref 1 fl Complete:/0/0 rc 0/0
Mar 15 18:10:44 cmds1 kernel: LNet: Service thread pid 7293 completed after 1322.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Mar 15 18:10:44 cmds1 kernel: LNet: Skipped 1 previous similar message
Mar 15 18:10:57 cmds1 rshd[38566]: connect second port 1021: Connection refused
Mar 15 18:10:57 cmds1 rshd[38570]: connect second port 1020: Connection refused
Mar 15 18:12:40 cmds1 kernel: Lustre: MGS: haven't heard from client fba7a744-b245-20e7-f471-95e5b2351285 (at 10.21.22.29@tcp) in 232 seconds. I think it's dead, and I am evicting it. exp ffff883521de6000, cur 1426443160 expire 1426443010 last 1426442928
Mar 15 18:13:32 cmds1 kernel: LustreError: 26607:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.21.22.28@tcp ns: mdt-charlie-MDT0000_UUID lock: ffff8816947a4900/0xe1fcacc660aa7355 lrc: 3/0,0 mode: PR/PR res: [0x200013f6a:0x1b9bb:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 10.21.22.28@tcp remote: 0x478842ac4076926b expref: 11 pid: 7293 timeout: 6260770350 lvb_type: 0
Mar 15 18:14:40 cmds1 kernel: LustreError: 26607:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc -107 req@ffff882eaf43c800 x1489201265902136/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 3 to 0 dl 1426443300 ref 1 fl Interpret:/0/0 rc 0/0
Mar 15 18:21:18 cmds1 kernel: Lustre: 5186:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Mar 15 18:21:18 cmds1 kernel: req@ffff885fc43d3850 x1495732850786500/t0(0) o503->35cad79c-c81d-fd1f-c012-56600e350d4b@10.21.22.29@tcp:0/0 lens 272/0 e 0 to 0 dl 1426443683 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Mar 15 18:21:24 cmds1 kernel: Lustre: MGS: Client 35cad79c-c81d-fd1f-c012-56600e350d4b (at 10.21.22.29@tcp) reconnecting
Mar 15 18:21:24 cmds1 kernel: Lustre: Skipped 7 previous similar messages
Mar 15 18:21:24 cmds1 kernel: Lustre: MGS: Client 35cad79c-c81d-fd1f-c012-56600e350d4b (at 10.21.22.29@tcp) refused reconnection, still busy with 1 active RPCs
Mar 15 18:21:24 cmds1 kernel: Lustre: Skipped 6 previous similar messages
Mar 15 18:25:07 cmds1 kernel: LNet: Service thread pid 37997 was inactive for 422.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 18:25:07 cmds1 kernel: LNet: Skipped 4 previous similar messages
Mar 15 18:25:07 cmds1 kernel: Pid: 37997, comm: mdt03_042
Mar 15 18:25:07 cmds1 kernel:
Mar 15 18:25:07 cmds1 kernel: Call Trace:
Mar 15 18:25:07 cmds1 kernel: [] ? shrink_inactive_list+0x343/0x830
Mar 15 18:25:07 cmds1 kernel: [] ? shrink_active_list+0x297/0x370
Mar 15 18:25:07 cmds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Mar 15 18:25:07 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 18:25:07 cmds1 kernel: [] shrink_zone+0x63/0xb0
Mar 15 18:25:07 cmds1 kernel: [] do_try_to_free_pages+0x115/0x610
Mar 15 18:25:07 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 18:25:07 cmds1 kernel: [] try_to_free_pages+0x92/0x120
Mar 15 18:25:07 cmds1 kernel: [] ? next_zone+0x30/0x40
Mar 15 18:25:07 cmds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Mar 15 18:25:07 cmds1 kernel: [] kmem_getpages+0x62/0x170
Mar 15 18:25:07 cmds1 kernel: [] fallback_alloc+0x1ba/0x270
Mar 15 18:25:07 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 18:25:07 cmds1 kernel: [] ____cache_alloc_node+0x99/0x160
Mar 15 18:25:07 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:25:07 cmds1 kernel: [] __kmalloc+0x189/0x220
Mar 15 18:25:07 cmds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:25:07 cmds1 kernel: [] osd_key_init+0x76/0x670 [osd_ldiskfs]
Mar 15 18:25:07 cmds1 kernel: [] keys_fill+0x6f/0x190 [obdclass]
Mar 15 18:25:07 cmds1 kernel: [] lu_context_init+0xab/0x260 [obdclass]
Mar 15 18:25:07 cmds1 kernel: [] ? mdt_intent_layout+0x2fe/0x630 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] lu_env_init+0x1e/0x30 [obdclass]
Mar 15 18:25:07 cmds1 kernel: [] mdt_lvbo_fill+0x1ab/0x840 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] ? mdt_lvbo_fill+0x0/0x840 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] ldlm_handle_enqueue0+0x61d/0x10b0 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] mdt_enqueue+0x46/0xe0 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] mdt_handle_common+0x647/0x16d0 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] mds_regular_handle+0x15/0x20 [mdt]
Mar 15 18:25:07 cmds1 kernel: [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs]
Mar 15 18:25:07 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Mar 15 18:25:07 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] ? __wake_up+0x53/0x70
Mar 15 18:25:07 cmds1 kernel: [] ptlrpc_main+0xace/0x1700 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] child_rip+0xa/0x20
Mar 15 18:25:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Mar 15 18:25:07 cmds1 kernel: [] ? child_rip+0x0/0x20
Mar 15 18:25:07 cmds1 kernel:
Mar 15 18:25:07 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426443907.37997
Mar 15 18:25:14 cmds1 kernel: Lustre: MGS: haven't heard from client 35cad79c-c81d-fd1f-c012-56600e350d4b (at 10.21.22.29@tcp) in 230 seconds. I think it's dead, and I am evicting it. exp ffff8845ed4d7800, cur 1426443914 expire 1426443764 last 1426443684
Mar 15 18:28:48 cmds1 kernel: LNet: Service thread pid 5091 was inactive for 1200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 15 18:28:48 cmds1 kernel: Pid: 5091, comm: ll_mgs_0002
Mar 15 18:28:48 cmds1 kernel:
Mar 15 18:28:48 cmds1 kernel: Call Trace:
Mar 15 18:28:48 cmds1 kernel: [] ? try_to_free_buffers+0x51/0xc0
Mar 15 18:28:48 cmds1 kernel: [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
Mar 15 18:28:48 cmds1 kernel: [] ? apic_timer_interrupt+0xe/0x20
Mar 15 18:28:48 cmds1 kernel: [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
Mar 15 18:28:48 cmds1 kernel: [] ? shrink_page_list.clone.3+0xd0/0x650
Mar 15 18:28:48 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 18:28:48 cmds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Mar 15 18:28:48 cmds1 kernel: [] ? __pagevec_release+0x26/0x40
Mar 15 18:28:48 cmds1 kernel: [] ? shrink_inactive_list+0x726/0x830
Mar 15 18:28:48 cmds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Mar 15 18:28:48 cmds1 kernel: [] ? shrink_active_list+0x24c/0x370
Mar 15 18:28:48 cmds1 kernel: [] ? shrink_mem_cgroup_zone+0x3f5/0x610
Mar 15 18:28:48 cmds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Mar 15 18:28:48 cmds1 kernel: [] ? shrink_zone+0x63/0xb0
Mar 15 18:28:48 cmds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Mar 15 18:28:48 cmds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Mar 15 18:28:48 cmds1 kernel: [] ? try_to_free_pages+0x92/0x120
Mar 15 18:28:48 cmds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Mar 15 18:28:48 cmds1 kernel: [] ? kmem_getpages+0x62/0x170
Mar 15 18:28:48 cmds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Mar 15 18:28:48 cmds1 kernel: [] ? cache_grow+0x2cf/0x320
Mar 15 18:28:48 cmds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Mar 15 18:28:48 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:28:48 cmds1 kernel: [] ? __kmalloc+0x189/0x220
Mar 15 18:28:48 cmds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Mar 15 18:28:48 cmds1 kernel: [] ? null_alloc_rs+0x16f/0x3a0 [ptlrpc]
Mar 15 18:28:48 cmds1 kernel: [] ? sptlrpc_svc_alloc_rs+0x74/0x2a0 [ptlrpc]
Mar 15 18:28:48 cmds1 kernel: [] ? lustre_pack_reply_v2+0x93/0x280 [ptlrpc]
Mar 15 18:28:48 cmds1 kernel: [] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
Mar 15 18:28:48 cmds1 kernel: [] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Mar 15 18:28:48 cmds1 kernel: [] ?
req_capsule_server_pack+0x53/0x100 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? llog_origin_handle_read_header+0x35c/0x5e0 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? mgs_handle+0xad4/0x11c0 [mgs] Mar 15 18:28:48 cmds1 kernel: [] ? keys_fill+0x6f/0x190 [obdclass] Mar 15 18:28:48 cmds1 kernel: [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? cfs_timer_arm+0xe/0x10 [libcfs] Mar 15 18:28:48 cmds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs] Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? default_wake_function+0x0/0x20 Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_main+0xace/0x1700 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? child_rip+0xa/0x20 Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc] Mar 15 18:28:48 cmds1 kernel: [] ? 
child_rip+0x0/0x20 Mar 15 18:28:48 cmds1 kernel: Mar 15 18:28:48 cmds1 kernel: LustreError: dumping log to /tmp/lustre-log.1426444128.5091 Mar 15 18:36:53 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-527), not sending early reply Mar 15 18:36:53 cmds1 kernel: req@ffff8823118bf400 x1495260368208888/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 376/472 e 2 to 0 dl 1426444617 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:39:15 cmds1 kernel: Lustre: 5096:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-139), not sending early reply Mar 15 18:39:15 cmds1 kernel: req@ffff881b32ea4800 x1489201268162696/t0(0) o36->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 656/696 e 2 to 0 dl 1426444760 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:39:30 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 18:39:30 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 18:39:30 cmds1 kernel: Lustre: 26607:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-199), not sending early reply Mar 15 18:39:30 cmds1 kernel: req@ffff882fd5cd8050 x1489201267688040/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 2 to 0 dl 1426444775 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:39:40 cmds1 kernel: Lustre: 26607:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-199), not sending early reply Mar 15 18:39:40 cmds1 kernel: req@ffff882db3bb8c00 x1489201267925184/t0(0) o37->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 448/440 e 2 to 0 dl 1426444785 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:40:03 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-126), 
not sending early reply Mar 15 18:40:03 cmds1 kernel: req@ffff882f0f345000 x1495260368334308/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426444808 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:40:03 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 15 18:40:26 cmds1 kernel: Lustre: 5096:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-139), not sending early reply Mar 15 18:40:26 cmds1 kernel: req@ffff88176d1ab400 x1489201268164388/t0(0) o101->b19b9b9b-c5dd-6273-0815-b736f2e7ffdc@10.21.22.28@tcp:0/0 lens 576/3448 e 2 to 0 dl 1426444831 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:40:26 cmds1 kernel: Lustre: 5096:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 15 18:41:04 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-126), not sending early reply Mar 15 18:41:04 cmds1 kernel: req@ffff88218e444000 x1495260368335976/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426444869 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:41:04 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Mar 15 18:41:42 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-126), not sending early reply Mar 15 18:41:42 cmds1 kernel: req@ffff881a76fa7400 x1495260368336812/t0(0) o101->ec1c3acf-010a-7725-01ec-c89056ebc0e4@10.21.22.27@tcp:0/0 lens 576/3448 e 0 to 0 dl 1426444907 ref 2 fl Interpret:/0/0 rc 0/0 Mar 15 18:41:42 cmds1 kernel: Lustre: 48292:0:(service.c:1339:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 15 18:49:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 18:49:47 cmds1 kernel: Lustre: Skipped 45 
previous similar messages Mar 15 18:49:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 18:49:47 cmds1 kernel: Lustre: Skipped 45 previous similar messages Mar 15 18:59:55 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 18:59:55 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 18:59:55 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 18:59:55 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:10:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 19:10:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:10:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 19:10:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:20:20 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 19:20:20 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:20:20 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 19:20:20 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:30:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 19:30:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:30:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still 
busy with 12 active RPCs Mar 15 19:30:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:40:45 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 19:40:45 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:40:45 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 19:40:45 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:51:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 19:51:02 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 19:51:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 19:51:02 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:01:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 20:01:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:01:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 20:01:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:11:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 20:11:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:11:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 20:11:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:21:34 cmds1 kernel: Lustre: charlie-MDT0000: Client 
ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 20:21:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:21:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 20:21:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:31:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 20:31:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:31:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 20:31:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:41:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 20:41:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:41:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 20:41:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:52:18 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 20:52:18 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 20:52:18 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 20:52:18 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:02:24 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 21:02:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:02:24 cmds1 kernel: 
Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 21:02:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:12:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 21:12:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:12:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 21:12:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:22:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 21:22:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:22:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 21:22:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:33:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 21:33:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:33:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 21:33:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:43:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 21:43:14 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:43:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 21:43:14 cmds1 kernel: 
Lustre: Skipped 48 previous similar messages Mar 15 21:53:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 21:53:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 21:53:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 21:53:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:03:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 22:03:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:03:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 22:03:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:13:57 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 22:13:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:13:57 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 22:13:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:24:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 22:24:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:24:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 22:24:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:34:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) 
reconnecting Mar 15 22:34:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:34:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 22:34:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:44:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 22:44:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:44:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 22:44:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:54:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 22:54:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 22:54:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 22:54:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:04:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 23:04:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:04:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 23:04:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:15:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 23:15:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:15:12 cmds1 kernel: Lustre: charlie-MDT0000: Client 
b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 23:15:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:25:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 23:25:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:25:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 23:25:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:35:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 23:35:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:35:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 23:35:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:45:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 15 23:45:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:45:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 15 23:45:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:56:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 15 23:56:02 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 15 23:56:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 15 23:56:02 cmds1 kernel: Lustre: Skipped 48 previous 
similar messages Mar 16 00:06:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 00:06:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:06:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 00:06:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:16:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 00:16:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:16:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 00:16:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:26:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 00:26:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:26:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 00:26:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:36:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 00:36:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:36:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 00:36:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:46:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 00:46:59 cmds1 
kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:46:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 00:46:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:57:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 00:57:17 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 00:57:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 00:57:17 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:07:24 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 01:07:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:07:24 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 01:07:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:17:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 01:17:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:17:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 01:17:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:27:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 01:27:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:27:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) 
refused reconnection, still busy with 32 active RPCs Mar 16 01:27:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:38:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 01:38:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:38:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 01:38:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:48:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 01:48:14 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:48:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 01:48:14 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:58:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 01:58:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 01:58:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 01:58:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:08:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 02:08:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:08:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 02:08:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:18:57 cmds1 kernel: Lustre: 
charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 02:18:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:18:57 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 02:18:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:29:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 02:29:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:29:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 02:29:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:39:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 02:39:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:39:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs Mar 16 02:39:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:49:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting Mar 16 02:49:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:49:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs Mar 16 02:49:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 16 02:59:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting Mar 16 02:59:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages Mar 
16 02:59:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 02:59:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:09:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 03:09:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:09:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 03:09:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:20:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 03:20:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:20:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 03:20:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:30:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 03:30:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:30:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 03:30:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:40:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 03:40:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:40:37 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 03:40:37 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:50:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 03:50:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 03:50:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 03:50:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:01:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 04:01:02 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:01:02 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 04:01:02 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:11:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 04:11:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:11:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 04:11:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:21:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 04:21:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:21:27 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 04:21:27 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:31:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 04:31:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:31:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 04:31:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:41:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 04:41:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:41:52 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 04:41:52 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:51:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 04:51:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 04:51:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 04:51:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:02:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 05:02:17 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:02:17 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 05:02:17 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:12:24 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 05:12:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:12:24 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 05:12:24 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:22:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 05:22:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:22:42 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 05:22:42 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:32:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 05:32:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:32:49 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 05:32:49 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:43:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 05:43:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:43:07 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 05:43:07 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:53:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 05:53:14 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 05:53:14 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 05:53:14 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:03:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 06:03:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:03:32 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 06:03:32 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:13:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 06:13:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:13:39 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 06:13:39 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:23:57 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 06:23:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:23:57 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 06:23:57 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:34:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 06:34:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:34:04 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 06:34:04 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:44:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 06:44:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:44:22 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 06:44:22 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:54:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 06:54:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 06:54:29 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 06:54:29 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:04:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 07:04:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:04:47 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 07:04:47 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:14:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 07:14:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:14:54 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 07:14:54 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:25:11 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 07:25:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:25:12 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 07:25:12 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:35:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 07:35:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:35:19 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 07:35:19 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:45:36 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 07:45:36 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:45:36 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 07:45:36 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:55:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 07:55:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 07:55:44 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 07:55:44 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:06:01 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 08:06:01 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:06:01 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 08:06:01 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:16:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 08:16:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:16:09 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 08:16:09 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:26:26 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 08:26:26 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:26:26 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 08:26:26 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:36:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 08:36:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:36:34 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 08:36:34 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:46:51 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 08:46:51 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:46:51 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 08:46:51 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:56:58 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 08:56:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 08:56:59 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 08:56:59 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:07:16 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 09:07:16 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:07:16 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 09:07:16 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:17:23 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 09:17:23 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:17:23 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 09:17:23 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:27:41 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) reconnecting
Mar 16 09:27:41 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:27:41 cmds1 kernel: Lustre: charlie-MDT0000: Client b19b9b9b-c5dd-6273-0815-b736f2e7ffdc (at 10.21.22.28@tcp) refused reconnection, still busy with 12 active RPCs
Mar 16 09:27:41 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:37:48 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) reconnecting
Mar 16 09:37:48 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:37:48 cmds1 kernel: Lustre: charlie-MDT0000: Client ec1c3acf-010a-7725-01ec-c89056ebc0e4 (at 10.21.22.27@tcp) refused reconnection, still busy with 32 active RPCs
Mar 16 09:37:48 cmds1 kernel: Lustre: Skipped 48 previous similar messages
Mar 16 09:44:05 cmds1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar 16 09:44:05 cmds1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="4659" x-info="http://www.rsyslog.com"] start