hd-client-00:
Jan 27 20:29:20 hd-client-00 kernel: INFO: task IOR:31430 blocked for more than 120 seconds.
Jan 27 20:29:20 hd-client-00 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 27 20:29:20 hd-client-00 kernel: IOR D 000000000000000b 0 31430 31407 0x00000080
Jan 27 20:29:20 hd-client-00 kernel: ffff8804813cfe08 0000000000000086 0000000000000000 ffff88100c872040
Jan 27 20:29:20 hd-client-00 kernel: ffff88100c872040 ffffffffa08c38e2 ffff88100c872040 00000010f1569fe0
Jan 27 20:29:20 hd-client-00 kernel: ffff88100c8725f8 ffff8804813cffd8 000000000000fb88 ffff88100c8725f8
Jan 27 20:29:20 hd-client-00 kernel: Call Trace:
Jan 27 20:29:20 hd-client-00 kernel: [] ? ldlm_resource_iterate+0x82/0x1e0 [ptlrpc]
Jan 27 20:29:20 hd-client-00 kernel: [] __mutex_lock_slowpath+0x13e/0x180
Jan 27 20:29:20 hd-client-00 kernel: [] mutex_lock+0x2b/0x50
Jan 27 20:29:20 hd-client-00 kernel: [] do_unlinkat+0x96/0x1c0
Jan 27 20:29:20 hd-client-00 kernel: [] ? audit_syscall_entry+0x272/0x2a0
Jan 27 20:29:20 hd-client-00 kernel: [] ? do_page_fault+0x3e/0xa0
Jan 27 20:29:20 hd-client-00 kernel: [] sys_unlink+0x16/0x20
Jan 27 20:29:20 hd-client-00 kernel: [] system_call_fastpath+0x16/0x1b
Jan 27 20:29:20 hd-client-00 kernel: INFO: task IOR:31448 blocked for more than 120 seconds.
Jan 27 20:29:20 hd-client-00 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 27 20:29:20 hd-client-00 kernel: IOR D 000000000000000c 0 31448 31425 0x00000080
Jan 27 20:29:20 hd-client-00 kernel: ffff880977df3e08 0000000000000086 0000000000000000 ffff88100c89a080
Jan 27 20:29:20 hd-client-00 kernel: ffff88100c89a080 ffffffffa08c38e2 ffff88100c89a080 00000010266acfe0
Jan 27 20:29:20 hd-client-00 kernel: ffff88100c89a638 ffff880977df3fd8 000000000000fb88 ffff88100c89a638
Jan 27 20:29:20 hd-client-00 kernel: Call Trace:
Jan 27 20:29:20 hd-client-00 kernel: [] ? ldlm_resource_iterate+0x82/0x1e0 [ptlrpc]
Jan 27 20:29:20 hd-client-00 kernel: [] __mutex_lock_slowpath+0x13e/0x180
Jan 27 20:29:20 hd-client-00 kernel: [] mutex_lock+0x2b/0x50
Jan 27 20:29:20 hd-client-00 kernel: [] do_unlinkat+0x96/0x1c0
Jan 27 20:29:20 hd-client-00 kernel: [] ? audit_syscall_entry+0x272/0x2a0
Jan 27 20:29:20 hd-client-00 kernel: [] ? do_page_fault+0x3e/0xa0
Jan 27 20:29:20 hd-client-00 kernel: [] sys_unlink+0x16/0x20
Jan 27 20:29:20 hd-client-00 kernel: [] system_call_fastpath+0x16/0x1b
...
Jan 27 22:38:43 hd-client-00 kernel: Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ptlrpc(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 bnx2fc fcoe libfcoe libfc 8021q scsi_transport_fc garp stp scsi_tgt llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib iw_cxgb4 iw_cxgb3 ko2iblnd(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_pad power_meter dcdbas sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp sg ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif wmi mpt2sas scsi_transport_sas raid_class ahci mlx4_en mlx4_core megaraid_sas tg3 dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
Jan 27 22:38:43 hd-client-00 kernel: Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ptlrpc(U) nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 bnx2fc fcoe libfcoe libfc 8021q scsi_transport_fc garp stp scsi_tgt llc sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad mlx4_ib iw_cxgb4 iw_cxgb3 ko2iblnd(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr obdclass(U) lnet(U) lvfs(U) libcfs(U) acpi_pad power_meter dcdbas sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp sg ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif wmi mpt2sas scsi_transport_sas raid_class ahci mlx4_en mlx4_core megaraid_sas tg3 dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
Jan 27 22:38:43 hd-client-00 kernel: Pid: 6834, comm: ldlm_bl_08 Not tainted 2.6.32-279.2.1.el6.x86_64 #1 Dell Inc. PowerEdge R620/0KCKR5
Jan 27 22:38:43 hd-client-00 kernel: RIP: 0010:[] [] lprocfs_counter_sub+0x46/0x1c6 [lvfs]
Jan 27 22:38:43 hd-client-00 kernel: RSP: 0018:ffff880badc9ba40 EFLAGS: 00000246
Jan 27 22:38:43 hd-client-00 kernel: RAX: 0000000000000002 RBX: ffff880badc9ba60 RCX: ffff880aab5cabe0
Jan 27 22:38:43 hd-client-00 kernel: RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffff881011bca800
Jan 27 22:38:43 hd-client-00 kernel: RBP: ffffffff8100bc0e R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
Jan 27 22:38:43 hd-client-00 kernel: R10: 0000000000000000 R11: 5a5a5a5a5a5a5a5a R12: 0000000000000018
Jan 27 22:38:43 hd-client-00 kernel: R13: ffff88201899e300 R14: ffffea0026429168 R15: ffffffff81127c5f
Jan 27 22:38:43 hd-client-00 kernel: FS: 00007f3c4a744700(0000) GS:ffff881078880000(0000) knlGS:0000000000000000
Jan 27 22:38:43 hd-client-00 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 27 22:38:43 hd-client-00 kernel: CR2: 000000000281e000 CR3: 000000019cf04000 CR4: 00000000000406e0
Jan 27 22:38:43 hd-client-00 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 27 22:38:43 hd-client-00 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 27 22:38:43 hd-client-00 kernel: Process ldlm_bl_08 (pid: 6834, threadinfo ffff880badc9a000, task ffff880a72270aa0)
Jan 27 22:38:43 hd-client-00 kernel: Stack:
Jan 27 22:38:43 hd-client-00 kernel: ffff88108e811880 ffff88123b5de930 ffff88201899e0c0 ffff88019c5e84c8
Jan 27 22:38:43 hd-client-00 kernel: ffff880badc9ba80 ffffffffa0b11713 ffff880badc9ba80 ffff8809f8c21140
Jan 27 22:38:43 hd-client-00 kernel: ffff880badc9bac0 ffffffffa04bc1b3 ffff880badc9bac0 ffff8809f8c21140
Jan 27 22:38:43 hd-client-00 kernel: Call Trace:
Jan 27 22:38:43 hd-client-00 kernel: [] ? lovsub_page_fini+0x43/0x210 [lov]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_free+0x103/0x5f0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_put+0x2ba/0x580 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? lov_page_fini+0x65/0x2a0 [lov]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_free+0x103/0x5f0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_put+0x2ba/0x580 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? unlock_page+0x27/0x30
Jan 27 22:38:43 hd-client-00 kernel: [] ? vvp_page_disown+0x33/0xb0 [lustre]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_disown0+0x88/0x180 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_list_disown+0xa8/0x1d0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_2queue_disown+0x46/0x130 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_page_out+0x1af/0x330 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? pageout_cb+0x0/0x100 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_lock_flush+0x4f/0x90 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_lock_cancel+0x59/0x190 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_cancel0+0x75/0x160 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_cancel+0x13b/0x140 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x281/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? default_wake_function+0x0/0x20
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? child_rip+0xa/0x20
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? child_rip+0x0/0x20
Jan 27 22:38:43 hd-client-00 kernel: Code: 0f 1f 44 00 00 48 85 ff 48 89 fb 41 89 f4 49 89 d5 0f 84 a0 00 00 00 8b 47 04 a8 01 0f 85 b3 00 00 00 65 44 8b 34 25 b8 e0 00 00 <41> 83 c6 01 44 89 f0 48 83 7c c7 10 00 0f 84 08 01 00 00 31 c0
Jan 27 22:38:43 hd-client-00 kernel: Call Trace:
Jan 27 22:38:43 hd-client-00 kernel: [] ? lovsub_page_fini+0x43/0x210 [lov]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_free+0x103/0x5f0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_put+0x2ba/0x580 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? lov_page_fini+0x65/0x2a0 [lov]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_free+0x103/0x5f0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_put+0x2ba/0x580 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? unlock_page+0x27/0x30
Jan 27 22:38:43 hd-client-00 kernel: [] ? vvp_page_disown+0x33/0xb0 [lustre]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_disown0+0x88/0x180 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_page_list_disown+0xa8/0x1d0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_2queue_disown+0x46/0x130 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_page_out+0x1af/0x330 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? pageout_cb+0x0/0x100 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_lock_flush+0x4f/0x90 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_lock_cancel+0x59/0x190 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_cancel0+0x75/0x160 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? cl_lock_cancel+0x13b/0x140 [obdclass]
Jan 27 22:38:43 hd-client-00 kernel: [] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x281/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? default_wake_function+0x0/0x20
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? child_rip+0xa/0x20
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? ldlm_bl_thread_main+0x0/0x3d0 [ptlrpc]
Jan 27 22:38:43 hd-client-00 kernel: [] ? child_rip+0x0/0x20
...
Jan 28 11:18:43 hd-client-00 python2.6: mpiexec_hd-client-00 (mpiexec 392): no msg recvd from mpd when expecting ack of request
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_0: mpd_uncaught_except_tb handling:#012 : [Errno 111] Connection refused#012 /usr/bin/mpdlib.py 411 connect#012 raise socket.error, errinfo#012 /usr/bin/mpdman.py 220 run#012 self.conSock.connect((self.conIfhn,self.conPort))#012 /usr/bin/mpd.py 1581 launch_mpdman_via_fork#012 mpdman.run()#012 /usr/bin/mpd.py 1482 run_one_cli#012 (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)#012 /usr/bin/mpd.py 1353 do_mpdrun#012 self.run_one_cli(lorank,msg)#012 /usr/bin/mpd.py 643 handle_console_input#012 self.do_mpdrun(msg)#012 /usr/bin/mpdlib.py 780 handle_active_streams#012 handler(stream,*args)#012 /usr/bin/mpd.py 301 runmainloop#012 rv = self.streamHandler.handle_active_streams(timeout=8.0)#012 /usr/bin/mpd.py 270 run#012 self.runmainloop()#012 /usr/bin/mpd.py 1643 #012 mpd.run()#012 mpd_cli_app=/home/IOR/src/C/IOR#012 cwd=/home/mpt/scripts
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_3 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_6 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_9 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_12 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_15 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_18 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_21 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-00 mpdman: hd-client-00_mpdman_24 (run 287): invalid msg from lhs; expecting ringsize got: {}

hd-client-01:
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_1 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_4 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_7 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_10 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_13 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_16 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_19 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_22 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-01 mpdman: hd-client-01_mpdman_25 (run 287): invalid msg from lhs; expecting ringsize got: {}
...
Jan 29 20:42:22 hd-client-01 kernel: LustreError: 11402:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jan 29 20:42:22 hd-client-01 kernel: LustreError: 11402:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Skipped 177 previous similar messages
Jan 29 20:42:22 hd-client-01 kernel: LustreError: 11402:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jan 29 20:42:22 hd-client-01 kernel: LustreError: 11402:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) Skipped 177 previous similar messages
Jan 29 20:42:22 hd-client-01 kernel: Lustre: Unmounted lustre-client
Jan 29 21:15:13 hd-client-01 kernel: Lustre: 13135:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.118@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:17 hd-client-01 kernel: Lustre: 13135:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.118@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:25 hd-client-01 kernel: Lustre: 13135:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.118@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:40 hd-client-01 mpd: hd-client-01_33470 (handle_rhs_input 1234): lost rhs; re-entering ring
Jan 29 21:15:41 hd-client-01 kernel: Lustre: 13135:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.118@o2ib: waiting for 1 peers to disconnect
Jan 29 21:16:13 hd-client-01 kernel: Lustre: 13135:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.118@o2ib: waiting for 1 peers to disconnect
Jan 29 21:16:15 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:15 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:18 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:18 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:21 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:21 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:24 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:24 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:28 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:28 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:29 hd-client-01 kernel: Lustre: Removed LNI 192.168.2.118@o2ib
Jan 29 21:16:30 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:30 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:34 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:34 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:37 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:37 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:40 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:40 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:43 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:43 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:48 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:48 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:50 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:50 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:54 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:54 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:16:58 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:58 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:17:02 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:17:02 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:17:05 hd-client-01 mpd: hd-client-01_33470 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:17:05 hd-client-01 mpd: hd-client-01_33470 (enter_ring 873): lhs connect failed
Jan 29 21:17:06 hd-client-01 mpd: hd-client-01_33470 (reenter_ring 843): reenter_ring rc=-1 after numTries=16
Jan 29 21:17:06 hd-client-01 mpd: hd-client-01_33470 (handle_rhs_input 1241): failed to reenter ring
Jan 29 21:17:06 hd-client-01 mpd: mpd ending mpdid=hd-client-01_33470 (inside cleanup)

hd-client-02:
Jan 28 04:22:44 hd-client-02 kernel: INFO: task IOR:14549 blocked for more than 120 seconds.
Jan 28 04:22:44 hd-client-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 28 04:22:44 hd-client-02 kernel: IOR D 0000000000000000 0 14549 14487 0x00000080
Jan 28 04:22:44 hd-client-02 kernel: ffff880e023f7e08 0000000000000082 0000000000000000 ffff88010a6ec080
Jan 28 04:22:44 hd-client-02 kernel: ffff88010a6ec080 ffffffffa07a68e2 ffff88010a6ec080 00000010f2ef8540
Jan 28 04:22:44 hd-client-02 kernel: ffff88010a6ec638 ffff880e023f7fd8 000000000000fb88 ffff88010a6ec638
Jan 28 04:22:44 hd-client-02 kernel: Call Trace:
Jan 28 04:22:44 hd-client-02 kernel: [] ? ldlm_resource_iterate+0x82/0x1e0 [ptlrpc]
Jan 28 04:22:44 hd-client-02 kernel: [] __mutex_lock_slowpath+0x13e/0x180
Jan 28 04:22:44 hd-client-02 kernel: [] mutex_lock+0x2b/0x50
Jan 28 04:22:44 hd-client-02 kernel: [] do_unlinkat+0x96/0x1c0
Jan 28 04:22:44 hd-client-02 kernel: [] ? audit_syscall_entry+0x272/0x2a0
Jan 28 04:22:44 hd-client-02 kernel: [] ? do_page_fault+0x3e/0xa0
Jan 28 04:22:44 hd-client-02 kernel: [] sys_unlink+0x16/0x20
Jan 28 04:22:44 hd-client-02 kernel: [] system_call_fastpath+0x16/0x1b
Jan 28 04:22:44 hd-client-02 kernel: INFO: task IOR:14553 blocked for more than 120 seconds.
Jan 28 04:22:44 hd-client-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 28 04:22:44 hd-client-02 kernel: IOR D 0000000000000003 0 14553 14491 0x00000080
Jan 28 04:22:44 hd-client-02 kernel: ffff881000281e08 0000000000000082 0000000000000000 ffff88039a354ae0
Jan 28 04:22:44 hd-client-02 kernel: ffff88039a354ae0 ffffffffa07a68e2 ffff88039a354ae0 00000010f837c540
Jan 28 04:22:44 hd-client-02 kernel: ffff88039a355098 ffff881000281fd8 000000000000fb88 ffff88039a355098
Jan 28 04:22:44 hd-client-02 kernel: Call Trace:
Jan 28 04:22:44 hd-client-02 kernel: [] ? ldlm_resource_iterate+0x82/0x1e0 [ptlrpc]
Jan 28 04:22:44 hd-client-02 kernel: [] __mutex_lock_slowpath+0x13e/0x180
Jan 28 04:22:44 hd-client-02 kernel: [] mutex_lock+0x2b/0x50
Jan 28 04:22:44 hd-client-02 kernel: [] do_unlinkat+0x96/0x1c0
Jan 28 04:22:44 hd-client-02 kernel: [] ? audit_syscall_entry+0x272/0x2a0
Jan 28 04:22:44 hd-client-02 kernel: [] ? do_page_fault+0x3e/0xa0
Jan 28 04:22:44 hd-client-02 kernel: [] sys_unlink+0x16/0x20
Jan 28 04:22:44 hd-client-02 kernel: [] system_call_fastpath+0x16/0x1b
...
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_2 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_5 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_8 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_11 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_14 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_17 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_20 (run 287): invalid msg from lhs; expecting ringsize got: {}
Jan 28 11:18:44 hd-client-02 mpdman: hd-client-02_mpdman_23 (run 287): invalid msg from lhs; expecting ringsize got: {}
...
Jan 29 21:15:40 hd-client-02 mpd: hd-client-02_36675 (runmainloop 320): no pulse_ack from rhs; re-entering ring
Jan 29 21:15:43 hd-client-02 kernel: Lustre: 7570:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.128@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:47 hd-client-02 kernel: Lustre: 7570:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.128@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:50 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:15:50 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:15:54 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:15:54 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:15:55 hd-client-02 kernel: Lustre: 7570:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.128@o2ib: waiting for 1 peers to disconnect
Jan 29 21:15:57 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:15:57 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:00 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:00 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:03 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:03 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:07 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:07 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:11 hd-client-02 kernel: Lustre: 7570:0:(o2iblnd.c:2714:kiblnd_shutdown()) 192.168.2.128@o2ib: waiting for 1 peers to disconnect
Jan 29 21:16:11 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:11 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:14 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:14 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:18 hd-client-02 kernel: Lustre: Removed LNI 192.168.2.128@o2ib
Jan 29 21:16:18 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:18 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:21 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:21 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:25 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:25 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:28 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:28 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:31 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:31 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:34 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:34 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:39 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:39 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:42 hd-client-02 mpd: hd-client-02_36675 (connect_lhs 918): failed to connect to lhs at hd-client-00 45154
Jan 29 21:16:42 hd-client-02 mpd: hd-client-02_36675 (enter_ring 873): lhs connect failed
Jan 29 21:16:42 hd-client-02 mpd: hd-client-02_36675 (reenter_ring 843): reenter_ring rc=-1 after numTries=16
Jan 29 21:16:42 hd-client-02 mpd: hd-client-02_36675 (runmainloop 327): failed to reenter ring
Jan 29 21:16:42 hd-client-02 mpd: mpd ending mpdid=hd-client-02_36675 (inside cleanup)