Sep 13 08:03:40 medusa-mds1 kernel: Lustre: MGS: Client 6f7e8f1a-31f9-89a5-35ff-098a97797da6 (at 192.249.7.6@tcp9) refused reconnection, still busy with 1 active RPCs
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: MGS: Client 6f7e8f1a-31f9-89a5-35ff-098a97797da6 (at 192.249.7.6@tcp9) refused reconnection, still busy with 1 active RPCs
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: MGS: Client 6f7e8f1a-31f9-89a5-35ff-098a97797da6 (at 192.249.7.6@tcp9) refused reconnection, still busy with 1 active RPCs
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5225:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606968/real 0] req@ffff880fc3b07400 x1475708372625592/t0(0) o13->medusa-OST0019-osc@172.16.1.8@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606975 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0019-osc: Connection to medusa-OST0019 (at 172.16.1.8@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5225:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1410606968/real 1410606968] req@ffff880b53fddc00 x1475708372625588/t0(0) o13->medusa-OST0028-osc@172.16.1.2@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606975 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0028-osc: Connection to medusa-OST0028 (at 172.16.1.2@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5223:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606969/real 0] req@ffff880a12b11c00 x1475708372625640/t0(0) o13->medusa-OST0035-osc@172.16.1.3@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606976 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0035-osc: Connection to medusa-OST0035 (at 172.16.1.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5220:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606970/real 0] req@ffff880fff167800 x1475708372625748/t0(0) o13->medusa-OST0011-osc@172.16.1.8@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606977 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5220:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0011-osc: Connection to medusa-OST0011 (at 172.16.1.8@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 6 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5214:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606972/real 0] req@ffff88095fad8800 x1475708372625908/t0(0) o13->medusa-OST001d-osc@172.16.1.8@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606979 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5214:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST001d-osc: Connection to medusa-OST001d (at 172.16.1.8@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 17 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5222:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606979/real 0] req@ffff880cd0be6400 x1475708372626216/t0(0) o13->medusa-OST0055-osc@172.16.1.12@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606986 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5222:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0055-osc: Connection to medusa-OST0055 (at 172.16.1.12@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 9 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.1.3@o2ib (101): c: 51, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.1.5@o2ib (101): c: 52, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.1.8@o2ib (102): c: 52, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 6 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Skipped 2 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.26.1@o2ib (110): c: 61, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Skipped 2 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 4 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.32.1@o2ib (104): c: 58, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: lock timed out (enqueued at 1410606883, 200s ago)
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 20 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Skipped 8 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.23.2@o2ib (120): c: 61, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Skipped 8 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Skipped 10 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 172.16.30.1@o2ib (141): c: 62, oc: 0, rc: 63
Sep 13 08:03:40 medusa-mds1 kernel: LNetError: 2702:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Skipped 10 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: lock timed out (enqueued at 1410606909, 200s ago)
Sep 13 08:03:40 medusa-mds1 kernel: INFO: task kswapd0:178 blocked for more than 120 seconds.
Sep 13 08:03:40 medusa-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:03:40 medusa-mds1 kernel: kswapd0 D 0000000000000006 0 178 2 0x00000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff8808322a1a80 0000000000000046 ffffea0001cc0f98 ffff8808322a1a50
Sep 13 08:03:40 medusa-mds1 kernel: ffffea0000fe4ed8 ffff8808322a1b50 0000000000000020 000000000000001f
Sep 13 08:03:40 medusa-mds1 kernel: ffff88083226fab8 ffff8808322a1fd8 000000000000fb88 ffff88083226fab8
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? prepare_to_wait+0x4e/0x80
Sep 13 08:03:40 medusa-mds1 kernel: [] start_this_handle+0x27a/0x4a0 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] jbd2_journal_start+0xd0/0x110 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_dquot_drop+0x34/0x80 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] vfs_dq_drop+0x52/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] clear_inode+0x93/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] dispose_list+0x40/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_icache_memory+0x274/0x2e0
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_slab+0x12a/0x1a0
Sep 13 08:03:40 medusa-mds1 kernel: [] balance_pgdat+0x59a/0x820
Sep 13 08:03:40 medusa-mds1 kernel: [] kswapd+0x134/0x3c0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kswapd+0x0/0x3c0
Sep 13 08:03:40 medusa-mds1 kernel: [] kthread+0x96/0xa0
Sep 13 08:03:40 medusa-mds1 kernel: [] child_rip+0xa/0x20
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kthread+0x0/0xa0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? child_rip+0x0/0x20
Sep 13 08:03:40 medusa-mds1 kernel: INFO: task kswapd1:179 blocked for more than 120 seconds.
Sep 13 08:03:40 medusa-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:03:40 medusa-mds1 kernel: kswapd1 D 000000000000000b 0 179 2 0x00000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff8808322a5a80 0000000000000046 ffffea0029909788 ffff8808322a5a50
Sep 13 08:03:40 medusa-mds1 kernel: ffffea0033a3f310 ffff8808322a5b50 0000000000000020 0000000000000017
Sep 13 08:03:40 medusa-mds1 kernel: ffff88083226f058 ffff8808322a5fd8 000000000000fb88 ffff88083226f058
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? prepare_to_wait+0x4e/0x80
Sep 13 08:03:40 medusa-mds1 kernel: [] start_this_handle+0x27a/0x4a0 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] jbd2_journal_start+0xd0/0x110 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_dquot_drop+0x34/0x80 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] vfs_dq_drop+0x52/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] clear_inode+0x93/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] dispose_list+0x40/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_icache_memory+0x274/0x2e0
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_slab+0x12a/0x1a0
Sep 13 08:03:40 medusa-mds1 kernel: [] balance_pgdat+0x59a/0x820
Sep 13 08:03:40 medusa-mds1 kernel: [] kswapd+0x134/0x3c0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kswapd+0x0/0x3c0
Sep 13 08:03:40 medusa-mds1 kernel: [] kthread+0x96/0xa0
Sep 13 08:03:40 medusa-mds1 kernel: [] child_rip+0xa/0x20
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kthread+0x0/0xa0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? child_rip+0x0/0x20
Sep 13 08:03:40 medusa-mds1 kernel: INFO: task rsyslogd:2452 blocked for more than 120 seconds.
Sep 13 08:03:40 medusa-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:03:40 medusa-mds1 kernel: rsyslogd D 0000000000000003 0 2452 1 0x00000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff880829a8d5a8 0000000000000086 ffffea0001079cf8 ffff880829a8d578
Sep 13 08:03:40 medusa-mds1 kernel: ffffea0001f7ecc0 ffff880829a8d678 0000000000000020 000000000000001e
Sep 13 08:03:40 medusa-mds1 kernel: ffff880831b33098 ffff880829a8dfd8 000000000000fb88 ffff880831b33098
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? prepare_to_wait+0x4e/0x80
Sep 13 08:03:40 medusa-mds1 kernel: [] start_this_handle+0x27a/0x4a0 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_alloc_refill+0x15b/0x240
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] jbd2_journal_start+0xd0/0x110 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_dquot_drop+0x34/0x80 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] vfs_dq_drop+0x52/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] clear_inode+0x93/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] dispose_list+0x40/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_icache_memory+0x274/0x2e0
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_slab+0x12a/0x1a0
Sep 13 08:03:40 medusa-mds1 kernel: [] do_try_to_free_pages+0x3f7/0x610
Sep 13 08:03:40 medusa-mds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] try_to_free_pages+0x92/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Sep 13 08:03:40 medusa-mds1 kernel: [] alloc_pages_current+0xaa/0x110
Sep 13 08:03:40 medusa-mds1 kernel: [] __page_cache_alloc+0x87/0x90
Sep 13 08:03:40 medusa-mds1 kernel: [] __do_page_cache_readahead+0xdb/0x210
Sep 13 08:03:40 medusa-mds1 kernel: [] ra_submit+0x21/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] filemap_fault+0x4c3/0x500
Sep 13 08:03:40 medusa-mds1 kernel: [] __do_fault+0x54/0x530
Sep 13 08:03:40 medusa-mds1 kernel: [] handle_pte_fault+0xf7/0xb50
Sep 13 08:03:40 medusa-mds1 kernel: [] ? __switch_to+0x1ac/0x320
Sep 13 08:03:40 medusa-mds1 kernel: [] ? selinux_capable+0x46/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] handle_mm_fault+0x23a/0x310
Sep 13 08:03:40 medusa-mds1 kernel: [] __do_page_fault+0x139/0x480
Sep 13 08:03:40 medusa-mds1 kernel: [] ? pde_users_dec+0x25/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] do_page_fault+0x3e/0xa0
Sep 13 08:03:40 medusa-mds1 kernel: [] page_fault+0x25/0x30
Sep 13 08:03:40 medusa-mds1 kernel: INFO: task irqbalance:2460 blocked for more than 120 seconds.
Sep 13 08:03:40 medusa-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:03:40 medusa-mds1 kernel: irqbalance D 000000000000000f 0 2460 1 0x00000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff880832807648 0000000000000086 ffffea000292d420 ffff880832807618
Sep 13 08:03:40 medusa-mds1 kernel: ffffea0001b40a60 ffff880832807718 0000000000000020 0000000000000014
Sep 13 08:03:40 medusa-mds1 kernel: ffff880831b945f8 ffff880832807fd8 000000000000fb88 ffff880831b945f8
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? prepare_to_wait+0x4e/0x80
Sep 13 08:03:40 medusa-mds1 kernel: [] start_this_handle+0x27a/0x4a0 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] jbd2_journal_start+0xd0/0x110 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_dquot_drop+0x34/0x80 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] vfs_dq_drop+0x52/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] clear_inode+0x93/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] dispose_list+0x40/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_icache_memory+0x274/0x2e0
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_slab+0x12a/0x1a0
Sep 13 08:03:40 medusa-mds1 kernel: [] do_try_to_free_pages+0x3f7/0x610
Sep 13 08:03:40 medusa-mds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] try_to_free_pages+0x92/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] __alloc_pages_nodemask+0x478/0x8d0
Sep 13 08:03:40 medusa-mds1 kernel: [] kmem_getpages+0x62/0x170
Sep 13 08:03:40 medusa-mds1 kernel: [] fallback_alloc+0x1ba/0x270
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_grow+0x2cf/0x320
Sep 13 08:03:40 medusa-mds1 kernel: [] ____cache_alloc_node+0x99/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] ? stat_open+0x56/0xc0
Sep 13 08:03:40 medusa-mds1 kernel: [] __kmalloc+0x189/0x220
Sep 13 08:03:40 medusa-mds1 kernel: [] stat_open+0x56/0xc0
Sep 13 08:03:40 medusa-mds1 kernel: [] proc_reg_open+0x9a/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] ? single_release+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] ? proc_reg_open+0x0/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] __dentry_open+0x10a/0x360
Sep 13 08:03:40 medusa-mds1 kernel: [] ? selinux_inode_permission+0x72/0xb0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? security_inode_permission+0x1f/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] nameidata_to_filp+0x54/0x70
Sep 13 08:03:40 medusa-mds1 kernel: [] do_filp_open+0x6d0/0xdc0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? unlink_anon_vmas+0x71/0xd0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cpumask_any_but+0x31/0x50
Sep 13 08:03:40 medusa-mds1 kernel: [] ? unmap_region+0x110/0x130
Sep 13 08:03:40 medusa-mds1 kernel: [] ? alloc_fd+0x92/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] do_sys_open+0x69/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] sys_open+0x20/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] system_call_fastpath+0x16/0x1b
Sep 13 08:03:40 medusa-mds1 kernel: INFO: task rpcbind:2474 blocked for more than 120 seconds.
Sep 13 08:03:40 medusa-mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:03:40 medusa-mds1 kernel: rpcbind D 0000000000000006 0 2474 1 0x00000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff880833349378 0000000000000086 ffffea0001e93e58 ffff880833349348
Sep 13 08:03:40 medusa-mds1 kernel: ffffea00004d94f8 ffff880833349448 0000000000000020 0000000000000000
Sep 13 08:03:40 medusa-mds1 kernel: ffff880832d6c5f8 ffff880833349fd8 000000000000fb88 ffff880832d6c5f8
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? prepare_to_wait+0x4e/0x80
Sep 13 08:03:40 medusa-mds1 kernel: [] start_this_handle+0x27a/0x4a0 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_alloc_refill+0x15b/0x240
Sep 13 08:03:40 medusa-mds1 kernel: [] ? autoremove_wake_function+0x0/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] jbd2_journal_start+0xd0/0x110 [jbd2]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ldiskfs_dquot_drop+0x34/0x80 [ldiskfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] vfs_dq_drop+0x52/0x60
Sep 13 08:03:40 medusa-mds1 kernel: [] clear_inode+0x93/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] dispose_list+0x40/0x120
Sep 13 08:03:40 medusa-mds1 kernel: unevictable:0 dirty:1 writeback:0 unstable:0
Sep 13 08:03:40 medusa-mds1 kernel: free:65795 slab_reclaimable:594809 slab_unreclaimable:1127556
Sep 13 08:03:40 medusa-mds1 kernel: mapped:14 shmem:72121 pagetables:610 bounce:0
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA free:15736kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 2959 32249 32249
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 29290 29290
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 Normal free:58444kB min:59440kB low:74300kB high:89160kB active_anon:176920kB inactive_anon:121156kB active_file:13261452kB inactive_file:13260644kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:0kB dirty:4kB writeback:0kB mapped:52kB shmem:229268kB slab_reclaimable:891660kB slab_unreclaimable:1703360kB kernel_stack:7096kB pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:886048 all_unreclaimable? no
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 0 0
Sep 13 08:03:40 medusa-mds1 kernel: Node 1 Normal free:65852kB min:65592kB low:81988kB high:98388kB active_anon:41348kB inactive_anon:30456kB active_file:14236476kB inactive_file:14235792kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:56480kB slab_reclaimable:1183560kB slab_unreclaimable:2786132kB kernel_stack:4448kB pagetables:584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1056032 all_unreclaimable? no
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 0 0
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15736kB
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA32: 6836*4kB 5244*8kB 2016*16kB 414*32kB 74*64kB 27*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123248kB
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 Normal: 14712*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 58864kB
Sep 13 08:03:40 medusa-mds1 kernel: Node 1 Normal: 16515*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66060kB
Sep 13 08:03:40 medusa-mds1 kernel: 14348123 total pagecache pages
Sep 13 08:03:40 medusa-mds1 kernel: 0 pages in swap cache
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 1 previous similar message
Sep 13 08:03:40 medusa-mds1 kernel: LNet: No route to 12345-721@gni via 172.16.1.101@o2ib (all routers down)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13668:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (204:1756s); client may timeout. req@ffff881009025850 x1477544871505868/t241451695903(0) o35->5bc62c82-094f-856e-e91a-267cfd4ddc48@721@gni:0/0 lens 392/424 e 1 to 0 dl 1410607088 ref 2 fl Complete:/0/0 rc 0/0
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13668:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 7 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Service thread pid 13668 completed after 1960.48s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 13 08:03:40 medusa-mds1 kernel: LNet: No route to 12345-736@gni via 172.16.1.101@o2ib (all routers down)
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 3 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Service thread pid 13594 completed after 1992.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 3 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13401:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (157:1873s); client may timeout. req@ffff880bf5fbd400 x1478857381227138/t241451695908(0) o101->07545e84-602e-a56c-2fbb-e3366461516e@172.17.89.6@o2ib88:0/0 lens 1024/624 e 1 to 0 dl 1410607041 ref 1 fl Complete:/0/0 rc 301/301
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13401:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 4 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 6306:0:(service.c:1889:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 1940s req@ffff880ce8b7a400 x1477544417528352/t0(0) o400->98ea6eb6-8f28-7d39-7697-3d512184aea5@712@gni:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 6306:0:(service.c:1889:ptlrpc_server_handle_req_in()) Skipped 4 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13401:0:(service.c:1500:ptlrpc_at_check_timed()) earlyQ=91 reqQ=72 recA=92, svcEst=600, delay=1938459(jiff)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13401:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1789s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880b84b66400 x1477544389130524/t0(0) o400->793a9f21-f349-8e57-cbc9-bc5c6753b00b@657@gni:0/0 lens 224/0 e 0 to 0 dl 1410607125 ref 2 fl New:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13401:0:(service.c:1301:ptlrpc_at_send_early_reply()) Skipped 92 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 6306:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-657@gni: deadline 151:1790s ago
Sep 13 08:03:40 medusa-mds1 kernel: req@ffff880b84b66400 x1477544389130524/t0(0) o400->793a9f21-f349-8e57-cbc9-bc5c6753b00b@657@gni:0/0 lens 224/0 e 0 to 0 dl 1410607125 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 6306:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 4 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: lock timed out (enqueued at 1410608798, 200s ago)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 10 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13346:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (106:2066s); client may timeout. req@ffff880364d6cc00 x1478859808180849/t241451695910(0) o101->7dac140c-09c6-18bb-0ea4-fd6a237f2745@172.17.88.21@o2ib88:0/0 lens 1024/624 e 1 to 0 dl 1410606990 ref 1 fl Complete:/0/0 rc 301/301
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13346:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 76 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Service thread pid 13346 completed after 2172.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 3 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: lock timed out (enqueued at 1410608876, 200s ago)
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13537:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-457@gni: deadline 96:2014s ago
Sep 13 08:03:40 medusa-mds1 kernel: req@ffff880e5792b400 x1477544375123488/t0(0) o400->c68a58c2-bc94-cda0-734b-9499383a5ad8@457@gni:0/0 lens 224/0 e 0 to 0 dl 1410607066 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13537:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 73 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5221:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410609116/real 0] req@ffff880bc3e18c00 x1475708372626500/t0(0) o6->medusa-OST004d-osc@172.16.1.12@o2ib:28/4 lens 664/432 e 0 to 1 dl 1410609123 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5221:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST004d-osc: Connection to medusa-OST004d (at 172.16.1.12@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 3 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13368:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-734@gni: deadline 168:1974s ago
Sep 13 08:03:40 medusa-mds1 kernel: req@ffff880b9f4cf000 x1477544409159492/t0(0) o400->09acc184-270e-4450-ace5-dfc1d0b1ede7@734@gni:0/0 lens 224/0 e 0 to 0 dl 1410607159 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13368:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 7 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Service thread pid 13341 completed after 2216.26s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 146 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5218:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1410606973/real 1410609071] req@ffff880fea0ee000 x1475708372625928/t0(0) o13->medusa-OST0020-osc@172.16.1.2@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606980 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0020-osc: Connection to medusa-OST0020 (at 172.16.1.2@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: LNet: No route to 12345-495@gni via 172.16.1.101@o2ib (all routers down)
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 123 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13877:0:(service.c:1500:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=600, delay=0(jiff)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13877:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1887s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88101a742850 x1477544384972004/t0(0) o35->cd61f37f-2527-5b43-4ef1-22e157f5e96b@495@gni:0/0 lens 392/0 e 1 to 0 dl 1410607301 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13563:0:(service.c:1500:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=67, svcEst=600, delay=0(jiff)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13563:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2121s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8807debfe800 x1477544404454028/t0(0) o36->fa185364-e595-f1b6-4f1e-168227428d43@701@gni:0/0 lens 536/696 e 1 to 0 dl 1410607087 ref 2 fl Interpret:/0/0 rc 0/0
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5219:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1410606969/real 1410609192] req@ffff880be7c9b000 x1475708372625628/t0(0) o13->medusa-OST0031-osc@172.16.1.3@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606976 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5219:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: medusa-OST0031-osc: Connection to medusa-OST0031 (at 172.16.1.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: Skipped 19 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13438:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2119s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8808c4f5b400 x1477544384821320/t0(0) o101->5f56201b-6159-9a47-3fcc-fd62f683e2ee@689@gni:0/0 lens 584/0 e 1 to 0 dl 1410607133 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 10523:0:(service.c:1500:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=42, svcEst=600, delay=0(jiff)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13438:0:(service.c:1301:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13404:0:(service.c:1500:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=42, svcEst=600, delay=0(jiff)
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13483:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (151:2195s); client may timeout. req@ffff880f46647800 x1477544388384008/t0(0) o400->67ff2c97-4aca-0524-60a2-4e1decd8e79a@497@gni:0/0 lens 224/0 e 0 to 0 dl 1410607119 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13483:0:(service.c:2031:ptlrpc_server_handle_request()) Skipped 158 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13421:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2344s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88101a61fc00 x1477544403953060/t0(0) o101->1a9b8175-3500-9343-d95d-7beb0446c09f@649@gni:0/0 lens 584/616 e 0 to 0 dl 1410606979 ref 1 fl Complete:/0/0 rc 0/0
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13421:0:(service.c:1301:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13591:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-172.17.88.66@o2ib88: deadline 16:2339s ago
Sep 13 08:03:40 medusa-mds1 kernel: req@ffff880995b15800 x1478860079180368/t0(0) o400->9e3c467d-7cb9-ee47-1904-29926c5550e1@172.17.88.66@o2ib88:0/0 lens 192/0 e 0 to 0 dl 1410606990 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 13591:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 22 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: ib_cm/1: page allocation failure. order:5, mode:0xd0
Sep 13 08:03:40 medusa-mds1 kernel: Pid: 2115, comm: ib_cm/1 Not tainted 2.6.32-358.23.2.el6_lustre.x86_64 #1
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? __alloc_pages_nodemask+0x757/0x8d0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kmem_getpages+0x62/0x170
Sep 13 08:03:40 medusa-mds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_grow+0x2cf/0x320
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] ? create_qp_common+0x8a7/0xc90 [mlx4_ib]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? __kmalloc+0x189/0x220
Sep 13 08:03:40 medusa-mds1 kernel: [] ? create_qp_common+0x8a7/0xc90 [mlx4_ib]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kmem_cache_alloc_trace+0x1a3/0x1b0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? mlx4_ib_create_qp+0x122/0x240 [mlx4_ib]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kiblnd_create_conn+0xa9d/0x1690 [ko2iblnd]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kiblnd_passive_connect+0x78c/0x1760 [ko2iblnd]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ib_query_gid+0x16/0x20 [ib_core]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? find_gid_port+0xaa/0xd0 [rdma_cm]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cma_attach_to_dev+0x5a/0x70 [rdma_cm]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cma_acquire_dev+0x22e/0x280 [rdma_cm]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kiblnd_cm_callback+0x6dd/0x1210 [ko2iblnd]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cma_req_handler+0x3cb/0x720 [rdma_cm]
Sep 13 08:03:40 medusa-mds1 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 14: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 15: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA32 per-cpu:
Sep 13 08:03:40 medusa-mds1 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA free:15736kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 2959 32249 32249
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA32 free:123292kB min:6004kB low:7504kB high:9004kB active_anon:11816kB inactive_anon:1344kB active_file:1039616kB inactive_file:1039812kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3030392kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:2736kB slab_reclaimable:304000kB slab_unreclaimable:20708kB kernel_stack:168kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 29290 29290
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 0 0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? do_sys_open+0x69/0x140
Sep 13 08:03:40 medusa-mds1 kernel: [] ? sys_open+0x20/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] ? system_call_fastpath+0x16/0x1b
Sep 13 08:03:40 medusa-mds1 kernel: Mem-Info:
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA per-cpu:
Sep 13 08:03:40 medusa-mds1 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 12: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA free:15736kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 2959 32249 32249
Sep 13 08:03:40 medusa-mds1 kernel: Node 0 DMA32 free:123292kB min:6004kB low:7504kB high:9004kB active_anon:11816kB inactive_anon:1344kB active_file:1040004kB inactive_file:1039424kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3030392kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:2736kB slab_reclaimable:304000kB slab_unreclaimable:20708kB kernel_stack:168kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:14348709 all_unreclaimable? yes
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 29290 29290
Sep 13 08:03:40 medusa-mds1 kernel: lowmem_reserve[]: 0 0 0 0
Sep 13 08:03:40 medusa-mds1 kernel: [ 2774] 0 2774 1016 23 6 0 0 mingetty
Sep 13 08:03:40 medusa-mds1 kernel: [ 2776] 0 2776 1016 23 6 0 0 mingetty
Sep 13 08:03:40 medusa-mds1 kernel: [ 2909] 0 2909 19212 7154 3 0 0 collectl
Sep 13 08:03:40 medusa-mds1 kernel: [ 2923] 99 2923 26038 8171 2 0 0 gmond
Sep 13 08:03:40 medusa-mds1 kernel: [10837] 0 10837 25254 954 14 0 0 snmpd
Sep 13 08:03:40 medusa-mds1 kernel: [10046] 0 10046 2272 14 1 0 0 sh
Sep 13 08:03:40 medusa-mds1 kernel: Out of memory: Kill process 2435 (nslcd) score 1 or sacrifice child
Sep 13 08:03:40 medusa-mds1 kernel: Killed process 2435, UID 65, (nslcd) total-vm:450964kB, anon-rss:4176kB, file-rss:24kB
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Service thread pid 13922 completed after 2540.56s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep 13 08:03:40 medusa-mds1 kernel: LNet: No route to 12345-707@gni via 172.16.1.101@o2ib (all routers down)
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 21 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13495:0:(service.c:1301:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2404s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880ba204a000 x1477544427831824/t0(0) o101->1f330774-9168-6846-ba6e-e3f62ce7b522@707@gni:0/0 lens 576/0 e 1 to 0 dl 1410607131 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 13495:0:(service.c:1301:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LNet: Skipped 72 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 10515:0:(service.c:1999:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-200@gni: deadline 96:2549s ago
Sep 13 08:03:40 medusa-mds1 kernel: req@ffff880ce1ce7000 x1477544386502884/t0(0) o400->2de40298-90e0-244b-27f1-892011618dbd@200@gni:0/0 lens 224/0 e 0 to 0 dl 1410607083 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: LustreError: 10515:0:(service.c:1999:ptlrpc_server_handle_request()) Skipped 11 previous similar messages
Sep 13 08:03:40 medusa-mds1 kernel: Lustre: 5227:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1410606981/real 0] req@ffff880ccb1e4400 x1475708372626344/t0(0) o13->medusa-OST000b-osc@172.16.1.7@o2ib:7/4 lens 224/368 e 0 to 1 dl 1410606988 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Sep 13 08:03:40 medusa-mds1 kernel: [] ? shrink_zone+0x63/0xb0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? do_try_to_free_pages+0x115/0x610
Sep 13 08:03:40 medusa-mds1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Sep 13 08:03:40 medusa-mds1 kernel: [] ? try_to_free_pages+0x92/0x120
Sep 13 08:03:40 medusa-mds1 kernel: [] ? __alloc_pages_nodemask+0x478/0x8d0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? kmem_getpages+0x62/0x170
Sep 13 08:03:40 medusa-mds1 kernel: [] ? fallback_alloc+0x1ba/0x270
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_grow+0x2cf/0x320
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ____cache_alloc_node+0x99/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? __kmalloc+0x189/0x220
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? LNetPut+0x95/0x860 [lnet]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptl_send_buf+0x1e0/0x550 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? null_authorize+0xb0/0x100 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptlrpc_send_reply+0x27b/0x7f0 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptlrpc_at_check_timed+0xd65/0x1620 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptlrpc_main+0xc10/0x1700 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? child_rip+0xa/0x20
Sep 13 08:03:40 medusa-mds1 kernel: [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cache_grow+0x2cf/0x320
Sep 13 08:03:40 medusa-mds1 kernel: [] ____cache_alloc_node+0x99/0x160
Sep 13 08:03:40 medusa-mds1 kernel: [] ? cfs_alloc+0x30/0x60 [libcfs]
Sep 13 08:03:40 medusa-mds1 kernel: [] __kmalloc+0x189/0x220
Sep 13 08:03:40 medusa-mds1 kernel: [] cfs_alloc+0x30/0x60 [libcfs]
Sep 13 08:03:40 medusa-mds1 kernel: Call Trace:
Sep 13 08:03:40 medusa-mds1 kernel: [] ? try_to_free_buffers+0x51/0xc0
Sep 13 08:03:40 medusa-mds1 kernel: [] __cond_resched+0x2a/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] _cond_resched+0x30/0x40
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_page_list.clone.3+0x52/0x650
Sep 13 08:03:40 medusa-mds1 kernel: [] ? mem_cgroup_lru_del_list+0x2b/0xb0
Sep 13 08:03:40 medusa-mds1 kernel: [] ? isolate_lru_pages.clone.0+0xd7/0x170
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_inactive_list+0x343/0x830
Sep 13 08:03:40 medusa-mds1 kernel: [] ? shrink_active_list+0x297/0x370
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_mem_cgroup_zone+0x3ae/0x610
Sep 13 08:03:40 medusa-mds1 kernel: [] ? mem_cgroup_iter+0xfd/0x280
Sep 13 08:03:40 medusa-mds1 kernel: [] shrink_zone+0x63/0xb0
Sep 13 08:03:40 medusa-mds1 kernel: [] do_try_to_free_pages+0x115/0x610