LNet: Service thread pid 3567 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3567, comm: mdt_rdpg03_003
Call Trace:
 [] ? try_to_free_buffers+0x51/0xc0
 [] ? jbd2_journal_try_to_free_buffers+0x48/0x150 [jbd2]
 [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
 [] ? blkdev_releasepage+0x36/0x50
 [] ? shrink_page_list.clone.3+0x517/0x650
 [] ? mem_cgroup_lru_del_list+0x2b/0xb0
 [] ? isolate_lru_pages.clone.0+0xd7/0x170
 [] ? shrink_inactive_list+0x191/0x830
 [] ? shrink_active_list+0x297/0x370
 [] ? shrink_mem_cgroup_zone+0x3ae/0x610
 [] ? mem_cgroup_iter+0xfd/0x280
 [] ? shrink_zone+0x63/0xb0
 [] ? zone_reclaim+0x349/0x400
 [] ? mempool_alloc_slab+0x15/0x20
 [] ? get_page_from_freelist+0x69c/0x830
 [] ? native_sched_clock+0x13/0x80
 [] ? __alloc_pages_nodemask+0x113/0x8d0
 [] ? blk_queue_bio+0x121/0x5d0
 [] ? mempool_alloc_slab+0x15/0x20
 [] ? alloc_pages_current+0xaa/0x110
 [] ? __page_cache_alloc+0x87/0x90
 [] ? find_or_create_page+0x4f/0xb0
 [] ? __getblk+0xed/0x2a0
 [] ? __breadahead+0x12/0x40
 [] ? __ldiskfs_get_inode_loc+0x33e/0x3b0 [ldiskfs]
 [] ? ldiskfs_iget+0x86/0x800 [ldiskfs]
 [] ? fld_server_lookup+0x72/0x3d0 [fld]
 [] ? generic_detach_inode+0x18e/0x1f0
 [] ? osd_iget+0x2e/0x2c0 [osd_ldiskfs]
 [] ? osd_ea_fid_get+0x176/0x2c0 [osd_ldiskfs]
 [] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [] ? osd_it_ea_rec+0xb45/0x1470 [osd_ldiskfs]
 [] ? call_filldir+0xb5/0x150 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? htree_unlock+0x3d/0x2c6 [ldiskfs]
 [] ? lod_it_rec+0x21/0x90 [lod]
 [] ? mdd_dir_page_build+0xfc/0x210 [mdd]
 [] ? dt_index_walk+0x162/0x3d0 [obdclass]
 [] ? down_read+0x16/0x30
 [] ? mdd_dir_page_build+0x0/0x210 [mdd]
 [] ? mdd_readpage+0x38b/0x5a0 [mdd]
 [] ? mdt_readpage+0x47f/0x960 [mdt]
 [] ? mdt_handle_common+0x647/0x16d0 [mdt]
 [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [] ? mds_readpage_handle+0x15/0x20 [mdt]
 [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [] ? __wake_up+0x53/0x70
 [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0xa/0x20
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1405097248.3567
LNet: Service thread pid 3567 completed after 226.55s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
LNet: Service thread pid 3567 was inactive for 452.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3567, comm: mdt_rdpg03_003
Call Trace:
 [] ? shrink_inactive_list+0x191/0x830
 [] ? shrink_active_list+0x297/0x370
 [] shrink_mem_cgroup_zone+0x3ae/0x610
 [] ? mem_cgroup_iter+0xfd/0x280
 [] shrink_zone+0x63/0xb0
 [] zone_reclaim+0x349/0x400
 [] ? mempool_alloc_slab+0x15/0x20
 [] get_page_from_freelist+0x69c/0x830
 [] ? native_sched_clock+0x13/0x80
 [] __alloc_pages_nodemask+0x113/0x8d0
 [] ? blk_queue_bio+0x121/0x5d0
 [] ? mempool_alloc_slab+0x15/0x20
 [] alloc_pages_current+0xaa/0x110
 [] __page_cache_alloc+0x87/0x90
 [] find_or_create_page+0x4f/0xb0
 [] __getblk+0xed/0x2a0
 [] __breadahead+0x12/0x40
 [] __ldiskfs_get_inode_loc+0x33e/0x3b0 [ldiskfs]
 [] ldiskfs_iget+0x86/0x800 [ldiskfs]
 [] ? fld_server_lookup+0x72/0x3d0 [fld]
 [] ? generic_detach_inode+0x18e/0x1f0
 [] osd_iget+0x2e/0x2c0 [osd_ldiskfs]
 [] osd_ea_fid_get+0x176/0x2c0 [osd_ldiskfs]
 [] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [] osd_it_ea_rec+0xb45/0x1470 [osd_ldiskfs]
 [] ? call_filldir+0xb5/0x150 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? htree_unlock+0x3d/0x2c6 [ldiskfs]
 [] lod_it_rec+0x21/0x90 [lod]
 [] mdd_dir_page_build+0xfc/0x210 [mdd]
 [] dt_index_walk+0x162/0x3d0 [obdclass]
 [] ? down_read+0x16/0x30
 [] ? mdd_dir_page_build+0x0/0x210 [mdd]
 [] mdd_readpage+0x38b/0x5a0 [mdd]
 [] mdt_readpage+0x47f/0x960 [mdt]
 [] mdt_handle_common+0x647/0x16d0 [mdt]
 [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [] mds_readpage_handle+0x15/0x20 [mdt]
 [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [] ? __wake_up+0x53/0x70
 [] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] child_rip+0xa/0x20
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1405097911.3567
Lustre: 3371:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-584), not sending early reply
  req@ffff880625d95400 x1471182058722532/t0(0) o37->c49ef08c-3b74-d0cd-5054-94e6eee13a3c@198.202.119.249@tcp:0/0 lens 448/440 e 2 to 0 dl 1405098648 ref 2 fl Interpret:/0/0 rc 0/0
LustreError: 3567:0:(ldlm_lib.c:2702:target_bulk_io()) @@@ timeout on bulk PUT after -76+76s
  req@ffff880625d95400 x1471182058722532/t0(0) o37->c49ef08c-3b74-d0cd-5054-94e6eee13a3c@198.202.119.249@tcp:0/0 lens 448/440 e 2 to 0 dl 1405098648 ref 1 fl Interpret:/0/0 rc 0/0
Lustre: 3567:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (1189:76s); client may timeout.
  req@ffff880625d95400 x1471182058722532/t0(0) o37->c49ef08c-3b74-d0cd-5054-94e6eee13a3c@198.202.119.249@tcp:0/0 lens 448/408 e 2 to 0 dl 1405098648 ref 1 fl Complete:/0/0 rc -110/-110
LNet: Service thread pid 3567 completed after 1265.78s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
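[Note: both mdt_rdpg03_003 traces above stall while allocating pages for a directory read, with the allocation dropping into zone_reclaim(), i.e. synchronous page-cache reclaim on the local NUMA node. If vm.zone_reclaim_mode is non-zero on this MDS, disabling it is a commonly suggested mitigation for exactly this pattern. A minimal check-and-disable sketch, to be validated against the running kernel and workload:

  # 0 means NUMA zone reclaim is already off; non-zero enables local-node reclaim
  cat /proc/sys/vm/zone_reclaim_mode
  # tentatively let allocations spill to remote nodes instead of reclaiming locally
  sysctl -w vm.zone_reclaim_mode=0
  # persist across reboots if it helps
  echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf
]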
Lustre: 3253:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099109/real 1405099112]
  req@ffff88061e448400 x1473257066089112/t0(0) o6->puma-OST0004-osc@172.25.32.113@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099116 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: puma-OST0004-osc: Connection to puma-OST0004 (at 172.25.32.113@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 30 previous similar messages
Lustre: puma-OST0004-osc: Connection restored to puma-OST0004 (at 172.25.32.113@tcp)
Lustre: Skipped 30 previous similar messages
Lustre: 3264:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099155/real 1405099155]
  req@ffff8803174f7400 x1473257066112064/t0(0) o6->puma-OST001e-osc@172.25.32.241@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099165 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: puma-OST001e-osc: Connection to puma-OST001e (at 172.25.32.241@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099155/real 1405099155]
  req@ffff880337161800 x1473257066112176/t0(0) o6->puma-OST001e-osc@172.25.32.241@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099165 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: puma-OST000e-osc: Connection to puma-OST000e (at 172.25.32.241@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 3266:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099157/real 1405099157]
  req@ffff880315e78000 x1473257066112408/t0(0) o6->puma-OST000e-osc@172.25.32.241@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3266:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: puma-OST0011-osc: Connection to puma-OST0011 (at 172.25.33.242@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 3253:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099159/real 1405099161]
  req@ffff88015b605400 x1473257066112544/t0(0) o13->puma-OST0016-osc@172.25.32.241@tcp:7/4 lens 224/368 e 0 to 1 dl 1405099169 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3253:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: MGS: Client aee3c692-8894-f56b-7990-bcf5e3697154 (at 172.25.33.242@tcp) reconnecting
Lustre: puma-OST0009-osc: Connection restored to puma-OST0009 (at 172.25.33.242@tcp)
Lustre: 3262:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1405099174/real 0]
  req@ffff88031f416800 x1473257066117352/t0(0) o6->puma-OST0016-osc@172.25.32.241@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099184 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3262:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: puma-OST0016-osc: Connection to puma-OST0016 (at 172.25.32.241@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: puma-OST0004-osc: Connection to puma-OST0004 (at 172.25.32.113@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 3258:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1405099174/real 0]
  req@ffff88061b128c00 x1473257066117928/t0(0) o6->puma-OST0009-osc@172.25.33.242@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099193 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3258:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 98 previous similar messages
Lustre: MGS: Client dc42eabd-ae5e-9eeb-9eb2-e58c7264872b (at 198.202.118.128@tcp) reconnecting
LustreError: 11-0: puma-OST0016-osc: Communicating with 172.25.32.241@tcp, operation ost_connect failed with -16.
Lustre: puma-OST0006-osc: Connection restored to puma-OST0006 (at 172.25.32.241@tcp)
Lustre: Skipped 4 previous similar messages
Lustre: MGS: Client 660dd450-f879-6033-d09c-a46a7eea0d14 (at 172.25.32.113@tcp) reconnecting
Lustre: Skipped 34 previous similar messages
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099194/real 1405099197]
  req@ffff880316091000 x1473257066120256/t0(0) o6->puma-OST0011-osc@172.25.33.242@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099210 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Lustre: puma-OST0011-osc: Connection to puma-OST0011 (at 172.25.33.242@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 10 previous similar messages
LustreError: 3455:0:(osp_precreate.c:484:osp_precreate_send()) puma-OST0019-osc: can't precreate: rc = -11
LustreError: 3455:0:(osp_precreate.c:989:osp_precreate_thread()) puma-OST0019-osc: cannot precreate objects: rc = -11
Lustre: MGS: Client fe1be653-0519-0538-cf79-6ce2af0259c8 (at 198.202.118.97@tcp) reconnecting
Lustre: puma-OST000c-osc: Connection restored to puma-OST000c (at 172.25.32.113@tcp)
Lustre: Skipped 4 previous similar messages
LustreError: 11-0: puma-OST0019-osc: Communicating with 172.25.33.242@tcp, operation ost_connect failed with -16.
LustreError: Skipped 6 previous similar messages
Lustre: MGS: Client c680bcd2-88c7-7e9f-e32e-906f3bd886d2 (at 198.202.118.86@tcp) reconnecting
Lustre: Skipped 27 previous similar messages
Lustre: puma-OST0019-osc: Connection restored to puma-OST0019 (at 172.25.33.242@tcp)
Lustre: Skipped 6 previous similar messages
Lustre: MGS: Client cae6f82c-ea4b-3531-e19b-e2da307379af (at 198.202.118.110@tcp) reconnecting
Lustre: Skipped 21 previous similar messages
Lustre: MGS: Client 71a190b9-471c-b9e0-fefd-25605c425114 (at 198.202.118.99@tcp) reconnecting
Lustre: 3265:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1405099251/real 1405099257]
  req@ffff88026ce90000 x1473257066204432/t0(0) o13->puma-OST0016-osc@172.25.32.241@tcp:7/4 lens 224/368 e 0 to 1 dl 1405099266 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3265:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
Lustre: puma-OST0016-osc: Connection to puma-OST0016 (at 172.25.32.241@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 3 previous similar messages
LustreError: 11-0: puma-OST0014-osc: Communicating with 172.25.32.113@tcp, operation ost_connect failed with -16.
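[Note: ost_connect failing with -16 (-EBUSY) and precreate failing with -11 (-EAGAIN) suggest the OSTs are reachable but not keeping up. Two low-risk first checks from the MDS, assuming matching lustre utilities are installed: ping the OSS NIDs named above at the LNet level, and decode the binary debug dumps the watchdog wrote:

  # LNet-level reachability of the OSS nodes from this server
  lctl ping 172.25.32.241@tcp
  lctl ping 172.25.33.242@tcp
  # convert a binary watchdog dump to readable text (lctl df = debug_file)
  lctl df /tmp/lustre-log.1405097248.3567 /tmp/lustre-log.1405097248.3567.txt
]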
Lustre: puma-OST0014-osc: Connection restored to puma-OST0014 (at 172.25.32.113@tcp)
Lustre: Skipped 3 previous similar messages
Lustre: MGS: Client d877da85-3ebf-a25a-66dd-dc582f2d5d40 (at 172.25.32.241@tcp) reconnecting
Lustre: Skipped 15 previous similar messages
Lustre: 3255:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1405099978/real 0]
  req@ffff8803208f7000 x1473257066552224/t0(0) o6->puma-OST0009-osc@172.25.33.242@tcp:28/4 lens 664/432 e 0 to 1 dl 1405099989 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3255:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
Lustre: puma-OST0009-osc: Connection to puma-OST0009 (at 172.25.33.242@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
LustreError: 138-a: puma-MDT0000: A client on nid 198.202.119.249@tcp was evicted due to a lock blocking callback time out: rc -107
LustreError: 3341:0:(ldlm_lockd.c:2348:ldlm_cancel_handler()) ldlm_cancel from 198.202.119.249@tcp arrived at 1405099992 with bad export cookie 554718929072976222
Lustre: MGS: Client 6e940078-4347-d139-f541-e589ad27558a (at 198.202.119.66@tcp) reconnecting
Lustre: Skipped 1 previous similar message
Lustre: puma-OST001c-osc: Connection restored to puma-OST001c (at 172.25.32.113@tcp)
Lustre: Skipped 2 previous similar messages
LustreError: 11-0: puma-OST0014-osc: Communicating with 172.25.32.113@tcp, operation ost_connect failed with -16.
LustreError: Skipped 2 previous similar messages
Lustre: puma-OST001e-osc: Connection restored to puma-OST001e (at 172.25.32.241@tcp)
Lustre: Skipped 4 previous similar messages
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1405100506/real 0]
  req@ffff88063846f400 x1473257066710080/t0(0) o13->puma-OST0019-osc@172.25.33.242@tcp:7/4 lens 224/368 e 0 to 1 dl 1405100519 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3260:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 42 previous similar messages
Lustre: puma-OST0019-osc: Connection to puma-OST0019 (at 172.25.33.242@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 11 previous similar messages
Lustre: puma-OST0019-osc: Connection restored to puma-OST0019 (at 172.25.33.242@tcp)
Lustre: Skipped 6 previous similar messages
Lustre: MGS: Client ef25248d-a7d2-c3df-9137-a8bacdc4a1b8 (at 198.202.118.204@tcp) reconnecting
Lustre: Skipped 25 previous similar messages
LNet: Service thread pid 3488 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3488, comm: mdt_rdpg03_002
Call Trace:
 [] ? try_to_free_buffers+0x51/0xc0
 [] ? jbd2_journal_try_to_free_buffers+0xa7/0x150 [jbd2]
 [] ? bdev_try_to_free_page+0x48/0x90 [ldiskfs]
 [] ? unlock_page+0x1a/0x30
 [] ? shrink_page_list.clone.3+0xd0/0x650
 [] ? mem_cgroup_lru_del_list+0x2b/0xb0
 [] ? isolate_lru_pages.clone.0+0xd7/0x170
 [] ? shrink_inactive_list+0x343/0x830
 [] ? shrink_mem_cgroup_zone+0x3ae/0x610
 [] ? mem_cgroup_iter+0xfd/0x280
 [] ? shrink_zone+0x63/0xb0
 [] ? zone_reclaim+0x349/0x400
 [] ? get_page_from_freelist+0x69c/0x830
 [] ? make_request+0xb4a/0xe90 [raid10]
 [] ? __alloc_pages_nodemask+0x113/0x8d0
 [] ? blk_queue_bio+0x510/0x5d0
 [] ? __switch_to+0x1ac/0x320
 [] ? mempool_alloc_slab+0x15/0x20
 [] ? alloc_pages_current+0xaa/0x110
 [] ? __page_cache_alloc+0x87/0x90
 [] ? find_or_create_page+0x4f/0xb0
 [] ? __getblk+0xed/0x2a0
 [] ? __breadahead+0x12/0x40
 [] ? __ldiskfs_get_inode_loc+0x33e/0x3b0 [ldiskfs]
 [] ? ldiskfs_iget+0x86/0x800 [ldiskfs]
 [] ? fld_server_lookup+0x72/0x3d0 [fld]
 [] ? generic_detach_inode+0x18e/0x1f0
 [] ? osd_iget+0x2e/0x2c0 [osd_ldiskfs]
 [] ? osd_ea_fid_get+0x176/0x2c0 [osd_ldiskfs]
 [] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [] ? osd_it_ea_rec+0xb45/0x1470 [osd_ldiskfs]
 [] ? call_filldir+0xb5/0x150 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? htree_unlock+0x3d/0x2c6 [ldiskfs]
 [] ? lod_it_rec+0x21/0x90 [lod]
 [] ? mdd_dir_page_build+0xfc/0x210 [mdd]
 [] ? dt_index_walk+0x162/0x3d0 [obdclass]
 [] ? mdd_dir_page_build+0x0/0x210 [mdd]
 [] ? mdd_readpage+0x38b/0x5a0 [mdd]
 [] ? mdt_readpage+0x47f/0x960 [mdt]
 [] ? mdt_handle_common+0x647/0x16d0 [mdt]
 [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [] ? mds_readpage_handle+0x15/0x20 [mdt]
 [] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [] ? __wake_up+0x53/0x70
 [] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0xa/0x20
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1405105816.3488
LNet: Service thread pid 3488 completed after 316.56s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Lustre: 3254:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1405106276/real 0]
  req@ffff880288c9c800 x1473257068678624/t0(0) o6->puma-OST0014-osc@172.25.32.113@tcp:28/4 lens 664/432 e 0 to 1 dl 1405106284 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: puma-OST0014-osc: Connection to puma-OST0014 (at 172.25.32.113@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: MGS: Client fe1be653-0519-0538-cf79-6ce2af0259c8 (at 198.202.118.97@tcp) reconnecting
Lustre: Skipped 7 previous similar messages
LustreError: 11-0: puma-OST0014-osc: Communicating with 172.25.32.113@tcp, operation ost_connect failed with -16.
LustreError: Skipped 6 previous similar messages
Lustre: puma-OST0014-osc: Connection restored to puma-OST0014 (at 172.25.32.113@tcp)
INFO: task osp-syn-0:3406 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-0 D 0000000000000001 0 3406 2 0x00000080
 ffff88031eab14c0 0000000000000046 0000000000000000 ffff8803f5c9a818
 ffff88032157a7b0 ffff880291a34b58 ffff8802f9550130 ffff880291a24678
 ffff88031ea77098 ffff88031eab1fd8 000000000000fb88 ffff88031ea77098
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] ? zone_statistics+0x99/0xc0
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? transfer_objects+0x5c/0x80
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-1:3408 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-1 D 0000000000000005 0 3408 2 0x00000080
 ffff88031eaf14c0 0000000000000046 0000000000000000 ffff8801cff097b0
 ffff880291a04c28 ffff8801990c9338 ffff880623769bc0 ffff8802f9550268
 ffff88031c2bdab8 ffff88031eaf1fd8 000000000000fb88 ffff88031c2bdab8
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
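[Note: the hung-task watchdog quotes its own off switch. The knob below only silences these INFO messages; it does not address the jbd2 journal contention the osp-syn traces show:

  # current threshold in seconds; these reports fire after 120s of D state
  cat /proc/sys/kernel/hung_task_timeout_secs
  # quoted verbatim by the kernel above: disables the warning, nothing else
  echo 0 > /proc/sys/kernel/hung_task_timeout_secs
]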
INFO: task osp-syn-2:3410 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-2 D 000000000000000d 0 3410 2 0x00000080
 ffff88031eb5b4c0 0000000000000046 0000000000000000 ffff8802f9550268
 ffff8806237c4748 ffff8802834519b8 ffff88018a532540 ffff880624f2dcf8
 ffff88031eb15058 ffff88031eb5bfd8 000000000000fb88 ffff88031eb15058
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-3:3412 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-3 D 0000000000000002 0 3412 2 0x00000080
 ffff88031eb9d4c0 0000000000000046 0000000000000000 ffff880168fc7198
 ffff88029adb4880 ffff880288daf678 ffff8801d223a3a0 ffff8801509b8f68
 ffff88031eb9baf8 ffff88031eb9dfd8 000000000000fb88 ffff88031eb9baf8
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] ? zone_statistics+0x70/0xc0
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-4:3414 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-4 D 0000000000000005 0 3414 2 0x00000080
 ffff88031ec0d4c0 0000000000000046 0000000000000000 ffff880291a04c28
 ffff8801990c9338 ffff880623769bc0 ffff8802f9550268 ffff88042cc20880
 ffff88031eb9a638 ffff88031ec0dfd8 000000000000fb88 ffff88031eb9a638
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-5:3416 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-5 D 0000000000000004 0 3416 2 0x00000080
 ffff88031ec974c0 0000000000000046 0000000000000000 ffff8803c5aba268
 ffff88030e439678 ffff88041c4a1a20 ffff880288cd9a20 ffff880288d4a6e0
 ffff88031ec51058 ffff88031ec97fd8 000000000000fb88 ffff88031ec51058
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-6:3418 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-6 D 0000000000000002 0 3418 2 0x00000080
 ffff88031ece74c0 0000000000000046 0000000000000000 ffff8801d223a3a0
 ffff8801509b8f68 ffff8801509b60c8 ffff88018a532f00 ffff88061d0d25a8
 ffff88031ec505f8 ffff88031ece7fd8 000000000000fb88 ffff88031ec505f8
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-7:3420 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-7 D 0000000000000006 0 3420 2 0x00000080
 ffff88031ed794c0 0000000000000046 0000000000000000 ffff88018a532d60
 ffff88018a532cf8 ffff88015638d6e0 ffff88015634dc28 ffff880291ae52d0
 ffff88031ed0b058 ffff88031ed79fd8 000000000000fb88 ffff88031ed0b058
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-8:3422 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-8 D 0000000000000001 0 3422 2 0x00000080
 ffff8806284794c0 0000000000000046 0000000000000000 ffff8801563f6880
 ffff8803f5c9a818 ffff88032157a7b0 ffff880291a34b58 ffff8802f9550130
 ffff880639c39058 ffff880628479fd8 000000000000fb88 ffff880639c39058
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
INFO: task osp-syn-9:3424 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
osp-syn-9 D 000000000000000d 0 3424 2 0x00000080
 ffff8806284bd4c0 0000000000000046 0000000000000000 ffff8806237c4748
 ffff8802834519b8 ffff88018a532540 ffff880624f2dcf8 0000000000000001
 ffff8806383cc638 ffff8806284bdfd8 000000000000fb88 ffff8806383cc638
Call Trace:
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [] osd_ldiskfs_write_record+0x93/0x330 [osd_ldiskfs]
 [] osd_write+0x148/0x2a0 [osd_ldiskfs]
 [] dt_record_write+0x45/0x130 [obdclass]
 [] ? osd_declare_qid+0xd6/0x3f0 [osd_ldiskfs]
 [] llog_osd_write_blob+0x57b/0x850 [obdclass]
 [] llog_osd_write_rec+0x274/0x1370 [obdclass]
 [] llog_write_rec+0xc8/0x290 [obdclass]
 [] llog_write+0x2f5/0x440 [obdclass]
 [] llog_cancel_rec+0xbc/0x7c0 [obdclass]
 [] llog_cat_cancel_records+0x107/0x340 [obdclass]
 [] osp_sync_process_committed+0x231/0x750 [osp]
 [] osp_sync_process_queues+0x94/0x15e0 [osp]
 [] ? osd_object_read_unlock+0x8b/0xd0 [osd_ldiskfs]
 [] ? default_wake_function+0x0/0x20
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_cb+0x56a/0x620 [obdclass]
 [] llog_process_thread+0x8fb/0xe00 [obdclass]
 [] ? llog_cat_process_cb+0x0/0x620 [obdclass]
 [] llog_process_or_fork+0x12d/0x660 [obdclass]
 [] llog_cat_process_or_fork+0x89/0x350 [obdclass]
 [] ? __wake_up_common+0x59/0x90
 [] ? osp_sync_process_queues+0x0/0x15e0 [osp]
 [] llog_cat_process+0x19/0x20 [obdclass]
 [] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
 [] osp_sync_thread+0x240/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] child_rip+0xa/0x20
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? osp_sync_thread+0x0/0x7e0 [osp]
 [] ? child_rip+0x0/0x20
LNet: Service thread pid 4061 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 4061, comm: mdt03_019
Call Trace:
 [] ? ldiskfs_getblk+0xee/0x1f0 [ldiskfs]
 [] ? bh_lru_install+0x16e/0x1a0
 [] do_get_write_access+0x29d/0x520 [jbd2]
 [] ? __find_get_block+0x97/0xe0
 [] ? wake_bit_function+0x0/0x50
 [] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [] __ldiskfs_journal_get_write_access+0x38/0x80 [ldiskfs]
 [] ldiskfs_reserve_inode_write+0x73/0xa0 [ldiskfs]
 [] ldiskfs_mark_inode_dirty+0x4c/0x1f0 [ldiskfs]
 [] ldiskfs_dirty_inode+0x40/0x60 [ldiskfs]
 [] osd_attr_set+0x181/0x540 [osd_ldiskfs]
 [] lod_attr_set+0x12b/0x450 [lod]
 [] mdd_attr_set_internal+0x151/0x230 [mdd]
 [] mdd_attr_check_set_internal+0x275/0x2c0 [mdd]
 [] mdd_unlink+0x7a6/0xe30 [mdd]
 [] mdo_unlink+0x18/0x50 [mdt]
 [] mdt_reint_unlink+0x820/0x1010 [mdt]
 [] mdt_reint_rec+0x41/0xe0 [mdt]
 [] mdt_reint_internal+0x4c3/0x780 [mdt]
 [] mdt_reint+0x44/0xe0 [mdt]
 [] mdt_handle_common+0x647/0x16d0 [mdt]
 [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [] mds_regular_handle+0x15/0x20 [mdt]
 [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [] ? __wake_up+0x53/0x70
 [] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] child_rip+0xa/0x20
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1405106494.4061
LNet: Service thread pid 4061 completed after 206.12s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
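[Note: all ten osp-syn threads block in do_get_write_access() in jbd2, and the mdt03_019 unlink above blocks at the same point, so every writer is queued behind the MDT journal. A raid10 make_request frame also appears in the mdt_rdpg03_002 trace, pointing at the backing MD array. A sketch for confirming the journal/device backlog while the stalls are happening, assuming a kernel that exposes jbd2 statistics under /proc:

  # per-journal transaction stats for the ldiskfs MDT device
  cat /proc/fs/jbd2/*/info
  # device-level utilization and queue depth on the MDS
  iostat -x 1
]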
LNet: Service thread pid 3488 was inactive for 630.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 3488, comm: mdt_rdpg03_002
Call Trace:
 [] ? shrink_inactive_list+0x3b0/0x830
 [] ? mem_cgroup_lru_del_list+0x2b/0xb0
 [] ? shrink_active_list+0x297/0x370
 [] shrink_mem_cgroup_zone+0x3ae/0x610
 [] ? mem_cgroup_iter+0xfd/0x280
 [] shrink_zone+0x63/0xb0
 [] zone_reclaim+0x349/0x400
 [] ? mempool_alloc_slab+0x15/0x20
 [] get_page_from_freelist+0x69c/0x830
 [] ? native_sched_clock+0x13/0x80
 [] __alloc_pages_nodemask+0x113/0x8d0
 [] ? blk_queue_bio+0x121/0x5d0
 [] ? mempool_alloc_slab+0x15/0x20
 [] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
 [] alloc_pages_current+0xaa/0x110
 [] __page_cache_alloc+0x87/0x90
 [] find_or_create_page+0x4f/0xb0
 [] __getblk+0xed/0x2a0
 [] ? unlock_buffer+0x17/0x20
 [] __breadahead+0x12/0x40
 [] __ldiskfs_get_inode_loc+0x33e/0x3b0 [ldiskfs]
 [] ldiskfs_iget+0x86/0x800 [ldiskfs]
 [] ? fld_server_lookup+0x72/0x3d0 [fld]
 [] ? generic_detach_inode+0x18e/0x1f0
 [] osd_iget+0x2e/0x2c0 [osd_ldiskfs]
 [] osd_ea_fid_get+0x176/0x2c0 [osd_ldiskfs]
 [] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [] osd_it_ea_rec+0xb45/0x1470 [osd_ldiskfs]
 [] ? call_filldir+0xb5/0x150 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
 [] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
 [] ? htree_unlock+0x3d/0x2c6 [ldiskfs]
 [] lod_it_rec+0x21/0x90 [lod]
 [] mdd_dir_page_build+0xfc/0x210 [mdd]
 [] dt_index_walk+0x162/0x3d0 [obdclass]
 [] ? mdd_dir_page_build+0x0/0x210 [mdd]
 [] mdd_readpage+0x38b/0x5a0 [mdd]
 [] mdt_readpage+0x47f/0x960 [mdt]
 [] mdt_handle_common+0x647/0x16d0 [mdt]
 [] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [] mds_readpage_handle+0x15/0x20 [mdt]
 [] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [] ? __wake_up+0x53/0x70
 [] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] child_rip+0xa/0x20
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1405107415.3488
Lustre: 3371:0:(service.c:1339:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-211), not sending early reply
  req@ffff88061b13b800 x1471182060929844/t0(0) o37->c49ef08c-3b74-d0cd-5054-94e6eee13a3c@198.202.119.249@tcp:0/0 lens 448/440 e 1 to 0 dl 1405107601 ref 2 fl Interpret:/0/0 rc 0/0
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: Skipped 1 previous similar message
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: Skipped 1 previous similar message
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: Skipped 2 previous similar messages
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: Skipped 2 previous similar messages
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) reconnecting
Lustre: Skipped 5 previous similar messages
Lustre: puma-MDT0000: Client c49ef08c-3b74-d0cd-5054-94e6eee13a3c (at 198.202.119.249@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: Skipped 5 previous similar messages
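[Note: the reconnect/refused loop is consistent with the earlier watchdog dumps: the client's readpage RPC is still held by the stuck mdt_rdpg thread, and reconnection is refused until that one RPC completes. To see which service threads are in uninterruptible sleep and where, without Lustre-specific tooling (a sketch using standard utilities):

  # server threads stuck in D state, with their kernel wait channel
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
  # dump stack traces of all blocked tasks into the kernel log
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 200
]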