2014-06-27 10:07:21 LDISKFS-fs warning (device dm-1): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
2014-06-27 10:07:21
2014-06-27 10:07:58 LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 192.168.125.10@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
2014-06-27 10:07:58 LustreError: Skipped 13 previous similar messages
2014-06-27 10:08:05 LDISKFS-fs (dm-1): recovery complete
2014-06-27 10:08:05 LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
2014-06-27 10:08:05 Lustre: lustre-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2014-06-27 10:08:06 LDISKFS-fs warning (device dm-3): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
2014-06-27 10:08:06
2014-06-27 10:08:06 Lustre: lustre-OST0000: Will be in recovery for at least 2:30, or until 316 clients reconnect
2014-06-27 10:08:17 LustreError: 137-5: lustre-OST000c_UUID: not available for connect from 192.168.124.142@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
2014-06-27 10:08:17 LustreError: Skipped 63 previous similar messages
2014-06-27 10:08:49 LustreError: 137-5: lustre-OST000c_UUID: not available for connect from 192.168.124.185@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
2014-06-27 10:08:49 LustreError: Skipped 551 previous similar messages
2014-06-27 10:08:52 LDISKFS-fs (dm-3): recovery complete
2014-06-27 10:08:52 LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts:
2014-06-27 10:08:52 Lustre: lustre-OST000c: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2014-06-27 10:08:53 Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
2014-06-27 10:08:53 LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
2014-06-27 10:08:53
2014-06-27 10:09:02 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:02 Lustre: 29362:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:02 Lustre: 29362:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880817cc3400 x1471990775896852/t0(0) o400->2395c7c5-3b37-e80f-f463-4ac738057527@192.168.124.83@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888931 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:03 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:03 Lustre: Skipped 1 previous similar message
2014-06-27 10:09:03 Lustre: 29362:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:03 Lustre: 29362:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 1 previous similar message
2014-06-27 10:09:03 Lustre: 29362:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8803d8677800 x1471990350151404/t0(0) o400->a101954c-2353-41d2-db01-b0651d897e2f@192.168.124.23@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888932 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:03 Lustre: 29362:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages
2014-06-27 10:09:06 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:06 Lustre: Skipped 3 previous similar messages
2014-06-27 10:09:06 Lustre: 34251:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:06 Lustre: 34251:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 3 previous similar messages
2014-06-27 10:09:06 Lustre: 34251:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8808853c1400 x1471990194664176/t0(0) o400->ec25110d-fb6f-a854-508e-ffaf37867192@192.168.124.1@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888935 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:06 Lustre: 34251:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages
2014-06-27 10:09:08 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:08 Lustre: Skipped 3 previous similar messages
2014-06-27 10:09:08 Lustre: 34301:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:08 Lustre: 34301:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 3 previous similar messages
2014-06-27 10:09:08 Lustre: 29296:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8809a531c800 x1471990550913980/t0(0) o400->443e26a3-6704-14d9-1c6f-ac0b81130c5a@192.168.124.179@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888937 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:08 Lustre: 29296:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
2014-06-27 10:09:12 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:12 Lustre: Skipped 11 previous similar messages
2014-06-27 10:09:12 Lustre: 29362:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:12 Lustre: 29362:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 10 previous similar messages
2014-06-27 10:09:12 Lustre: 29362:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8801e6585000 x1471990428079348/t0(0) o400->2748b9e7-c140-6348-7ae9-efa519ac2d7d@192.168.125.5@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888941 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:12 Lustre: 29362:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages
2014-06-27 10:09:14 LustreError: 40007:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST0000: Dropping timed-out write from 12345-192.168.125.10@o2ib because locking object 0x0:11376733 took 61 seconds (limit was 45).
2014-06-27 10:09:14 Lustre: lustre-OST0000: Bulk IO write error with 5a5570c6-3b43-b997-806e-941572020b9b (at 192.168.125.10@o2ib), client will retry: rc -110
2014-06-27 10:09:16 LustreError: 40007:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST0000: Dropping timed-out write from 12345-192.168.124.104@o2ib because locking object 0x0:11338092 took 64 seconds (limit was 45).
2014-06-27 10:09:16 LustreError: 40007:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 2 previous similar messages
2014-06-27 10:09:16 Lustre: lustre-OST0000: Bulk IO write error with 7797dbc0-958e-8099-c06f-528944ff8ca4 (at 192.168.124.104@o2ib), client will retry: rc -110
2014-06-27 10:09:16 Lustre: Skipped 2 previous similar messages
2014-06-27 10:09:21 Lustre: ost_io: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:21 Lustre: Skipped 9 previous similar messages
2014-06-27 10:09:21 Lustre: 34251:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=4 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:21 Lustre: 34251:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 9 previous similar messages
2014-06-27 10:09:21 Lustre: 34251:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880db73abc00 x1471989978520872/t0(0) o400->3a8b6902-ffff-3afd-7a8c-d2dcd9b6ee94@192.168.124.98@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888950 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:21 Lustre: 34251:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages
2014-06-27 10:09:37 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:09:37 Lustre: Skipped 53 previous similar messages
2014-06-27 10:09:37 Lustre: 29417:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:09:37 Lustre: 29417:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 69 previous similar messages
2014-06-27 10:09:37 Lustre: 29417:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8808d1bde800 x1471990082442716/t0(0) o400->0d0ec924-d623-2c7d-e741-eb2b4f7a0c14@192.168.124.122@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403888966 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:09:37 Lustre: 29417:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 160 previous similar messages
2014-06-27 10:09:41 LDISKFS-fs (dm-0): recovery complete
2014-06-27 10:09:41 LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
2014-06-27 10:09:41 Lustre: lustre-OST0018: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2014-06-27 10:09:41 Lustre: lustre-OST0018: Will be in recovery for at least 2:30, or until 316 clients reconnect
2014-06-27 10:09:52 Lustre: lustre-OST0000: Client 2395c7c5-3b37-e80f-f463-4ac738057527 (at 192.168.124.83@o2ib) reconnecting, waiting for 316 clients in recovery for 1:54
2014-06-27 10:09:53 Lustre: lustre-OST0000: Client a101954c-2353-41d2-db01-b0651d897e2f (at 192.168.124.23@o2ib) reconnecting, waiting for 316 clients in recovery for 1:53
2014-06-27 10:09:53 Lustre: Skipped 1 previous similar message
2014-06-27 10:09:54 Lustre: lustre-OST0000: Client 0c032afd-d6fd-73bb-157a-02fb840fabe3 (at 192.168.124.26@o2ib) reconnecting, waiting for 316 clients in recovery for 1:52
2014-06-27 10:09:54 Lustre: Skipped 3 previous similar messages
2014-06-27 10:09:57 Lustre: lustre-OST0000: Client 48b1ec79-42e9-4b60-b859-54e03c4e9759 (at 192.168.124.102@o2ib) reconnecting, waiting for 316 clients in recovery for 1:49
2014-06-27 10:09:57 Lustre: Skipped 2 previous similar messages
2014-06-27 10:09:58 LustreError: 40007:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST0000: Dropping timed-out write from 12345-192.168.124.187@o2ib because locking object 0x0:11333339 took 93 seconds (limit was 45).
2014-06-27 10:09:58 Lustre: lustre-OST0000: Bulk IO write error with 7bcdee09-8f9c-de31-155e-2116c1c9ae82 (at 192.168.124.187@o2ib), client will retry: rc -110
2014-06-27 10:10:02 program snmpd is using a deprecated SCSI ioctl, please convert it to SG_IO
2014-06-27 10:10:03 Lustre: lustre-OST0000: Client 2748b9e7-c140-6348-7ae9-efa519ac2d7d (at 192.168.125.5@o2ib) reconnecting, waiting for 316 clients in recovery for 2:26
2014-06-27 10:10:03 Lustre: Skipped 23 previous similar messages
2014-06-27 10:10:10 Lustre: ost_io: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:10:10 Lustre: Skipped 42 previous similar messages
2014-06-27 10:10:10 Lustre: 29415:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:10:10 Lustre: 29415:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 42 previous similar messages
2014-06-27 10:10:10 Lustre: 29415:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880f58b03400 x1471990037913972/t0(0) o101->23c4d25c-cd51-f83b-a3bd-92b7e719cf7a@192.168.119.22@o2ib:0/0 lens 328/0 e 1 to 0 dl 1403888999 ref 2 fl Complete:/40/ffffffff rc 0/-1
2014-06-27 10:10:10 Lustre: 29415:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 81 previous similar messages
2014-06-27 10:10:11 Lustre: lustre-OST0000: Client aa66e281-89b0-7445-deaf-59924eafe334 (at 192.168.124.32@o2ib) reconnecting, waiting for 316 clients in recovery for 2:18
2014-06-27 10:10:11 Lustre: Skipped 20 previous similar messages
2014-06-27 10:10:27 Lustre: lustre-OST0000: Client 903ae915-312e-48e7-ef8d-37e9064620d3 (at 192.168.124.172@o2ib) reconnecting, waiting for 316 clients in recovery for 2:02
2014-06-27 10:10:27 Lustre: Skipped 144 previous similar messages
2014-06-27 10:10:43 LustreError: 14887:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff88037d098000 x1471989692489304/t0(4319115285) o4->e251613f-4c39-476f-6359-b0f745c90f55@192.168.118.132@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889104 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:10:44 LustreError: 9929:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880d27401000 x1471989785237856/t0(4319115186) o4->e7682e15-de42-31c9-6f54-3cff0aca206e@192.168.118.100@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889105 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:10:46 LustreError: 14893:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880fd377cc00 x1471990460136424/t0(4319115221) o10->c02f71b7-53e2-3ac3-0d2f-72df9fec05a8@192.168.124.116@o2ib:0/0 lens 440/0 e 0 to 0 dl 1403889107 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:10:49 LustreError: 14893:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880a2520c400 x1471990380782272/t0(4319115183) o10->29400290-9e98-d499-974a-bdfb40fa179f@192.168.124.171@o2ib:0/0 lens 440/0 e 0 to 0 dl 1403889110 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:11:00 Lustre: lustre-OST0000: Client c39e3565-1ea7-3b3b-ea0e-160c5232944b (at 192.168.124.241@o2ib) reconnecting, waiting for 316 clients in recovery for 1:29
2014-06-27 10:11:00 Lustre: Skipped 90 previous similar messages
2014-06-27 10:11:00 LustreError: 14893:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880a25213800 x1471989445459200/t0(4319115329) o4->3f03ae62-9262-bb34-4142-201d9e1b7d64@192.168.124.19@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889121 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:11:00 LustreError: 14893:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 3 previous similar messages
2014-06-27 10:11:14 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:11:14 Lustre: Skipped 399 previous similar messages
2014-06-27 10:11:14 Lustre: 24543:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:11:14 Lustre: 24543:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 401 previous similar messages
2014-06-27 10:11:14 Lustre: 24543:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880db2725c00 x1471989141760944/t0(0) o400->52f64780-cdf4-b8fe-24a1-e262c3d3df6c@192.168.123.6@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403889063 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:11:14 Lustre: 24543:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 866 previous similar messages
2014-06-27 10:11:30 LustreError: 9929:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880fd3778400 x1471990065552320/t0(4319115192) o4->29861bfe-22b7-6a24-5c3a-c8325e57b483@192.168.119.14@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889151 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:11:30 LustreError: 9929:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 6 previous similar messages
2014-06-27 10:11:53 Lustre: lustre-OST000c: recovery is timed out, evict stale exports
2014-06-27 10:11:53 Lustre: lustre-OST000c: disconnecting 2 stale clients
2014-06-27 10:11:53 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST000c: Dropping timed-out write from 12345-192.168.124.104@o2ib because locking object 0x0:11337259 took 172 seconds (limit was 45).
2014-06-27 10:11:53 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 5 previous similar messages
2014-06-27 10:11:53 Lustre: lustre-OST000c: Bulk IO write error with 7797dbc0-958e-8099-c06f-528944ff8ca4 (at 192.168.124.104@o2ib), client will retry: rc -110
2014-06-27 10:11:53 Lustre: Skipped 5 previous similar messages
2014-06-27 10:12:04 Lustre: lustre-OST0000: Client f00310a4-6da8-dd2f-434a-ad69651da049 (at 192.168.124.6@o2ib) reconnecting, waiting for 316 clients in recovery for 0:25
2014-06-27 10:12:04 Lustre: Skipped 785 previous similar messages
2014-06-27 10:12:20 LustreError: 14919:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff8805b5ecac00 x1471990530269040/t0(4319303417) o4->6a6b7163-77cd-72cf-5fdf-2247d64df1d3@192.168.125.16@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889201 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:12:20 LustreError: 14919:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 147 previous similar messages
2014-06-27 10:12:30 Lustre: lustre-OST0000: recovery is timed out, evict stale exports
2014-06-27 10:12:30 Lustre: lustre-OST0000: disconnecting 2 stale clients
2014-06-27 10:12:30 Lustre: lustre-OST0000: deleting orphan objects from 0x0:11396298 to 0x0:11397921
2014-06-27 10:12:30 Lustre: lustre-OST0000: Recovery over after 4:24, of 316 clients 314 recovered and 2 were evicted.
2014-06-27 10:12:41 Lustre: lustre-OST0018: recovery is timed out, evict stale exports
2014-06-27 10:12:41 Lustre: lustre-OST0018: disconnecting 2 stale clients
2014-06-27 10:12:41 LustreError: 40133:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST0018: Dropping timed-out write from 12345-192.168.124.30@o2ib because locking object 0x0:11359872 took 172 seconds (limit was 45).
2014-06-27 10:12:41 LustreError: 40133:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 11 previous similar messages
2014-06-27 10:12:41 Lustre: lustre-OST0018: Bulk IO write error with 980366f8-b219-dc77-89fb-71651834e42c (at 192.168.124.30@o2ib), client will retry: rc -110
2014-06-27 10:12:41 Lustre: Skipped 11 previous similar messages
2014-06-27 10:12:57 LustreError: 14813:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff880789356800 x1471990533596884/t0(4319115195) o4->53b7a0e5-fcc2-0c5b-3582-52626ae75862@192.168.124.220@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889238 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:12:57 LustreError: 14813:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 10 previous similar messages
2014-06-27 10:13:22 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST000c: Dropping timed-out write from 12345-192.168.118.106@o2ib because locking object 0x0:11286065 took 222 seconds (limit was 45).
2014-06-27 10:13:22 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 12 previous similar messages
2014-06-27 10:13:22 Lustre: lustre-OST000c: Bulk IO write error with 487cbb57-5cc3-ad5c-aa53-d86d426638c2 (at 192.168.118.106@o2ib), client will retry: rc -110
2014-06-27 10:13:22 Lustre: Skipped 12 previous similar messages
2014-06-27 10:13:25 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:13:25 Lustre: Skipped 462 previous similar messages
2014-06-27 10:13:25 Lustre: 29329:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:13:25 Lustre: 29329:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 462 previous similar messages
2014-06-27 10:13:25 Lustre: 29329:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8802aadef800 x1471989928996724/t0(0) o400->c9b4baad-9def-1c5d-89c0-6097ccad318e@192.168.124.13@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403889194 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:13:25 Lustre: 29329:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 659 previous similar messages
2014-06-27 10:13:48 Lustre: lustre-OST0000: haven't heard from client 7476c2d4-84d9-be6b-52c3-854fc177a8d1 (at 192.168.118.139@o2ib) in 323 seconds. I think it's dead, and I am evicting it. exp ffff881030e53800, cur 1403889228 expire 1403889078 last 1403888905
2014-06-27 10:13:48 Lustre: Skipped 146 previous similar messages
2014-06-27 10:14:12 Lustre: lustre-OST0018: Client 46f32d50-3450-c805-aefa-8178fd8dc662 (at 192.168.118.101@o2ib) reconnecting, waiting for 316 clients in recovery for 1:58
2014-06-27 10:14:12 Lustre: Skipped 698 previous similar messages
2014-06-27 10:14:15 LustreError: 14634:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff8803da005800 x1471989692489304/t0(4319115285) o4->e251613f-4c39-476f-6359-b0f745c90f55@192.168.118.132@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889316 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:14:15 LustreError: 14634:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 109 previous similar messages
2014-06-27 10:15:02 program snmpd is using a deprecated SCSI ioctl, please convert it to SG_IO
2014-06-27 10:15:13 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST000c: Dropping timed-out write from 12345-192.168.119.20@o2ib because locking object 0x0:11286027 took 333 seconds (limit was 45).
2014-06-27 10:15:13 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 26 previous similar messages
2014-06-27 10:15:13 Lustre: lustre-OST000c: Bulk IO write error with 095b8097-ce1e-a744-c4da-563033c57c95 (at 192.168.119.20@o2ib), client will retry: rc -110
2014-06-27 10:15:13 Lustre: Skipped 26 previous similar messages
2014-06-27 10:16:25 LustreError: 14634:0:(ldlm_lib.c:2302:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff8807d03c5000 x1471990810532988/t0(4319115216) o4->7797dbc0-958e-8099-c06f-528944ff8ca4@192.168.124.104@o2ib:0/0 lens 488/0 e 0 to 0 dl 1403889446 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2014-06-27 10:16:25 LustreError: 14634:0:(ldlm_lib.c:2302:target_queue_recovery_request()) Skipped 127 previous similar messages
2014-06-27 10:17:04 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST000c: Dropping timed-out write from 12345-192.168.124.104@o2ib because locking object 0x0:11337241 took 261 seconds (limit was 45).
2014-06-27 10:17:04 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 27 previous similar messages
2014-06-27 10:17:04 Lustre: lustre-OST000c: Bulk IO write error with 7797dbc0-958e-8099-c06f-528944ff8ca4 (at 192.168.124.104@o2ib), client will retry: rc -110
2014-06-27 10:17:04 Lustre: Skipped 27 previous similar messages
2014-06-27 10:17:44 Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
2014-06-27 10:17:44 Lustre: Skipped 628 previous similar messages
2014-06-27 10:17:44 Lustre: 34303:0:(service.c:1511:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=0, svcEst=45, delay=0(jiff)
2014-06-27 10:17:44 Lustre: 34303:0:(service.c:1511:ptlrpc_at_check_timed()) Skipped 628 previous similar messages
2014-06-27 10:17:44 Lustre: 34303:0:(service.c:1308:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff88048c588800 x1471989750388240/t0(0) o400->2ee6a27a-22c7-bfe5-6236-0eb825b6ca93@192.168.118.138@o2ib:0/0 lens 224/0 e 1 to 0 dl 1403889453 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
2014-06-27 10:17:44 Lustre: 34303:0:(service.c:1308:ptlrpc_at_send_early_reply()) Skipped 1175 previous similar messages
2014-06-27 10:18:16 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) lustre-OST000c: Dropping timed-out write from 12345-192.168.124.102@o2ib because locking object 0x0:11337285 took 515 seconds (limit was 45).
2014-06-27 10:18:16 LustreError: 40070:0:(tgt_handler.c:1982:tgt_brw_write()) Skipped 12 previous similar messages
2014-06-27 10:18:16 Lustre: lustre-OST000c: Bulk IO write error with 48b1ec79-42e9-4b60-b859-54e03c4e9759 (at 192.168.124.102@o2ib), client will retry: rc -110
2014-06-27 10:18:16 Lustre: Skipped 12 previous similar messages
2014-06-27 10:18:30 Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
2014-06-27 10:18:30 Lustre: Skipped 1321 previous similar messages
2014-06-27 10:19:52 Lustre: lustre-OST0018: recovery is timed out, evict stale exports
2014-06-27 10:19:52 Lustre: lustre-OST0018: disconnecting 1 stale clients
2014-06-27 10:19:52 LustreError: 40133:0:(ldlm_extent.c:696:ldlm_process_extent_lock()) ASSERTION( lock->l_granted_mode != lock->l_req_mode ) failed:
2014-06-27 10:19:53 Initializing cgroup subsys cpu