2016-12-13 14:47:06 [ 138.999726] Lustre: Lustre: Build Version: 2.8.0_5.chaos_6_g0fdca44_dirty
2016-12-13 14:47:07 [ 139.323156] LustreError: 11-0: lquake-MDT0008-osp-MDT000a: operation mds_connect to node 172.19.1.119@o2ib100 failed: rc = -114
2016-12-13 14:47:07 [ 139.355851] Lustre: lquake-MDT000a: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
2016-12-13 14:47:12 [ 144.299960] Lustre: lquake-MDT000a: Will be in recovery for at least 2:30, or until 31 clients reconnect
2016-12-13 14:47:12 [ 144.310660] Lustre: lquake-MDT000a: Connection restored to 2294a39d-c3ea-4537-cc40-6b83b29b80e6 (at 192.168.128.21@o2ib18)
2016-12-13 14:47:12 [ 144.901073] Lustre: lquake-MDT000a: Connection restored to 7927f07b-2b10-f881-a913-fc8e3ad4e64d (at 192.168.128.30@o2ib18)
2016-12-13 14:47:14 [ 146.250497] Lustre: lquake-MDT000a: Connection restored to a94562e7-af74-55a8-c00d-f8a93049245a (at 192.168.128.23@o2ib18)
2016-12-13 14:47:16 [ 148.881441] Lustre: lquake-MDT000a: Connection restored to 987b4351-b459-8f25-89f9-59fa31045f6b (at 192.168.128.32@o2ib18)
2016-12-13 14:47:16 [ 148.893844] Lustre: Skipped 7 previous similar messages
2016-12-13 14:47:31 [ 163.421029] Lustre: lquake-MDT000a: Connection restored to 39a8e4eb-b239-f869-fb5c-c1fd9c95f91d (at 192.168.128.31@o2ib18)
2016-12-13 14:47:31 [ 163.433397] Lustre: Skipped 2 previous similar messages
2016-12-13 14:47:57 [ 189.262340] LustreError: 137-5: lquake-MDT000b_UUID: not available for connect from 172.19.1.119@o2ib100 (no target). If you are running an HA pair check that the target is mounted on the other server.
2016-12-13 14:47:57 [ 190.010516] Lustre: lquake-MDT000a: Connection restored to (at 172.19.1.122@o2ib100)
2016-12-13 14:47:57 [ 190.019312] Lustre: Skipped 2 previous similar messages
2016-12-13 14:47:58 [ 190.367048] Lustre: 33798:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1481669227/real 1481669227] req@ffff887f42fb1e00 x1553642790322264/t0(0) o38->lquake-MDT000b-osp-MDT000a@172.19.1.122@o2ib100:24/4 lens 520/544 e 0 to 1 dl 1481669277 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-12-13 14:47:59 [ 191.582467] LustreError: 137-5: lquake-MDT000b_UUID: not available for connect from 172.19.1.111@o2ib100 (no target). If you are running an HA pair check that the target is mounted on the other server.
2016-12-13 14:48:03 [ 195.846907] LustreError: 137-5: lquake-MDT000b_UUID: not available for connect from 172.19.1.113@o2ib100 (no target). If you are running an HA pair check that the target is mounted on the other server.
2016-12-13 14:48:21 [ 214.045864] LustreError: 137-5: lquake-MDT000b_UUID: not available for connect from 172.19.1.120@o2ib100 (no target). If you are running an HA pair check that the target is mounted on the other server.
2016-12-13 14:48:22 [ 214.264054] Lustre: lquake-MDT000a: Connection restored to (at 172.19.1.119@o2ib100)
2016-12-13 14:48:22 [ 214.272878] Lustre: Skipped 2 previous similar messages
2016-12-13 14:48:42 [ 234.369062] Lustre: lquake-MDT000a: Client 2294a39d-c3ea-4537-cc40-6b83b29b80e6 (at 192.168.128.21@o2ib18) reconnecting, waiting for 31 clients in recovery for 1:30
2016-12-13 14:48:42 [ 234.387958] LustreError: 33900:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883ee8a28900 x1553274421312220/t0(68719541596) o36->2294a39d-c3ea-4537-cc40-6b83b29b80e6@192.168.128.21@o2ib18:3/0 lens 624/0 e 0 to 0 dl 1481669383 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:48:42 [ 234.960068] Lustre: lquake-MDT000a: Client 7927f07b-2b10-f881-a913-fc8e3ad4e64d (at 192.168.128.30@o2ib18) reconnecting, waiting for 31 clients in recovery for 1:29
2016-12-13 14:48:42 [ 234.978688] LustreError: 33897:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883ee8a29800 x1553274431727596/t0(68719541215) o36->7927f07b-2b10-f881-a913-fc8e3ad4e64d@192.168.128.30@o2ib18:3/0 lens 624/0 e 0 to 0 dl 1481669383 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:48:44 [ 236.308408] Lustre: lquake-MDT000a: Client a94562e7-af74-55a8-c00d-f8a93049245a (at 192.168.128.23@o2ib18) reconnecting, waiting for 31 clients in recovery for 1:28
2016-12-13 14:48:44 [ 236.327196] LustreError: 33891:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883ef8ab3900 x1553274421273752/t0(68719541425) o36->a94562e7-af74-55a8-c00d-f8a93049245a@192.168.128.23@o2ib18:5/0 lens 624/0 e 0 to 0 dl 1481669385 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:48:46 [ 238.946289] Lustre: lquake-MDT000a: Client 987b4351-b459-8f25-89f9-59fa31045f6b (at 192.168.128.32@o2ib18) reconnecting, waiting for 31 clients in recovery for 1:25
2016-12-13 14:48:46 [ 238.965084] Lustre: Skipped 7 previous similar messages
2016-12-13 14:48:46 [ 238.973220] LustreError: 33900:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff887f48eb7b00 x1553274421557320/t0(68719541334) o36->987b4351-b459-8f25-89f9-59fa31045f6b@192.168.128.32@o2ib18:7/0 lens 624/0 e 0 to 0 dl 1481669387 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:48:46 [ 239.007184] LustreError: 33900:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 7 previous similar messages
2016-12-13 14:48:54 [ 247.148231] Lustre: lquake-MDT000b-osp-MDT000a: Connection restored to 172.19.1.122@o2ib100 (at 172.19.1.122@o2ib100)
2016-12-13 14:48:54 [ 247.161220] Lustre: Skipped 30 previous similar messages
2016-12-13 14:49:01 [ 253.904748] Lustre: lquake-MDT000a: Client 39a8e4eb-b239-f869-fb5c-c1fd9c95f91d (at 192.168.128.31@o2ib18) reconnecting, waiting for 31 clients in recovery for 4:19
2016-12-13 14:49:01 [ 253.923437] Lustre: Skipped 2 previous similar messages
2016-12-13 14:49:01 [ 253.930647] LustreError: 33891:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883ef8be8300 x1553274422245048/t0(68719541011) o36->39a8e4eb-b239-f869-fb5c-c1fd9c95f91d@192.168.128.31@o2ib18:22/0 lens 624/0 e 0 to 0 dl 1481669402 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:49:01 [ 253.964745] LustreError: 33891:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 2 previous similar messages
2016-12-13 14:49:27 [ 280.068124] Lustre: lquake-MDT000a: Client lquake-MDT000b-mdtlov_UUID (at 172.19.1.122@o2ib100) reconnecting, waiting for 31 clients in recovery for 5:15
2016-12-13 14:49:27 [ 280.085842] Lustre: Skipped 2 previous similar messages
2016-12-13 14:49:29 [ 282.094632] LustreError: 33918:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883efd62e850 x1553641484060400/t0(68719540833) o1000->lquake-MDT000f-mdtlov_UUID@172.19.1.126@o2ib100:50/0 lens 344/0 e 0 to 0 dl 1481669430 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:49:29 [ 282.127894] LustreError: 33918:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 3 previous similar messages
2016-12-13 14:49:52 [ 304.336690] Lustre: lquake-MDT000a: Client lquake-MDT0008-mdtlov_UUID (at 172.19.1.119@o2ib100) reconnecting, waiting for 31 clients in recovery for 4:51
2016-12-13 14:49:52 [ 304.354470] Lustre: Skipped 2 previous similar messages
2016-12-13 14:49:54 [ 307.094143] LustreError: 12455:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff887f333fcc50 x1553641484081352/t0(68719540836) o1000->lquake-MDT000d-mdtlov_UUID@172.19.1.124@o2ib100:75/0 lens 344/0 e 0 to 0 dl 1481669455 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:49:54 [ 307.127463] LustreError: 12455:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 4 previous similar messages
2016-12-13 14:50:12 [ 325.062211] Lustre: lquake-MDT000a: Connection restored to 2294a39d-c3ea-4537-cc40-6b83b29b80e6 (at 192.168.128.21@o2ib18)
2016-12-13 14:50:12 [ 325.075823] Lustre: Skipped 18 previous similar messages
2016-12-13 14:50:31 [ 343.976187] Lustre: lquake-MDT000a: Client 39a8e4eb-b239-f869-fb5c-c1fd9c95f91d (at 192.168.128.31@o2ib18) reconnecting, waiting for 31 clients in recovery for 4:11
2016-12-13 14:50:31 [ 343.994913] Lustre: Skipped 24 previous similar messages
2016-12-13 14:50:31 [ 344.002215] LustreError: 33891:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883eea410600 x1553274422245048/t0(68719541011) o36->39a8e4eb-b239-f869-fb5c-c1fd9c95f91d@192.168.128.31@o2ib18:112/0 lens 624/0 e 0 to 0 dl 1481669492 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:50:31 [ 344.036362] LustreError: 33891:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 18 previous similar messages
2016-12-13 14:51:42 [ 415.172518] Lustre: lquake-MDT000a: Client 7927f07b-2b10-f881-a913-fc8e3ad4e64d (at 192.168.128.30@o2ib18) reconnecting, waiting for 31 clients in recovery for 3:00
2016-12-13 14:51:42 [ 415.180985] LustreError: 33900:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883ef8afb900 x1553274421312220/t0(68719541596) o36->2294a39d-c3ea-4537-cc40-6b83b29b80e6@192.168.128.21@o2ib18:183/0 lens 624/0 e 0 to 0 dl 1481669563 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:51:42 [ 415.180986] LustreError: 33900:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 14 previous similar messages
2016-12-13 14:51:43 [ 415.238680] Lustre: Skipped 17 previous similar messages
2016-12-13 14:52:27 [ 460.180427] Lustre: lquake-MDT000a: Connection restored to (at 172.19.1.122@o2ib100)
2016-12-13 14:52:27 [ 460.190432] Lustre: Skipped 46 previous similar messages
2016-12-13 14:53:57 [ 550.240713] Lustre: lquake-MDT000a: Client lquake-MDT000b-mdtlov_UUID (at 172.19.1.122@o2ib100) reconnecting, waiting for 31 clients in recovery for 0:45
2016-12-13 14:53:57 [ 550.258291] Lustre: Skipped 46 previous similar messages
2016-12-13 14:53:59 [ 552.241660] LustreError: 33918:0:(ldlm_lib.c:2751:target_queue_recovery_request()) @@@ dropping resent queued req req@ffff883efd628850 x1553641484060400/t0(68719540833) o1000->lquake-MDT000f-mdtlov_UUID@172.19.1.126@o2ib100:320/0 lens 344/0 e 0 to 0 dl 1481669700 ref 2 fl Interpret:/6/ffffffff rc 0/-1
2016-12-13 14:54:00 [ 552.274899] LustreError: 33918:0:(ldlm_lib.c:2751:target_queue_recovery_request()) Skipped 44 previous similar messages
2016-12-13 14:54:44 [ 597.107557] Lustre: lquake-MDT000a: Recovery already passed deadline 0:01, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:54:45 [ 598.108926] Lustre: lquake-MDT000a: Recovery already passed deadline 0:02, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:54:45 [ 598.129849] Lustre: Skipped 2 previous similar messages
2016-12-13 14:54:46 [ 599.211411] Lustre: lquake-MDT000a: Recovery already passed deadline 0:03, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:54:46 [ 599.232239] Lustre: Skipped 4 previous similar messages
2016-12-13 14:54:48 [ 601.220136] Lustre: lquake-MDT000a: Recovery already passed deadline 0:05, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:55:01 [ 614.152846] Lustre: lquake-MDT000a: Recovery already passed deadline 0:18, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:55:28 [ 640.308925] Lustre: lquake-MDT000a: Recovery already passed deadline 0:44, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:55:28 [ 640.329804] Lustre: Skipped 2 previous similar messages
2016-12-13 14:55:52 [ 664.536848] Lustre: lquake-MDT000a: Recovery already passed deadline 1:09, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
2016-12-13 14:55:52 [ 664.557680] Lustre: Skipped 2 previous similar messages
2016-12-13 14:56:13 [ 685.728759] Lustre: lquake-MDT000a: recovery is timed out, evict stale exports
2016-12-13 14:56:13 [ 685.737993] Lustre: lquake-MDT000a: disconnecting 16 stale clients
2016-12-13 14:56:13 [ 685.746427] Lustre: 34667:0:(ldlm_lib.c:1589:abort_req_replay_queue()) @@@ aborted: req@ffff887f2c41d850 x1553641484047916/t0(68719540823) o1000->lquake-MDT0006-mdtlov_UUID@172.19.1.117@o2ib100:410/0 lens 344/0 e 17 to 0 dl 1481669790 ref 1 fl Complete:/4/ffffffff rc 0/-1
2016-12-13 14:56:13 [ 685.793338] Lustre: lquake-MDT000a: Denying connection for new client f25a5bc2-93d4-c9d5-aae8-bd315848b7ea(at 192.168.128.24@o2ib18), waiting for 31 known clients (3 recovered, 12 in progress, and 16 evicted) to recover in 21188503:49
2016-12-13 14:56:13 [ 685.818693] Lustre: Skipped 15 previous similar messages
2016-12-13 14:56:13 [ 686.254147] Lustre: 34667:0:(ldlm_lib.c:1589:abort_req_replay_queue()) @@@ aborted: req@ffff887f333f0050 x1553641484047916/t0(68719540823) o1000->lquake-MDT0006-mdtlov_UUID@172.19.1.117@o2ib100:454/0 lens 344/0 e 0 to 0 dl 1481669834 ref 1 fl Complete:/6/ffffffff rc 0/-1
2016-12-13 14:56:13 [ 686.284145] Lustre: 34667:0:(ldlm_lib.c:1589:abort_req_replay_queue()) Skipped 971 previous similar messages
2016-12-13 14:56:22 [ 695.215436] Lustre: 34667:0:(ldlm_lib.c:1589:abort_req_replay_queue()) @@@ aborted: req@ffff887f333f2050 x1553641484047916/t0(68719540823) o1000->lquake-MDT0006-mdtlov_UUID@172.19.1.117@o2ib100:455/0 lens 344/0 e 0 to 0 dl 1481669835 ref 1 fl Complete:/6/ffffffff rc 0/-1
2016-12-13 14:56:22 [ 695.245593] Lustre: 34667:0:(ldlm_lib.c:1589:abort_req_replay_queue()) Skipped 291 previous similar messages
2016-12-13 14:56:23 [ 695.778851] Lustre: 34667:0:(ldlm_lib.c:2004:target_recovery_overseer()) lquake-MDT000a recovery is aborted by hard timeout
2016-12-13 14:56:23 [ 695.792412] Lustre: 34667:0:(ldlm_lib.c:2014:target_recovery_overseer()) recovery is aborted, evict exports in recovery
2016-12-13 14:56:23 [ 695.836422] Lustre: lquake-MDT000a: Recovery over after 9:11, of 31 clients 15 recovered and 16 were evicted.
Console [jet11] log at 2016-12-13 15:00:00 PST.
Console [jet11] log at 2016-12-13 16:00:00 PST.